Abstract: Tea is the oldest and most popular nonalcoholic beverage consumed in the world. It provides abundant secondary metabolites that account for its diverse flavors and health benefits. Here we present the first high-quality chromosome-length reference genome of C. sinensis var. sinensis, using long-read single-molecule real-time (SMRT) sequencing and Hi-C technologies to anchor the ∼2.85-Gb genome assembly into 15 pseudo-chromosomes with a scaffold N50 length of ∼195.68 Mb. We annotated at least 2.17 Gb (∼74.13%) of repetitive sequences and predicted 40,812 high-confidence protein-coding genes in the ∼2.92-Gb genome assembly. This accurately assembled genome allows us to comprehensively annotate functionally important gene families, such as those involved in the biosynthesis of catechins, theanine and caffeine. The contiguous genome assembly provides the first view of the repetitive landscape, allowing us to accurately characterize retrotransposon diversity. The large tea tree genome is dominated by a handful of Ty3-gypsy long terminal repeat (LTR) retrotransposon families that recently expanded to high copy numbers. We uncover the latest bursts of numerous non-autonomous LTR retrotransposons that may interfere with the propagation of autonomous retroelements. This reference genome sequence will greatly facilitate the improvement of agronomically important traits relevant to tea quality and production.

Authors: Qun-Jie Zhang, Wei Li, Kui Li, Hong Nan, Cong Shi, Yun Zhang, Zhang-Yan Dai, Yang-Lei Lin, Xiao-Lan Yang, Yan Tong, Dan Zhang, Cui Lu, Chen-feng Wang, Xiao-xin Liu, Wen-Kai Jiang, Xing-Hua Wang, Xing-Cai Zhang, Zhong-Hua Liu, Evan E. Eichler, Li-Zhi Gao

The following is from ScienceNet (科学网): http://news.sciencenet.cn/sbhtmlnews/2020/1/352632.shtm


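Staff report: The latest results from a joint Chinese-American research team were recently published online. Using single-molecule real-time (SMRT) sequencing and Hi-C technologies, the team selected "Biyun" (碧云), an elite small-leaf tea cultivar with relatively low heterozygosity, from among more than 20 representative small-leaf tea cultivars. They anchored about 2.85 Gb of assembled genome sequence onto 15 pseudo-chromosomes, producing the first chromosome-level reference genome sequence for the Chinese variety of the tea plant (C. sinensis var. sinensis).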

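Li-Zhi Gao, the paper's corresponding author, told China Science Daily (《中国科学报》) that the team compared its assembly with the previously published draft genome of the small-leaf tea cultivar "Shuchazao" (舒茶早), which was based on second-generation sequencing. The comparative genomic analysis showed that the new genome map reaches a scaffold N50 length of 195.68 Mb, a substantial improvement in both assembly accuracy and completeness. The study resolved accurate sequences for 40,812 protein-coding genes, including genes involved in the biosynthesis of tea polyphenols, theanine and caffeine, giving tea researchers worldwide timely access to accurate tea plant gene information for the first time.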



Related paper: https://doi.org/10.1101/2020.01.02.892430
China Science Daily (《中国科学报》), 2020-01-07, page 3, Agricultural Science and Technology

