Title: Genome sequence of the cultivated cotton Gossypium arboreum

Author
LI, FUGUANG - Cotton Research Institute - China
FAN, GUANGYI - Beijing Genome Institute
WANG, KUNBO - Cotton Research Institute - China
SUN, FENGMING - Beijing Genome Institute
YUAN, YOULU - Cotton Research Institute - China
SONG, GUOLI - Cotton Research Institute - China
MA, ZHIYING - Agricultural University Of Hebei
LI, QIN - Peking University
LU, CAIRUI - Cotton Research Institute - China
ZOU, CHANGSONG - Cotton Research Institute - China
CHEN, WENBIN - Beijing Genome Institute
LIANG, XINMING - Beijing Genome Institute
SHANG, HAIHONG - Cotton Research Institute - China
LIU, WEIQING - Beijing Genome Institute
XIAO, GUANGHUI - Agricultural University Of Hebei
GOU, CAIYUN - Beijing Genome Institute
YE, WUWEI - Cotton Research Institute - China
XU, XUN - Beijing Genome Institute
ZHANG, XUEYAN - Cotton Research Institute - China
WEI, HENGLING - Cotton Research Institute - China
LI, ZHIFANG - Cotton Research Institute - China
ZHANG, GUIYIN - Cotton Research Institute - China
WANG, JUNYI - Beijing Genome Institute
LIU, KUN - Cotton Research Institute - China
Kohel, Russell
Percy, Richard
Yu, John
ZHU, YU-XIAN - Peking University
WANG, JUN - Beijing Genome Institute
YU, SHUXUN - Cotton Research Institute - China

Submitted to: Nature Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 4/24/2014
Publication Date: 6/1/2014
Citation: Li, F., Fan, G., Wang, K., Sun, F., Yuan, Y., Song, G., Ma, Z., Li, Q., Lu, C., Zou, C., Chen, W., Liang, X., Shang, H., Liu, W., Xiao, G., Gou, C., Ye, W., Xu, X., Zhang, X., Wei, H., Li, Z., Zhang, G., Wang, J., Liu, K., Kohel, R.J., Percy, R.G., Yu, J., Zhu, Y., Wang, J., Yu, S. 2014. Genome sequence of the cultivated cotton Gossypium arboreum. Nature Genetics. 46(6):567-574.

Interpretive Summary: Upland cotton originated from the hybridization of two species and therefore has a very complex genetic makeup. Sequencing Upland cotton will greatly aid researchers in characterizing and exploiting cotton germplasm for agronomic traits, but its two-species origin greatly complicates sequencing efforts. As the first step toward sequencing Upland cotton, we had to sequence both of its putative parents. Here we report the sequencing and successful assembly of one of those parents. Over 90 percent of the assembled sequences, which cover more than 98 percent of the parent species' genome, were anchored and oriented to 13 chromosomes. A total of 41,330 genes were predicted, with 92 percent being confirmed. The sequence of the A-genome parent species, along with the previously published sequence of Upland cotton's other parent, the D-genome species, lays the foundation for fully sequencing and assembling the more genetically complex commercial Upland cotton varieties. The parent species sequence provides the research community with critical resources and information for accelerated identification and enhancement of genetic systems contributing to cotton productivity, quality and environmental stability.

Technical Abstract: Cotton is one of the most economically important natural fiber crops in the world, and the complex tetraploid nature of its genome (AADD, 2n = 52) makes genetic, genomic and functional analyses extremely challenging. Here we sequenced and assembled 98.3% of the 1.7-gigabase G. arboreum (AA, 2n = 26) genome, whose progenitor is a putative contributor of the diploid A-subgenome to tetraploid cottons. Paired-end sequencing from 10 libraries with insert sizes ranging from 180 bp to 40 kb resulted in 193.6 Gb of clean sequence, covering the genome 112.6-fold. Using a set of 24,569 single-nucleotide polymorphism (SNP) markers that we obtained from 154 F2 restriction-site-associated DNA (RAD) lines, we were able to anchor and orient 90.4% of the assembly on 13 pseudochromosomes. The majority of the genome (68.5%) is occupied by repetitive DNA sequences, most of which are long terminal repeats (LTRs). We predicted 41,330 protein-coding genes in G. arboreum, a number similar to that in G. raimondii. One ancient (about 115-146 million years ago, MYA) and one recent (approximately 13-20 MYA) whole-genome duplication (WGD) were shared by both species before the speciation event around 2-13 MYA. The two-fold difference in size between these otherwise highly collinear genomes resulted from LTR insertions in the past five million years. Expansion and contraction of nucleotide-binding site (NBS) gene family sizes in different cotton species may be responsible for their resistance to Verticillium dahliae. The ethylene-central regulatory pathway may fundamentally determine the fate of cotton fiber cell development.
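
The sequencing figures in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not part of the published analysis: the function and variable names are hypothetical, and it assumes that fold coverage is total clean bases divided by genome size and that the reported percentages apply to the rounded 1.7 Gb genome size.

```python
# Illustrative cross-check of the figures quoted in the abstract.
# All names are hypothetical; only the input numbers come from the abstract.

def fold_coverage(total_bases_gb: float, genome_size_gb: float) -> float:
    """Sequencing depth as total sequenced bases divided by genome size."""
    return total_bases_gb / genome_size_gb

def implied_genome_size_gb(total_bases_gb: float, coverage: float) -> float:
    """Genome size implied by a given yield and fold coverage."""
    return total_bases_gb / coverage

clean_sequence_gb = 193.6   # clean paired-end sequence (from the abstract)
reported_coverage = 112.6   # fold coverage (from the abstract)

# Genome size implied by yield and coverage; should be close to 1.7 Gb.
genome_gb = implied_genome_size_gb(clean_sequence_gb, reported_coverage)

# Assumption: the percentages are applied to the rounded 1.7 Gb figure.
assembled_gb = 0.983 * 1.7           # 98.3% of the genome assembled
anchored_gb = 0.904 * assembled_gb   # 90.4% of the assembly anchored
repeat_gb = 0.685 * 1.7              # 68.5% of the genome is repetitive DNA

print(f"implied genome size: {genome_gb:.2f} Gb")
print(f"assembled: {assembled_gb:.2f} Gb, anchored: {anchored_gb:.2f} Gb")
print(f"repetitive DNA: {repeat_gb:.2f} Gb")
```

The implied genome size of about 1.72 Gb agrees with the stated 1.7-gigabase estimate, which suggests the yield and coverage figures are mutually consistent.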