
Research Project: Cotton Genetic Resource Management and Genetic Improvement

Location: Crop Germplasm Research

Title: Comparative analysis of genome sequences of the two cultivated tetraploid cottons, Gossypium hirsutum (L.) and G. barbadense (L.)

MENG, QINGYING - Huazhong Agricultural University
GU, JIAQI - Huazhong Agricultural University
XU, ZHONGPING - Huazhong Agricultural University
ZHANG, JIE - Huazhong Agricultural University
TANG, JIWEI - Huazhong Agricultural University
WANG, ANZHOU - Huazhong Agricultural University
WANG, PING - Huazhong Agricultural University
LIU, ZHAOWEI - Huazhong Agricultural University
RONG, YUXUAN - Huazhong Agricultural University
XIE, PEIHAO - Huazhong Agricultural University
HUI, LIUYANG - Huazhong Agricultural University
Udall, Joshua
GROVER, CORRINE - Iowa State University
Wendell, Jonathan

Submitted to: Industrial Crops and Products
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 2/14/2023
Publication Date: 3/1/2023
Citation: Meng, Q., Gu, J., Xu, Z., Zhang, J., Tang, J., Wang, A., Wang, P., Liu, Z., Rong, Y., Xie, P., Hui, L., Udall, J.A., Grover, C.E., Wendell, J.C. 2023. Comparative analysis of genome sequences of the two cultivated tetraploid cottons, Gossypium hirsutum (L.) and G. barbadense (L.). Industrial Crops and Products. Article e116471.

Interpretive Summary: Scientists have found ways to read and put together the genetic material of cotton plants. They have made genome sequences for many different kinds of cotton. But sometimes the genome sequences are not perfect, even for the same kind of cotton. This can make it hard to compare results between different genome sequences. This work determined which genome sequences were the best to study and compare. The work found some mistakes in the genome sequences, and also developed a tool that helps in comparing the different genome sequences. This work will help scientists learn more about cotton plants and how to make better cottons in the future for productive use by U.S. farmers.

Technical Abstract: With innovations in sequencing technology and the progress of high-performance computing systems, it is now relatively straightforward to sequence and assemble complex genomes. Many genomes from multiple cotton species have been released in recent years, with the highly homozygous standard genetic lines of two cultivated allotetraploid cottons, i.e., Gossypium hirsutum TM-1 and G. barbadense 3-79, assembled multiple times by different research groups using diverse sequencing technologies. The assembly quality among these genomes is variable, even between multiple accessions or versions of the same species, which can generate both confusion in choosing the appropriate genome for genetic analysis and obstacles when comparing results among the different reference genomes. Accordingly, an assessment of the many cotton genome sequences is necessary to facilitate both choice of genome sequence and comparisons between different versions or species. Here we comprehensively assess and compare genome assembly accuracy, completeness, and contiguity for nine G. hirsutum assemblies and four G. barbadense assemblies using multiple analysis strategies with the same criteria. We identify centromeric regions and several large-scale inversions among genomes from the same accession, indicating structural errors introduced during sequence ordering and orientation in G. hirsutum and G. barbadense genome assembly. Gene relationships between annotations from multiple genomes are defined within and across species, and the results are available at the Cotton Paralogs Groups Search website, a convenient resource for converting gene IDs and comparing annotations between different genome versions.
This study comprehensively assesses and compares assembly quality among multiple versions of the two cultivated tetraploid cotton species with different assembly strategies, illustrating the challenges of sequencing and assembling complex genomes and providing a resource for cotton genomics.
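
Illustrative note: the abstract names contiguity as one of the assessed assembly properties but does not say which metrics were used. As a minimal sketch, assuming the widely used N50 statistic (an assumption here, not a detail taken from the publication), contiguity of two assemblies could be compared as below. The function name `n50` and the contig lengths are hypothetical and chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units) for two toy assemblies
# with the same total length; these are not data from the study.
assembly_a = [80, 70, 50, 40, 30, 20]
assembly_b = [150, 100, 20, 10, 5, 5]

print("assembly A N50:", n50(assembly_a))
print("assembly B N50:", n50(assembly_b))
```

A higher N50 at the same total length indicates a more contiguous assembly, but it says nothing about correctness: a misjoined or inverted scaffold can raise N50 while introducing the kind of structural error the abstract describes.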