
Research Project: Developing Genomic and Genetic Tools for Exploiting Cotton Genetic Variation

Location: Crop Germplasm Research

2018 Annual Report

The goal of this project is to develop genomic and genetic tools, materials, and information that are critically lacking for effectively exploiting cotton genetic variation in Gossypium germplasm characterization and cotton genetic improvement programs. Objective 1 reflects our commitment to continue developing portable DNA markers (simple sequence repeat, SSR, and single nucleotide polymorphism, SNP) and molecular descriptors (core sets of well-defined DNA markers) and to make them available to the cotton research community. Objective 2 reflects our unique participation in developing the Upland cotton genome sequence and community database resources. A complete reference genome sequence of the Upland cotton genetic standard (G. hirsutum acc. TM-1) will facilitate gene mining in Gossypium germplasm for commercial improvement to an unprecedented degree. A centralized public database (CottonGen) with user-friendly bioinformatic tools will make the coordinated analysis and dissemination of research data and information more effective for the cotton research community. Work under Objective 3 will use the genomic and genetic tools developed under the first two objectives to identify genes or novel alleles, and genomic regions or quantitative trait loci (QTLs), for value-added priority traits. Superior cotton lines will be identified for developing breeding populations with novel variability for traits of interest. Collaboration with key members and organizations of the cotton research community is necessary and will be pursued; all such work will be of mutual benefit and will be conducted to ensure complementarity and avoid duplication. Specifically, during the next five years the project will focus on the following three objectives.
Objective 1: Develop new genetic markers to augment current core sets of mapped SSR and SNP markers for high-throughput characterization of the genetic diversity within and among Gossypium germplasm accessions in the National Cotton Germplasm Collection.
Sub-obj. 1A: Develop new SSR and SNP primers, and evaluate them for polymorphism.
Sub-obj. 1B: Identify and validate core cotton SSR and SNP markers.
Objective 2: Collaborate with other public national and international researchers to sequence and analyze the tetraploid genome of the G. hirsutum genetic standard genotype Texas Marker-1 (TM-1), and coordinate the activities of a public database that maintains and disseminates sequence and other genetic information to the research community.
Sub-obj. 2A: Develop and analyze the TM-1 genome sequence.
Sub-obj. 2B: Coordinate the activities of CottonGen.
Objective 3: Identify key genes and genomic regions of cotton that govern or are closely linked with priority traits, including fiber yield and quality, as well as biotic and abiotic stress tolerance.
Sub-obj. 3A: Apply cotton genomic tools to identify and characterize QTLs or alleles that govern key agronomic or fiber traits in cotton genetic resources maintained under the sister project.
Sub-obj. 3B: Apply the preceding information to identify superior parents for developing breeding populations with novel sources of variability for traits of interest.
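Sub-objective 1A calls for evaluating new primers for polymorphism, but the report does not say which statistic is used. As an illustration only, the sketch below scores a marker with the commonly used polymorphism information content (PIC) formula of Botstein et al. (1980), assuming one allele call per inbred accession and `None` for missing data. The function names and the `min_pic` cutoff are hypothetical, not part of the project's pipeline.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(calls):
    """Allele frequencies from per-accession allele calls; None marks missing data."""
    observed = [c for c in calls if c is not None]
    if not observed:
        raise ValueError("no non-missing allele calls")
    counts = Counter(observed)
    return {allele: n / len(observed) for allele, n in counts.items()}


def pic(calls):
    """Polymorphism information content (Botstein et al. 1980):
    PIC = 1 - sum_i p_i^2 - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    freqs = list(allele_frequencies(calls).values())
    homozygosity = sum(p * p for p in freqs)
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - correction


def is_polymorphic(calls, min_pic=0.0):
    """Hypothetical rule: polymorphic if more than one allele is seen and PIC exceeds min_pic."""
    return len(allele_frequencies(calls)) > 1 and pic(calls) > min_pic


# Hypothetical SSR fragment sizes (bp) scored on six inbred accessions.
ssr_calls = [182, 182, 186, 186, 190, None]
print(round(pic(ssr_calls), 3), is_polymorphic(ssr_calls, min_pic=0.3))  # prints 0.563 True
```

A monomorphic marker gets a PIC of 0, and the value rises as alleles become more numerous and more evenly distributed, which is why PIC is a convenient way to rank candidate markers before adding them to a core set.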

New genetic markers will be created to augment the current core sets of mapped SSR and SNP markers developed for high-throughput characterization of genetic diversity within the National Cotton Germplasm Collection (Objective 1). Genomic DNA will be isolated from G. hirsutum, G. barbadense, G. arboreum, and G. raimondii, and cotton sequence reads will be generated with next-generation sequencing (NGS) technologies (Illumina GAIIx or HiSeq system). A simplified, high-throughput one-enzyme system will be used to discover and genotype SNP loci simultaneously. This information will be used to develop A- and D-genome-specific SNP markers, which will be made available to the research community via CottonGen (Sub-objective 1A). New polymorphic SSR and SNP markers will be mapped to the 26 chromosomes of the tetraploid cotton genome (Sub-objective 1B), using the 186 recombinant inbred lines (RILs) of the publicly available TM-1 x 3-79 RIL mapping population. The G. hirsutum genome will be sequenced in collaboration with national and international researchers (Sub-objective 2A). Working closely with BGI and the Cotton Research Institute (CRI), an integrated assembly strategy that combines large-insert BAC libraries, sub-genome alignments, and chromosomal anchoring through restriction site associated DNA sequencing (RAD-seq) is being developed and tested. The TM-1 reference genome sequence will be made available to the broader plant research community via GenBank and CottonGen. The activities of the CottonGen database will be supported and coordinated through a cooperative agreement with Cotton Incorporated (Sub-objective 2B). CottonGen is being built on the open-source Tripal database infrastructure to incorporate new datasets, such as annotated transcriptome, genome sequence, marker-trait-locus, and breeding data, along with enhanced tools for easily querying and visualizing research data.
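To make the RIL-based mapping step concrete, here is a minimal sketch of how the distance between two markers can be estimated from RIL genotypes. It assumes the TM-1 x 3-79 RILs were advanced by selfing, so the observed RIL recombinant fraction R relates to the per-meiosis fraction r by R = 2r/(1 + 2r), and it uses the Kosambi mapping function; the genotype coding ("A"/"B") and the example calls are hypothetical, and the project's actual mapping software and settings are not described in the report.

```python
import math


def ril_recombinant_fraction(geno1, geno2):
    """Observed fraction of RILs carrying non-parental allele combinations at two markers.

    Genotypes are coded "A" (first parent's allele) or "B" (second parent's allele);
    lines missing either call (None, heterozygous codes, etc.) are skipped.
    """
    scored = [(g1, g2) for g1, g2 in zip(geno1, geno2)
              if g1 in ("A", "B") and g2 in ("A", "B")]
    if not scored:
        raise ValueError("no lines scored at both markers")
    recombinant = sum(1 for g1, g2 in scored if g1 != g2)
    return recombinant / len(scored)


def meiotic_r_from_selfed_ril(R):
    """Invert R = 2r / (1 + 2r), the relation assumed here for RILs advanced by selfing."""
    if not 0.0 <= R < 0.5:
        raise ValueError("R must lie in [0, 0.5) for linked markers")
    return R / (2.0 * (1.0 - R))


def kosambi_cm(r):
    """Kosambi map distance in centimorgans for a per-meiosis recombination fraction r < 0.5."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Hypothetical calls for two markers on eight RILs (a real run would use all 186 lines).
m1 = ["A", "A", "B", "B", "A", "B", "A", None]
m2 = ["A", "B", "B", "B", "A", "B", "A", "A"]
R = ril_recombinant_fraction(m1, m2)
r = meiotic_r_from_selfed_ril(R)
print(f"R = {R:.3f}, r = {r:.3f}, distance = {kosambi_cm(r):.1f} cM")
```

The correction from R to r matters because repeated rounds of selfing give additional chances for recombination, so RILs show more recombinant lines than a single meiosis would; skipping it would inflate map distances.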
Most technical aspects of building and maintaining CottonGen are handled by the database team at Washington State University. The ARS group at College Station is responsible for determining what functionality, content, and data integration the database should provide. Genomic tools will be used to identify QTLs or alleles governing key agronomic or fiber traits (Sub-objective 3A). The growing number of QTLs reported in cotton will be validated by aligning their genomic locations and comparing their genetic effects, so that consistent QTLs can be identified and put to practical use. Once specific chromosomal regions are shown to contain genes that contribute significantly to the expression of a trait, the most promising regions will be fine-mapped to identify polymorphisms in coding and/or regulatory sequences. Diagnostic DNA markers associated with traits of interest will be used to screen the Collection, and lines carrying combinations of desirable QTLs will be used to develop breeding populations with novel sources of variability (Sub-objective 3B).
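The QTL validation step (aligning genomic locations and comparing genetic effects) can be sketched as a simple interval-overlap check. This is a hypothetical illustration, not the project's method: it assumes every reported QTL interval has already been projected onto one reference assembly (for example TM-1 coordinates), that effects are signed relative to the same reference parent, and that overlapping confidence intervals for the same trait point to the same locus. The `QTL` fields, study names, and example data are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QTL:
    study: str
    trait: str
    chromosome: str
    start_mb: float  # confidence interval start on a common reference assembly (Mb)
    end_mb: float    # confidence interval end (Mb)
    effect: float    # additive effect, signed relative to the same reference parent


def cluster_qtls(qtls: List[QTL]) -> List[List[QTL]]:
    """Group QTLs for the same trait whose intervals overlap on the same chromosome."""
    clusters: List[List[QTL]] = []
    current: List[QTL] = []
    current_end = float("-inf")
    for q in sorted(qtls, key=lambda x: (x.trait, x.chromosome, x.start_mb)):
        same_region = (
            bool(current)
            and (q.trait, q.chromosome) == (current[0].trait, current[0].chromosome)
            and q.start_mb <= current_end
        )
        if same_region:
            current.append(q)
            current_end = max(current_end, q.end_mb)
        else:
            if current:
                clusters.append(current)
            current = [q]
            current_end = q.end_mb
    if current:
        clusters.append(current)
    return clusters


def consistent_clusters(qtls: List[QTL], min_studies: int = 2) -> List[List[QTL]]:
    """Keep clusters reported by at least min_studies studies whose effects share one sign."""
    kept = []
    for cluster in cluster_qtls(qtls):
        studies = {q.study for q in cluster}
        same_sign = all(q.effect > 0 for q in cluster) or all(q.effect < 0 for q in cluster)
        if len(studies) >= min_studies and same_sign:
            kept.append(cluster)
    return kept


# Hypothetical fiber-length QTLs from three studies.
reported = [
    QTL("study1", "fiber_length", "A07", 10.0, 15.0, 0.5),
    QTL("study2", "fiber_length", "A07", 14.0, 20.0, 0.3),
    QTL("study3", "fiber_length", "A07", 30.0, 35.0, 0.2),
    QTL("study3", "fiber_length", "D11", 10.0, 12.0, -0.4),
]
for cluster in consistent_clusters(reported):
    print([q.study for q in cluster])  # prints ['study1', 'study2']
```

Requiring agreement across independent studies and a consistent effect direction is one simple way to separate reproducible QTLs from population-specific ones before investing in fine-mapping.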

Progress Report
During FY 2018, significant progress was made in improving the accuracy of cotton genome assemblies. A high-quality assembly of G. arboreum chromosome 12 was generated using genetic mapping and reference-assisted assembly approaches. Major mis-assemblies were discovered in the previously assembled G. arboreum chromosome 12, particularly in the anchoring and orientation of scaffolds within the pseudomolecule. Comparison with the homologous chromosomes of G. raimondii and G. hirsutum further confirmed the significantly improved quality of the re-assembly (Objective 2). In FY 2018 the project continued to support the CottonGen database (managed by Cotton Incorporated), which serves the cotton community worldwide; approximately 3,000 new genetic tools known as molecular markers developed in cotton, and about 900 new genetic elements known as quantitative trait loci (QTLs) identified in cotton, were added to CottonGen (Objective 2). A new Breeding Information Management System (BIMS), developed by the CottonGen team, which includes project scientists, was made available online (Objective 2). During FY 2018, CottonGen was accessed more than 117,000 times by thousands of cotton researchers, breeders, and other users from more than 130 nations (Objective 2). Project leadership and support of CottonGen, together with project contributions to the database, are critical to the ongoing success of this resource, which is highly valued by the cotton research and breeding communities (Objective 2). In FY 2018, project scientists also continued to make progress in the genomic exploration of cultivated diploid cottons for specific traits and in defining the genetic control of fiber development and cottonseed oil content in tetraploid cottons (Objective 3).
Over the life of this project, work in collaboration with national and international cooperators produced major advances in cotton genomics and bioinformatics, creating substantial genomic resources and new knowledge that were disseminated to the plant research community. The Upland cotton germplasm G. hirsutum TM-1 is accepted by the worldwide cotton research and breeding community as a standard for intensive study, yielding findings of direct relevance to cotton improvement. Following the publication of the genomes of wild G. raimondii and cultivated G. arboreum, diploid relatives of TM-1's two sub-genome donors, the TM-1 tetraploid genome was successfully sequenced, assembled, and annotated (Objective 2). This work has revolutionized the genetic improvement of the cotton crop and is making cotton breeding more effective and efficient. About 49,000 informative single nucleotide polymorphism (SNP) markers were developed for mapping applications, including gene discovery and genetic diversity studies (Objective 1), and new, improved versions of CottonGen were released (Objective 2). In addition, about 44,000 expressed sequence tags (ESTs), 400 quantitative trait loci (QTLs), and many key genes for cotton priority traits, including fiber and seed quality, disease resistance, photoperiod response, and agronomic characters, were mapped and/or cloned for cotton improvement. This project expired in FY 2018 and was replaced by project 3091-21000-044-00D, which continues and expands upon this work.

1. Refined genome sequence of G. arboreum chromosome 12. Although next-generation sequencing (NGS) technologies have greatly improved the efficiency of genome sequencing, accurately assembling complex plant genomes such as cotton remains a major challenge. ARS scientists at College Station, Texas, working in collaboration with the Chinese Academy of Agricultural Sciences, discovered a number of mis-assemblies in the previously sequenced genome of G. arboreum, a cultivated diploid cotton. These mis-assemblies have complicated the application of genomic information to cotton improvement. To solve this problem, a high-quality assembly of G. arboreum chromosome 12 was developed using genetic mapping and reference-assisted assembly approaches. The new re-assembly corresponds precisely to the homologous chromosomes of the G. raimondii and G. hirsutum genomes, further confirming its significantly improved quality. This work validates the current assembly strategy, which identifies and corrects mis-assemblies in draft genome sequences, and provides additional insights into cotton genome structure and its genetic applications. The work will significantly enhance cotton genomics research and the development of improved cottons for U.S. farmers.
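The genetic-mapping side of anchoring and orienting scaffolds can be illustrated with a small sketch. It is not the ARS/CAAS pipeline and leaves out the reference-assisted part entirely: it simply orders scaffolds on one linkage group by the mean genetic position of their markers, orients each scaffold by the sign of the correlation between physical (bp) and genetic (cM) positions, and flags scaffolds whose internal marker order disagrees with the map. The Pearson correlation, the 0.8 threshold, and the example markers are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; None when either variable has zero variance."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5


def anchor_scaffolds(markers, min_abs_r=0.8):
    """Order and orient scaffolds on one linkage group from mapped markers.

    markers: iterable of (scaffold_id, position_bp, position_cM).
    Returns (scaffold_id, orientation, flagged) tuples sorted by mean genetic position.
    orientation is "+" or "-" from the sign of the bp-vs-cM correlation, "?" if undetermined.
    flagged marks scaffolds whose marker order disagrees with the map (|r| < min_abs_r),
    i.e. candidate mis-assemblies for manual inspection.
    """
    by_scaffold = defaultdict(list)
    for scaffold, bp, cm in markers:
        by_scaffold[scaffold].append((bp, cm))
    placed = []
    for scaffold, points in by_scaffold.items():
        bps = [bp for bp, _ in points]
        cms = [cm for _, cm in points]
        r = pearson(bps, cms) if len(points) >= 2 else None
        if r is None:
            orientation, flagged = "?", False  # single marker or co-segregating markers
        else:
            orientation = "+" if r > 0 else "-" if r < 0 else "?"
            flagged = abs(r) < min_abs_r
        placed.append((mean(cms), scaffold, orientation, flagged))
    placed.sort()
    return [(scaffold, orientation, flagged) for _, scaffold, orientation, flagged in placed]


# Hypothetical markers: (scaffold, physical position in bp, genetic position in cM).
markers = [
    ("s1", 100, 0.0), ("s1", 200, 1.0), ("s1", 300, 2.0),
    ("s2", 100, 6.0), ("s2", 200, 5.0), ("s2", 300, 4.0),
    ("s3", 100, 10.0), ("s3", 200, 14.0), ("s3", 300, 10.5), ("s3", 400, 13.8),
    ("s4", 50, 20.0),
]
for row in anchor_scaffolds(markers):
    print(row)
```

In this toy example, scaffold s2 would be placed in reverse orientation, and s3 would be flagged because its markers jump back and forth along the map, the kind of signal that points to an internal mis-join rather than a simple orientation error.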

2. Release of Breeding Information Management System. The cotton breeding and research community has lacked a secure, comprehensive, online breeding data management and analysis system. ARS scientists at College Station, Texas, working in collaboration with Cotton Incorporated and Washington State University, developed a Breeding Information Management System (BIMS) in the CottonGen database. The newly developed Tripal module allows cotton breeders to create and manage access to their private breeding programs; upload phenotype data from the Field Book App or Excel templates; generate input files for Field Book; archive all of their data in BIMS; search and filter accessions and lines by name, trial, location, cross, parent, and trait; and perform basic statistical analyses. With the release of BIMS, cotton breeders can now manage their own breeding data and analyze it alongside CottonGen's publicly available genomic, genetic, and breeding datasets much more effectively and efficiently, thus expediting the development of new and improved cotton types for profitable use by U.S. farmers.
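To show the kind of search-filter-summarize workflow described for BIMS, the sketch below filters hypothetical phenotype records by trial and trait and reports per-accession counts, means, and sample standard deviations. It does not use or mirror the BIMS or Tripal API; the `Observation` fields, function names, and data are invented for illustration.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class Observation:
    accession: str
    trial: str
    location: str
    cross: str
    trait: str
    value: float


def filter_observations(observations, **criteria):
    """Return observations whose fields equal every keyword criterion, e.g. trial="T1"."""
    return [o for o in observations
            if all(getattr(o, field) == wanted for field, wanted in criteria.items())]


def summarize_by_accession(observations):
    """Per-accession (n, mean, sample SD); SD is None when there is only one value."""
    grouped = {}
    for o in observations:
        grouped.setdefault(o.accession, []).append(o.value)
    return {acc: (len(vals), mean(vals), stdev(vals) if len(vals) > 1 else None)
            for acc, vals in grouped.items()}


# Hypothetical plot-level records (not real data and not the BIMS schema).
records = [
    Observation("L1", "T1", "Site A", "P1 x P2", "lint_yield", 10.0),
    Observation("L1", "T1", "Site A", "P1 x P2", "lint_yield", 12.0),
    Observation("L2", "T1", "Site A", "P1 x P2", "lint_yield", 9.0),
    Observation("L1", "T2", "Site B", "P1 x P2", "lint_yield", 20.0),
    Observation("L1", "T1", "Site A", "P1 x P2", "fiber_length", 29.0),
]
subset = filter_observations(records, trial="T1", trait="lint_yield")
print(summarize_by_accession(subset))
```

Keeping the filter generic over field names means the same function can select by location, cross, or any other recorded attribute without a separate query for each.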

Review Publications
Kushanov, F., Buriev, Z.T., Shermatov, S.E., Turaev, O.S., Norov, T., Pepper, A.E., Saha, S., Ulloa, M., Yu, J., Jenkins, J.N., Abdukarimov, A., Abdurakhmonov, I.Y. 2017. QTL mapping for flowering-time and photoperiod insensitivity of wild cotton Gossypium darwinii Watt. PLoS One.