Research Project: Applied Agricultural Genomics and Bioinformatics Research

Location: Genomics and Bioinformatics Research

Title: Leveraging national germplasm collections to determine significantly associated categorical traits in crops: Upland and Pima Cotton as a case study

RESTREPO-MONTOYA, DANIEL - North Carolina State University
Hulse-Kemp, Amanda
Scheffler, Jodi
HAIGLER, CANDACE - North Carolina State University
Hinze, Lori
Love, Janna
Percy, Richard
JONES, DON - Cotton, Inc
Frelichowski, James - Jim

Submitted to: Frontiers in Plant Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/21/2022
Publication Date: 4/26/2022
Citation: Restrepo-Montoya, D., Hulse-Kemp, A.M., Scheffler, J.A., Haigler, C., Hinze, L.L., Love, J., Percy, R.G., Jones, D.C., Frelichowski, J.E. 2022. Leveraging national germplasm collections to determine significantly associated categorical traits in crops: Upland and Pima Cotton as a case study. Frontiers in Plant Science. 13:837038.

Interpretive Summary: Worldwide germplasm collections are essential for collecting and conserving living plant material, solving agricultural production problems, and preserving plant genetic diversity for future needs. These collections have amassed hundreds of thousands of individuals, or accessions, of different species. The United States Germplasm Resources Information Network (GRIN) alone has more than 500,000 accessions representing more than 10,000 species. Cotton is one of these species and one of the most important cash crops in the world; it provides the largest renewable source of fiber, in addition to edible oil and protein. To better characterize the diversity of cotton accessions, the United States National Cotton Germplasm Collection has been measuring 36 traits in a standardized manner over the past decade. Each trait has multiple states, which represent the different forms of that trait found among cotton accessions, so collecting this trait information helps to document the diversity across all of the cotton species in the collection. We leveraged a curated subset of the collected trait data to compare descriptors within three major cotton groups maintained in the collection: Pima (or Egyptian) cotton, cultivated Upland cotton, and less-improved Upland relatives, or landraces. We demonstrate that many traits have statistically significant associations, or relationships between trait states. These relationships vary across the three groups, and plant breeding has broken down these trait relationships over time. The trait data alone can generally be used to determine which group a cotton accession belongs to. These trait relationships will help plant breeders and others interested in cotton to make better use of the materials available in the germplasm collection. We anticipate that our analysis for cotton will be adaptable to the germplasm collections of other crops, helping us learn more from the treasure troves of trait data collected on our germplasm.

Technical Abstract: Observable qualitative traits are relatively stable across environments and are commonly used to evaluate crop genetic diversity. Recently, molecular markers have largely superseded phenotypic description in diversity surveys. However, large numbers of qualitative descriptors have historically been collected, and are still being collected, because they are useful for cataloging germplasm collections and for describing new germplasm in patents, publications, and/or the Plant Variety Protection (PVP) system. This research focused on the comparative analysis of standardized cotton traits as represented within the National Cotton Germplasm Collection (NCGC). The cotton traits are named by ‘descriptors’ that have non-numerical sub-categories (descriptor states) reflecting the details of how each trait manifests or is absent in the plant. We statistically assessed selected accessions from three major groups of Gossypium as defined by the NCGC curator: 1) containing mainly Upland cotton (Gossypium hirsutum) cultivars; 2) containing mainly G. hirsutum landraces or minimally improved materials; and 3) containing cultivars or landraces of Pima cotton (Gossypium barbadense). For 33 cotton descriptors we: (a) revealed distributions of character states for each descriptor within each group; (b) analyzed bivariate associations between paired descriptors; and (c) clustered accessions based on descriptors. The fewest significant associations between descriptors occurred in the SA dataset (the mainly Upland cultivar group), likely reflecting extensive breeding for cultivar development. In contrast, the TEX (mainly landrace) and Gb (Pima) datasets showed a higher number of significant associations between descriptors, likely reflecting less impact from breeding efforts. Three significant bivariate associations were identified in all three groups: bract nectaries:boll nectaries, leaf hair:stem hair, and lint color:seed fuzz color. Label-blind clustering analysis recapitulated the species labels for about 97% of the accessions. Unexpected clustering results pointed to accessions in the collection that may warrant further investigation. In the future, the newly established significant associations between standardized descriptors can be used by curators to determine whether new exotic/unusual accessions most closely resemble Upland or Pima cotton. The study shows how existing descriptors for large germplasm datasets can be distilled into a more useful form to inform downstream goals in breeding and research, demonstrating the utility of the analytical methods employed in categorizing germplasm diversity within the collection.
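
As an illustration of steps (a) and (b) above, the following is a minimal sketch of how state distributions and a bivariate association between two categorical descriptors might be computed. The abstract does not name the statistical test used, so the Pearson chi-square statistic and Cramér's V shown here are assumptions, chosen as common measures for categorical data. The descriptor names are taken from the abstract, but the descriptor states, the toy records, and the function names are invented for the example and are not NCGC data.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical toy records: one dict per accession, mapping descriptor -> state.
# The states and values below are invented for illustration only.
accessions = [
    {"leaf hair": "dense",  "stem hair": "dense",  "lint color": "white"},
    {"leaf hair": "dense",  "stem hair": "dense",  "lint color": "brown"},
    {"leaf hair": "dense",  "stem hair": "dense",  "lint color": "white"},
    {"leaf hair": "dense",  "stem hair": "dense",  "lint color": "brown"},
    {"leaf hair": "sparse", "stem hair": "sparse", "lint color": "white"},
    {"leaf hair": "sparse", "stem hair": "sparse", "lint color": "brown"},
    {"leaf hair": "sparse", "stem hair": "sparse", "lint color": "white"},
    {"leaf hair": "sparse", "stem hair": "sparse", "lint color": "brown"},
]


def state_distribution(records, descriptor):
    """Step (a): count how often each state of a descriptor occurs."""
    return Counter(r[descriptor] for r in records)


def association(records, a, b):
    """Step (b): Pearson chi-square statistic and Cramer's V for descriptors a, b."""
    n = len(records)
    joint = Counter((r[a], r[b]) for r in records)   # observed contingency table
    row = state_distribution(records, a)             # marginal counts for a
    col = state_distribution(records, b)             # marginal counts for b
    chi2 = 0.0
    for state_a, n_a in row.items():
        for state_b, n_b in col.items():
            expected = n_a * n_b / n                 # count expected under independence
            observed = joint.get((state_a, state_b), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    cramers_v = sqrt(chi2 / (n * k)) if k > 0 else 0.0
    return chi2, cramers_v


if __name__ == "__main__":
    descriptors = ["leaf hair", "stem hair", "lint color"]
    for d in descriptors:
        print(d, dict(state_distribution(accessions, d)))
    for a, b in combinations(descriptors, 2):
        chi2, v = association(accessions, a, b)
        print(f"{a}:{b}  chi2={chi2:.2f}  V={v:.2f}")
```

In this toy data, leaf hair and stem hair always share the same state, so Cramér's V for that pair is 1, while lint color is distributed evenly across both hair states, giving a V of 0. A real analysis would also need a significance test with a correction for testing many descriptor pairs, which this sketch omits.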