Publication : USDA ARS

Title: Subgenome-anchored physical frameworks of the allotetraploid Upland cotton (Gossypium hirsutum L.) genome, and an approach toward reference-grade assemblies of polyploids

Author

	SASKI, CHRISTOPHER - Translational Genomics Research Institute
	Scheffler, Brian
	HULSE-KEMP, AMANDA - Texas A&M University
	LIU, BO - Texas A&M University
	SONG, QUINGXIN - University Of Texas
	STELLY, DAVID - University Of Texas
	Scheffler, Jodi
	JONES, DON - Cotton, Inc
	PETERSON, DANIEL - Mississippi State University
	HAIGLER, CANDACE - North Carolina State University
	SCHMUTZ, JEREMY - Hudsonalpha Institute For Biotechnology
	CHEN, Z - University Of Texas

Submitted to: Scientific Reports
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 10/18/2017
Publication Date: 11/10/2017
Publication URL: http://handle.nal.usda.gov/10113/5934974
Citation: Saski, C.A., Scheffler, B.E., Hulse-Kemp, A.M., Liu, B., Song, Q., Stelly, D.M., Scheffler, J.A., Jones, D.C., Peterson, D.G., Haigler, C., Schmutz, J., Chen, Z.J. 2017. Subgenome-anchored physical frameworks of the allotetraploid Upland cotton (Gossypium hirsutum L.) genome, and an approach toward reference-grade assemblies of polyploids. Scientific Reports. 7:15274. https://doi.org/10.1038/s41598-017-14885-w.
DOI: https://doi.org/10.1038/s41598-017-14885-w

Interpretive Summary: Generating a high-quality genome (the complete DNA sequence in the proper order) for a given species is a difficult task. It is much harder when the species is a polyploid, like cotton. Cultivated cotton is a tetraploid whose genome comprises contributions from two different but related cotton species (the A and D genomes). These two genomes are not identical, but they are similar enough that it is difficult to determine whether a short DNA sequence belongs to the A or the D genome. A common and perhaps the cheapest way to assemble a genome is to fractionate the total DNA into very small pieces (~300 base pairs) and sequence the fragments. Using various methods, the genome is sequenced ~100 times and the pieces are then put together to reconstitute the genome. While cheap and fast, this method does not produce a high-quality genome. An alternative is to fractionate the genome into much larger pieces (~150,000 bp) and clone these large fragments into bacterial artificial chromosomes (BACs). Each BAC can be sequenced as described above, but now the target is to assemble 150,000 bp at a time instead of all 2,400,000,000 bp (the approximate size of the cotton genome) at once. However, this approach is costly and requires significant resources because each region of the genome is represented multiple times in the BAC library. A streamlined method, called BAC fingerprinting, can determine the general position of each BAC in the cotton genome and identify the smallest set of BACs needed to cover the whole genome, called a minimal tiling path (MTP). The BACs belonging to the MTP can then be sequenced and assembled to generate a high-quality genome.
In this publication, BAC libraries for cultivated cotton were generated and fingerprinted, and an MTP with chromosomal locations was derived and confirmed by BAC-end sequencing and a technique called FISH, which allows the physical position of a piece of DNA on a chromosome to be determined. In addition, two strategies were tested to determine the best way to sequence these BAC clones: using small or large pools. Each method was tested on the MTP representing a pair of chromosomes, one from the A genome and one from the D genome. In this instance, the pairs 12/26 (large BAC pool) and 11/21 (small BAC pool) were used. The large-pool approach was significantly cheaper but yielded lower-quality results. This information, along with the MTP BACs, should allow for the production of a high-quality reference genome of cultivated cotton.
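
The minimal tiling path idea in the summary above can be illustrated with a small sketch. It assumes each BAC has already been placed on a contig with approximate start and end coordinates, and then applies a standard greedy interval-cover rule: among the clones that begin inside the region already covered, pick the one that reaches farthest. The `Clone` type, the `minimal_tiling_path` function, and the coordinates (in kb) are hypothetical and chosen only for illustration; this is not the authors' pipeline, and real MTP selection also has to handle fingerprint uncertainty and a required minimum overlap between neighboring clones.

```python
from typing import List, NamedTuple


class Clone(NamedTuple):
    """A BAC placed on a contig (hypothetical illustration type)."""
    name: str
    start: int  # first covered position, inclusive
    end: int    # one past the last covered position


def minimal_tiling_path(clones: List[Clone]) -> List[Clone]:
    """Greedy interval cover: the fewest clones spanning everything covered.

    If the clones leave a gap, a new segment is started after the gap.
    """
    ordered = sorted(clones, key=lambda c: (c.start, -c.end))
    path: List[Clone] = []
    covered_to = None  # right edge of the region covered so far
    i = 0
    while i < len(ordered):
        if covered_to is None or ordered[i].start > covered_to:
            # First clone, or a gap in coverage: start a new segment here.
            covered_to = ordered[i].start
        best = None
        # Among clones starting inside the covered region, keep the one
        # that extends coverage the farthest.
        while i < len(ordered) and ordered[i].start <= covered_to:
            if best is None or ordered[i].end > best.end:
                best = ordered[i]
            i += 1
        if best.end > covered_to:
            path.append(best)
            covered_to = best.end
    return path


# Invented example: six overlapping clones on one contig, coordinates in kb.
clones = [
    Clone("A", 0, 150),
    Clone("B", 50, 200),
    Clone("C", 100, 300),
    Clone("D", 180, 320),
    Clone("E", 290, 450),
    Clone("F", 300, 400),
]
path = minimal_tiling_path(clones)
print([c.name for c in path])  # prints ['A', 'C', 'E']
```

In this toy case, three of the six clones are enough to span the contig, which is the saving the MTP is meant to deliver: only the selected clones need to be sequenced.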

Technical Abstract: Like many agricultural crops, the cultivated cotton genome is large (~2.5 Gb) and polyploid, consisting of two very similar, repeat-rich subgenomes whose size and complexity pose significant challenges for accurate genome reconstruction using whole-genome shotgun approaches. A strategy for accurately partitioning multiple subgenomes of polyploids for contemporary multiplex sequencing can facilitate reference-grade genome quality. A reference-grade genome assembly is the foundation for positional cloning of genes and the acceleration of beneficial traits in Upland cotton. We describe the development of high-quality BAC libraries, subgenome-specific physical maps, and a new-age sequencing approach that will lead to a reference-grade genome assembly for Upland cotton (AD1). Three BAC libraries were constructed, fingerprinted, and integrated with BAC-end sequences (BES) to produce a de novo whole-genome physical map. The BAC map was partitioned by subgenome through alignment to the D-genome extant relative reference sequence with densely spaced BAC-end sequence anchor points (~179k). The resulting physical maps comprise 58,485 BACs that assemble as 5,298 contigs and 12,471 singletons in the A subgenome, and 33,906 BACs with 1,998 contigs and 884 singletons in the D subgenome. The physical map was validated with fluorescence in situ hybridization (FISH) and SNP linkage markers derived from BES. Two pairs of homoeologous chromosomes, A11/D21 and A12/D26, were used to assess multiple sequencing approaches for contiguity and scalability. We report the first subgenome-anchored physical maps of Upland cotton, and a new-age approach to whole-genome sequencing that will lead to the first reference-grade assembly of an allopolyploid crop.

U.S. DEPARTMENT OF AGRICULTURE

Genomics and Bioinformatics Research: Stoneville, MS