Location: Genomics and Bioinformatics Research
2013 Annual Report
1a. Objectives (from AD-416):
The genome sequence of Gossypium (G.) raimondii (D-genome cotton) is scheduled to be released in late 2011, and the G. arboreum (A-genome cotton) genome is slated for release in 2012. These two diploid species are believed to be the progenitors of the tetraploid commercial cotton species G. barbadense and G. hirsutum (both AADD). G. arboreum is cultivated in some parts of the world, but it lacks the superior characteristics of the two cultivated tetraploid species. Although G. raimondii does not produce lint, many of the traits associated with fiber productivity and quality in the tetraploid commercial species appear to be largely attributable to G. raimondii genes. The availability of two reference genomes will make it possible to begin large-scale whole-genome comparisons across all Gossypium species. To this end, we will use genome resequencing to explore the genetic diversity of the genus Gossypium.
1b. Approach (from AD-416):
We will use short-read sequencing technology to produce 20X-50X genome coverage of approximately 26 different cotton species/cultivars. The sequence reads will then be mapped back to the reference genomes to identify differences between and within species. Approximately eight diploid species, representing the D, C, G, K, A, F, E, and B genomes, and five AD tetraploid species will be resequenced. Within cultivated tetraploid cotton, 4 accessions of G. barbadense and 19 accessions of G. hirsutum will be resequenced. We expect this approach to uncover relevant single nucleotide polymorphisms (SNPs) that can be used in genetic mapping and marker-assisted selection. Allelic and gene variation will also be examined as a basis for future work to improve cotton fiber quality and yield.
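The 20X-50X coverage targets follow the standard relation C = N·L/G, where mean depth C comes from N reads of length L over a genome of size G. The sketch below is a minimal illustration that assumes 100-bp paired-end reads and a hypothetical 1-Gb genome (neither value comes from this project). It converts a target coverage into total sequenced bases and read pairs, and it also gives the idealized Poisson (Lander-Waterman) estimate of the fraction of bases left with no reads.

```python
import math


def total_bases(genome_size_bp: int, coverage: float) -> int:
    """Sequenced bases needed for a target mean depth, from C = N * L / G."""
    return math.ceil(genome_size_bp * coverage)


def read_pairs(genome_size_bp: int, coverage: float, read_length_bp: int) -> int:
    """Paired-end read pairs needed; each pair contributes 2 * read_length_bp bases."""
    return math.ceil(total_bases(genome_size_bp, coverage) / (2 * read_length_bp))


def fraction_uncovered(coverage: float) -> float:
    """Poisson (Lander-Waterman) estimate of the fraction of bases with zero reads."""
    return math.exp(-coverage)


if __name__ == "__main__":
    genome = 1_000_000_000  # hypothetical 1 Gb genome, for illustration only
    for c in (20, 50):
        print(c, total_bases(genome, c), read_pairs(genome, c, 100), fraction_uncovered(c))
```

The Poisson term assumes reads land uniformly at random. In practice, GC bias and repetitive sequence leave considerably more of a genome poorly covered than this idealized estimate suggests.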
3. Progress Report:
Diploid Gossypium species have traditionally been placed into eight groups based on their phylogenetic relatedness, designated by the letters A, B, C, D, E, F, G, and K. The commercial cotton species Gossypium (G.) hirsutum and G. barbadense are tetraploids with an A genome derived from a species similar to the modern diploid G. herbaceum and a D genome similar to that of the diploid G. raimondii. The long-term goal has been to generate a reference genome sequence for G. raimondii and to compare it with draft genome sequences from a wide range of diploid and tetraploid Gossypium species and lines. To date, the project has produced 1.55 trillion base pairs of DNA sequence data, and ARS scientists anticipate generating an additional 1 trillion base pairs in the coming year. The sequence data are being used to inform cotton molecular breeding, explore Gossypium genome evolution, and identify the genes underlying the economic and adaptive traits of cotton. FY 2013 highlights include the following:
(1) Data generated in this project were used to assemble and analyze the first Gossypium reference genome sequence, that of G. raimondii, the D-genome progenitor of tetraploid (AD) cottons. The G. raimondii reference genome was compared with DNA from G. longicalyx (F), G. herbaceum (A), and G. hirsutum (AtDt, where t indicates tetraploid) to explore the evolution of the clade. During its divergence from other dicots, the progenitor of all eudicots underwent three detectable rounds of polyploidy, making it a paleohexaploid. ARS scientists showed that the ancestral Gossypium species experienced 2.5 to 3 additional polyploidy events after its divergence from cacao; thus, the so-called diploid Gossypium species carry 15-18 copies of the ancestral dicot genome in their somatic cells. Union of the A and D genomes in the AtDt allotetraploids doubled this number to an astounding 30-36 copies. The results of this initial reference sequence analysis and comparative genomics study were published in the journal NATURE.
(2) A comparison of G. hirsutum (AtDt) loci with the corresponding loci in the progenitor diploid G. herbaceum (A) and G. raimondii (D) genomes revealed that, soon after formation of the AtDt tetraploid, unidirectional DNA exchanges between homeologous chromosomes (gene conversion) were the predominant type of mutation, far outnumbering random mutations. At-to-Dt conversion, which creates four copies of the Dt allele, is far more abundant than Dt-to-At conversion. At-to-Dt conversions are also more common in heterochromatin and are closely associated with GC content and transposon distribution, whereas Dt-to-At conversion is abundant in euchromatin and within genes, where it frequently reverses losses of gene function. Eventually, unidirectional exchanges between homeologs subsided, and random mutation became the predominant type of mutation. A manuscript describing these findings has been submitted to GENOME RESEARCH.
(3) ARS scientists are producing draft assemblies of various diploid cotton species using both ab initio and reference-guided assembly approaches. The reference-guided approach, in which sequences are assembled by comparison with the G. raimondii genome sequence, will permit high-resolution comparisons between genomes. Aligning the ab initio assemblies with the reference-guided assemblies should reveal chromosomal rearrangements between species. Such rearrangements would not necessarily be detected by examining reference-guided assemblies alone, because those assemblies tend to follow the sequence order of the reference.
(4) Alignment of the various Gossypium genome sequences with the G. raimondii reference sequence has allowed identification of single-nucleotide polymorphisms (SNPs), which are useful molecular markers for high-resolution genetic mapping.
(5) The repetitive DNA content of the various Gossypium genomes is being compared to explore the mechanisms underlying Gossypium genome divergence. For the AtDt tetraploids, this comparison is providing insight into how allopolyploidy and domestication have affected repeat content.
(6) We have continued work on the upland cotton pest Rotylenchulus (R.) reniformis (reniform nematode) in close collaboration with the USDA-ARS Precision Agriculture Unit at Mississippi State University. For example, we have constructed a high-quality transcriptome for the reniform nematode and obtained gene expression data from its various life stages. A manuscript describing the transcriptome is near completion, and a manuscript on differential gene expression is in progress.
(7) In a small proteomics study, we queried R. reniformis peptides against protein databases for the model nematode Caenorhabditis elegans and the root-knot nematode Meloidogyne hapla, and against cDNA/gene assemblies for R. reniformis. This has allowed us to identify candidate parasitism genes. A manuscript describing this work is in progress.
(8) Using flow cytometry, we estimated the genome size of R. reniformis and found it to be considerably larger than previous estimates. Nuclei from M. hapla and C. elegans were among the controls used. A manuscript re-evaluating genome size estimates in parasitic nematodes is in progress.
(9) An earlier attempt to produce a genome sequence for R. reniformis yielded unsatisfactory results. Notably, the DNA used for sequencing was obtained from a pool of thousands of individual worms. Because genetic heterogeneity in R. reniformis is extensive, the attempt to assemble a quality genome from this pooled DNA proved disappointing despite high-coverage sequencing. ARS scientists subsequently isolated DNA from a single R. reniformis egg, amplified it, and sequenced it at relatively low depth. Not surprisingly, the assemblies from the single egg were orders of magnitude better than those from the heterogeneous R. reniformis population. Additional sequencing of amplified DNA from the single egg is currently underway.
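The copy numbers in item (1) of the progress report follow from simple multiplication. In the worked form below, the 5- to 6-fold factor for the Gossypium lineage is not stated in the report. It is inferred here by dividing the reported 15-18 copies by the three genome copies of a paleohexaploid.

```latex
\begin{aligned}
\text{diploid \textit{Gossypium}:}&\quad 3 \times (5 \text{ to } 6) = 15 \text{ to } 18 \text{ ancestral genome copies}\\
A_tD_t \text{ allotetraploid:}&\quad 2 \times (15 \text{ to } 18) = 30 \text{ to } 36 \text{ ancestral genome copies}
\end{aligned}
```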
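Item (2) rests on comparing each tetraploid homeolog with the corresponding diploid genome. The sketch below shows that site-level logic under two assumptions. First, alleles are taken from gap-free alignments. Second, G. herbaceum and G. raimondii are treated as exact stand-ins for the A and D progenitors, which is a simplification. The report does not describe the classification rules actually used, and the `Site` type and the `classify`/`tally` functions are illustrative names. Following the report's usage, "At-to-Dt conversion" means the At homeolog has taken on the Dt allele, leaving four copies of that allele in the tetraploid.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Site(NamedTuple):
    """Alleles at one aligned position (hypothetical record)."""
    a_diploid: str  # allele in the A-genome diploid proxy (e.g. G. herbaceum)
    d_diploid: str  # allele in the D-genome diploid proxy (e.g. G. raimondii)
    at: str         # allele in the tetraploid At homeolog
    dt: str         # allele in the tetraploid Dt homeolog


def classify(site: Site) -> str:
    """Assign one site to a simplified conversion/mutation category."""
    a, d, at, dt = site
    if a == d:
        return "uninformative"      # progenitors agree, so conversion is invisible
    if at == a and dt == d:
        return "unchanged"          # each homeolog keeps its progenitor allele
    if at == d and dt == d:
        return "At->Dt conversion"  # At now carries the Dt allele (four Dt copies)
    if at == a and dt == a:
        return "Dt->At conversion"  # Dt now carries the At allele
    return "other"                  # e.g. a new point mutation on one homeolog


def tally(sites: Iterable[Site]) -> Counter:
    """Count how many sites fall into each category."""
    return Counter(classify(s) for s in sites)


if __name__ == "__main__":
    demo = [Site("A", "C", "A", "C"), Site("A", "C", "C", "C"), Site("A", "C", "A", "A")]
    print(tally(demo))
```

A real analysis would also need to account for sequencing error, lineage-specific mutations in the diploid proxies, and ambiguous alignments between homeologs.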
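Item (3) notes that aligning ab initio assemblies to the reference can expose rearrangements. One simple way to flag them is to walk a contig's aligned blocks in contig order and report where consecutive blocks stop being colinear with the reference. A break shows up as a change of chromosome, a strand flip, or a reversal in reference order. The sketch below is a minimal version of that idea. The `Block` record, chromosome names, and coordinates are hypothetical, and real inputs would come from a whole-genome aligner. A flagged position is only a candidate, because it may equally reflect a misassembly in the draft.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """One aligned block between an ab initio contig and the reference (hypothetical)."""
    contig_start: int  # start of the block on the ab initio contig
    ref_chrom: str     # reference chromosome the block aligns to
    ref_start: int     # start of the block on that reference chromosome
    strand: str        # "+" if contig and reference run the same way, "-" if reversed


def colinear(prev: Block, cur: Block) -> bool:
    """True if cur continues prev along the reference without a rearrangement."""
    if prev.ref_chrom != cur.ref_chrom or prev.strand != cur.strand:
        return False
    if prev.strand == "+":
        return cur.ref_start > prev.ref_start
    return cur.ref_start < prev.ref_start  # on "-", reference coordinates decrease


def breakpoints(blocks: List[Block]) -> List[int]:
    """Contig coordinates where colinearity with the reference breaks."""
    ordered = sorted(blocks, key=lambda b: b.contig_start)
    return [cur.contig_start
            for prev, cur in zip(ordered, ordered[1:])
            if not colinear(prev, cur)]


if __name__ == "__main__":
    contig = [
        Block(0, "Chr01", 1_000, "+"),
        Block(5_000, "Chr01", 6_000, "+"),   # continues Chr01 in order
        Block(9_000, "Chr07", 200, "+"),     # chromosome jump: translocation candidate
        Block(12_000, "Chr07", 3_000, "-"),  # strand flip: inversion candidate
    ]
    print(breakpoints(contig))  # [9000, 12000]
```

Production pipelines would also filter out short or low-identity blocks and tolerate small local reorderings before calling a breakpoint.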
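For item (4), the core idea of calling SNPs from alignments to the G. raimondii reference can be shown with a toy pileup caller. This sketch assumes gap-free alignments given as (start, sequence) pairs. It calls only high-frequency differences, meaning the alternate allele must appear in at least 80% of at least three reads, which suits homozygous between-species differences. The thresholds and the `call_snps` name are illustrative, not the project's settings. Production callers also model base quality, mapping quality, and heterozygosity.

```python
from collections import Counter
from typing import Dict, List, Tuple


def call_snps(reference: str,
              reads: List[Tuple[int, str]],
              min_depth: int = 3,
              min_alt_fraction: float = 0.8) -> List[Tuple[int, str, str, int]]:
    """Toy SNP caller for gap-free alignments (hypothetical helper).

    reads: (0-based start on the reference, aligned read sequence), no indels.
    Returns (position, reference base, alternate base, depth) per called SNP.
    """
    # Build a pileup: base counts at every reference position covered by a read.
    pileup: Dict[int, Counter] = {}
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos < len(reference):
                pileup.setdefault(pos, Counter())[base] += 1

    snps = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to call anything
        alt, alt_count = counts.most_common(1)[0]
        ref_base = reference[pos]
        if alt != ref_base and alt_count / depth >= min_alt_fraction:
            snps.append((pos, ref_base, alt, depth))
    return snps


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    reads = [(0, "ACGTTCGT"), (2, "GTTCGTAC"), (1, "CGTTCG")]
    print(call_snps(ref, reads))  # [(4, 'A', 'T', 3)]
```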
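Item (8) estimates genome size by flow cytometry. The usual calculation compares the G1 fluorescence peak of the sample's stained nuclei with that of a standard of known genome size, assuming the stain binds DNA in proportion to its amount. When several standards are used (the report mentions M. hapla and C. elegans among the controls), each one gives an estimate, and these can be averaged. The sketch below is minimal. The peak values are hypothetical and are not the project's measurements, the function names are illustrative, and 0.978 × 10^9 bp per pg is the commonly used DNA mass-to-length conversion.

```python
from statistics import mean
from typing import Iterable, Tuple

BP_PER_PG = 0.978e9  # commonly used conversion: ~0.978 x 10^9 bp per picogram of DNA


def size_from_standard(sample_peak: float, standard_peak: float,
                       standard_size_bp: float) -> float:
    """Genome size scales linearly with G1 peak fluorescence relative to a standard.

    Sample and standard must be measured at the same cell-cycle stage (G1)
    and reported on the same C-value basis (e.g. both 1C).
    """
    return sample_peak / standard_peak * standard_size_bp


def size_from_standards(sample_peak: float,
                        standards: Iterable[Tuple[float, float]]) -> float:
    """Average the estimates obtained against several standards.

    standards: (G1 peak fluorescence, known genome size in bp) per control.
    """
    return mean(size_from_standard(sample_peak, peak, size) for peak, size in standards)


def bp_to_pg(size_bp: float) -> float:
    """Convert a genome size in base pairs to picograms of DNA."""
    return size_bp / BP_PER_PG


if __name__ == "__main__":
    # Hypothetical peak channels and standard sizes, for illustration only.
    est = size_from_standards(300.0, [(100.0, 1.0e8), (150.0, 1.5e8)])
    print(f"{est:.3g} bp, {bp_to_pg(est):.3f} pg")
```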