Location: Plant, Soil and Nutrition Research
Title: New whole genome de novo assemblies of three divergent strains of rice (O. sativa) documents novel gene space of aus and indica
Submitted to: Genome Biology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 6/20/2014
Publication Date: 12/3/2014
Publication URL: DOI: 10.1186/s13059-014-0506-z
Citation: Schatz, M.C., Maron, L.G., Stein, J.C., Hernandez, W.A., Gurtowski, J., Biggers, E., Lee, H., Kramer, M., Antoniou, E., Ghiban, E., Wright, M.H., Chia, J., Ware, D., McCouch, S.R., McCombie, W.R. 2014. New whole genome de novo assemblies of three divergent strains of rice (O. sativa) documents novel gene space of aus and indica. Genome Biology. 15:506-521.
Interpretive Summary: This manuscript reports a comparison of genomes that were sequenced and assembled from three strains of cultivated Asian rice, representing the diverse sub-populations japonica, indica, and aus. The research demonstrated the feasibility of using next-generation sequencing technology to produce high-quality reference assemblies that are amenable to detailed annotation of protein-coding genes. Comparison of the three genomes revealed core conserved genes as well as genes unique to individual strains. Detailed analysis of several loci associated with agriculturally important traits illustrated the utility of this approach for biological discovery.
Technical Abstract: The use of high-throughput genome-sequencing technologies has uncovered a large extent of structural variation in eukaryotic genomes that makes important contributions to genomic diversity and phenotypic variation. Currently, when the genomes of different strains of a given organism are compared, whole-genome resequencing data are aligned to an established reference sequence. However, when the reference differs in significant structural ways from the individuals under study, the analysis is often incomplete or inaccurate. Here, we use rice as a model to explore the extent of structural variation among strains adapted to different ecologies and geographies, and show that this variation can be significant, often matching or exceeding the variation present in closely related human populations or other mammals. We demonstrate how improvements in sequencing and assembly technology allow rapid and inexpensive de novo assembly of next-generation sequence data into high-quality assemblies that can be directly compared to provide an unbiased assessment. Using this approach, we are able to accurately assess the pan-genome of three divergent rice varieties and document several megabases of each genome absent in the other two. Many of the genome-specific loci are annotated to contain genes, reflecting the potential for new biological properties that would be missed by standard resequencing approaches. We further provide a detailed analysis of several loci associated with agriculturally important traits, illustrating the utility of our approach for biological discovery. All of the data and software are openly available to support further breeding and functional studies of rice and other species.