Location: Genomics and Bioinformatics Research
Title: A Reference Genome for US Rice
|Stein, Joshua - Cold Spring Harbor Laboratory|
|Schmutz, J. - Hudsonalpha Institute For Biotechnology|
|Peterson, Daniel - Mississippi State University|
|Youngblood, Cal - Mississippi State University|
|Grimwood, J - Hudsonalpha Institute For Biotechnology|
Submitted to: Meeting Abstract
Publication Type: Abstract Only
Publication Acceptance Date: 1/19/2018
Publication Date: 10/16/2018
Citation: Scheffler, B.E., Edwards, J., Stein, J., Ware, D., Vaughn, J.N., Schmutz, J., Peterson, D., Youngblood, C., Duke, M.V., Grimwood, J., Simpson, S.A., Mcclung, A.M. 2018. A Reference Genome for US Rice. Meeting Abstract. 37th Rice Technical Working Group. p. 50.
Technical Abstract: The development of reference genomes for rice has served as a means of understanding the allelic diversity and genetic structure of a cereal grain that feeds half of the world. It has long been understood that Oryza sativa diverged into two major sub-populations, Indica and Japonica, over 400,000 years ago. Reference genomes of these two sub-populations were published in 2005, including the de novo sequencing of Nipponbare, a temperate japonica (TEJ) rice cultivar, and the shotgun sequencing of the indica (IND) cultivar 93-11. Germplasm in the USA is primarily derived from the tropical japonica (TRJ) sub-population, which is much more limited in its worldwide cultivation than IND or TEJ. It is possible that the current reference genomes for IND, TEJ and, more recently, AUS may lack genomic information relevant to TRJ germplasm and breeding efforts. The objective of this project was to develop a reference genome for TRJ germplasm that would be well suited for identification of genes important to USA breeding programs. A number of diversity panels containing global rice varieties have been resequenced and the genomic data have been made public. A merged data set of 4,786 global rice accessions and approximately 56,000 SNP markers was used to identify a number of accessions that were representative of the TRJ genome. Two of these, Honduras and Carolina Gold, are landraces that served as founder lines in the USA historical rice pedigree. Carolina Gold (CGR) is credited with the establishment of the US rice industry because it was the predominant variety grown for over 200 years on the Southeast coast before USA rice production moved to where it is grown today. For these two reasons, CGR was selected for de novo sequencing and to serve as a reference genome for TRJ germplasm. To develop a reference genome for Carolina Gold, 96X genome coverage was generated using the RSII from Pacific Biosciences.
The average read length used in the assembly was 11,417 bp across 3,284,634 reads, giving a total of 37,499,187,779 bp used in the final assembly. The present CGR assembly size is 386,298,647 bp in 208 contigs, giving an average contig length of 1,857,205 bp and an N50 (the contig length at or above which the contigs together contain half of the total assembly) of 12,879,605 bp. A related statistic, NG50, which is based on a 400 Mb genome size rather than the total assembly length, is 11,626,317 bp (meaning that 200 Mb of the genome is contained in contigs at least this long). Illumina reads are being used for error correction, and 10X Genomics data are being used to check for positional errors. To understand the genomic diversity of germplasm pertinent to USA rice breeding efforts, over 150 accessions have been resequenced on the Illumina platform to a depth of >20X. These accessions are predominantly TRJ material of historic importance to USA rice. These sequence data will be compared to CGR and Nipponbare (NPBR) to identify genetic differences, especially SNPs, InDels and chromosome rearrangements. CGR should be a superior reference when aligning sequencing reads from TRJ germplasm, particularly those generated by USA breeding programs. Indeed, preliminary results indicate that ~2% more Illumina reads from TRJ lines align to CGR than to NPBR. Admixtures of TRJ and TEJ show a 1% improvement. Interestingly, even when aligning reads from TEJ lines, CGR serves as a comparable (<0.2% difference) reference relative to NPBR, suggesting that some of the alignment improvements are due not only to genetic relatedness but also to more complete and/or accurate sequence assembly. Work is ongoing to annotate these differentially mapping reads. Once completed, all of the data will be released to the public.
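The contig statistics quoted in the abstract (average contig length, N50 and NG50) can be illustrated with a minimal Python sketch. It is not the pipeline used for the CGR assembly. The function name `nx50` and the toy contig lengths are hypothetical and chosen only for illustration, and the sketch uses the common convention that the threshold is reached once the running total is at least half of the target.

```python
from typing import Optional, Sequence


def nx50(contig_lengths: Sequence[int], target_total: Optional[int] = None) -> Optional[int]:
    """Return N50, or NG50 when an estimated genome size is given.

    contig_lengths: lengths of all contigs in the assembly (bp).
    target_total:   None  -> use the assembly length (N50);
                    value -> use that estimated genome size (NG50).
    Returns None if the contigs cover less than half of target_total.
    """
    total = sum(contig_lengths) if target_total is None else target_total
    half = total / 2
    running = 0
    # Walk contigs from longest to shortest until half the target is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return None


# Toy example (hypothetical lengths, not CGR data).
toy_contigs = [50, 40, 30, 20, 10]            # assembly length = 150
print(nx50(toy_contigs))                      # N50  -> 40 (50 + 40 >= 75)
print(nx50(toy_contigs, target_total=200))    # NG50 -> 30 (50 + 40 + 30 >= 100)

# Average contig length from the figures reported in the abstract.
assembly_bp = 386_298_647
n_contigs = 208
print(round(assembly_bp / n_contigs))         # -> 1857205
```

Because NG50 is measured against a fixed genome size estimate (400 Mb in the abstract) rather than the assembly's own length, it is never larger than N50 when the assembly is shorter than that estimate. This is consistent with the reported values of 11,626,317 bp and 12,879,605 bp.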