HARRIS, ZACHARY - Missouri State University
KOVACS, LASZLO - Missouri State University
Submitted to: BMC Genomics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/22/2017
Publication Date: 12/2/2017
Citation: Harris, Z., Kovacs, L., Londo, J.P. 2017. RNASeq-based genome annotation and identification of long-noncoding RNAs in the grapevine cultivar 'Riesling'. Biomed Central (BMC) Genomics. 18:937.
Interpretive Summary: The grapevine genome has been sequenced, and a prediction of all the functional genes (transcriptome) is available for the inbred cultivar PN40024, derived from V. vinifera 'Pinot Noir'. This reference genome and transcriptome are invaluable for research on grapevine, providing a foundation for identifying and describing different genes and traits. However, they represent a single cultivar, and studies have demonstrated that grapevine cultivars differ substantially in genome organization and transcriptome. This study was conducted to examine the transcriptome of the cultivar 'Riesling' and to understand portions of the genome that are expressed but are not considered genes: long non-coding RNAs (lncRNAs). In this study we developed a new computational pipeline to search through expression data and identify these lncRNAs. Using RNA sequences found in leaf, flower, rachis, berry, root, and bud tissues, we describe the expression patterns of 'Riesling' genes and have also identified hundreds of gene regions that result in lncRNAs. This new database of expressed genes and expressed lncRNAs can now be used to identify 'Riesling'-specific responses to environmental stress and pathogen attack, as well as to identify novel genes for future grapevine breeding efforts. Additionally, the software pipeline described in this study will be free to use by other researchers looking for lncRNAs in grapevine.
Technical Abstract: The technological advances of RNA-seq and de novo transcriptome assembly have enabled genome annotation and transcriptome profiling in heterozygous species. This is a promising approach to improving the annotation of the reference genome sequence of grapevine (Vitis vinifera L.), a highly heterozygous species. This work is an attempt to enhance the annotation of the V. vinifera PN40024-derived reference genome sequence based on the de novo-assembled transcriptome of the V. vinifera cultivar 'Riesling'. Here we show that the transcriptome assembly of a single V. vinifera cultivar is insufficient for a complete genome annotation of the reference PN40024 genome. Further, we provide evidence that the gene models we identified cannot be completely anchored to the previously published PN40024 gene models. In addition to these findings, we present a computational pipeline for the de novo identification of lncRNAs. Our results demonstrate that, in grapevine, lncRNAs are significantly different from protein-coding transcripts in such metrics as length, GC-content, minimum free energy, and length-corrected minimum free energy. In grapevine, the high level of heterozygosity necessitates that transcriptome characterization be based on cultivar-specific reference genome sequences. Our results strengthen the hypothesis that lncRNAs have thermodynamically different properties than protein-coding RNAs. The analyses of both coding and non-coding RNAs will be instrumental in uncovering inter-cultivar variation in wild and cultivated grapevine species.
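The per-transcript metrics named in the abstract (length, GC-content, and a length-corrected minimum free energy) can be illustrated with a minimal Python sketch. This is not the pipeline described in the study: the names `TranscriptMetrics` and `transcript_metrics` are hypothetical, the minimum free energy is assumed to have been computed beforehand by an external RNA folding tool, and dividing it by transcript length is only one possible form of length correction, assumed here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptMetrics:
    """Hypothetical container for simple per-transcript summary metrics."""
    length: int                  # transcript length in nucleotides
    gc_content: float            # fraction of G or C bases, between 0 and 1
    mfe_per_nt: Optional[float]  # MFE divided by length, if an MFE was given


def transcript_metrics(seq: str, mfe: Optional[float] = None) -> TranscriptMetrics:
    """Compute length, GC fraction, and (optionally) MFE per nucleotide.

    `mfe` is assumed to come from an external folding tool (in kcal/mol);
    this function does not fold the sequence itself.
    """
    seq = seq.strip().upper()
    length = len(seq)
    if length == 0:
        raise ValueError("sequence must not be empty")

    # Count G and C bases; any other character (A, T, U, N, ...) is not GC.
    gc_count = sum(1 for base in seq if base in "GC")

    # Assumed length correction: simple division by transcript length.
    mfe_per_nt = mfe / length if mfe is not None else None

    return TranscriptMetrics(length, gc_count / length, mfe_per_nt)


if __name__ == "__main__":
    # Toy sequence and a made-up MFE value, for illustration only.
    print(transcript_metrics("GGCCAATT", mfe=-4.0))
```

Metrics of this kind, computed separately for candidate lncRNAs and for protein-coding transcripts, are what a comparison such as the one reported in the abstract would operate on.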