Publication #310694

Title: Re-annotation of the woodland strawberry (Fragaria vesca) genome

Author
DARWISH, OMAR - Towson University
SHAHAN, RACHEL - University of Maryland
LIU, ZHONGCHI - University of Maryland
Slovin, Janet
ALKHAROUF, NADIM - Towson University

Submitted to: BMC Genomics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 1/5/2015
Publication Date: 1/27/2015
Publication URL: http://handle.nal.usda.gov/10113/61439
Citation: Darwish, O., Shahan, R., Liu, Z., Slovin, J.P., Alkharouf, N. 2015. Re-annotation of the woodland strawberry (Fragaria vesca) genome. Biomed Central (BMC) Genomics. 16:29.

Interpretive Summary: The genome sequence of the woodland strawberry was published a few years ago, but determining where the genes lie in this sequence is a continuing process. Knowing where the genes are in the genome, and what their sequences are, is necessary for further investigation into how these genes work to make a strawberry taste and look good, store well, and provide healthful nutrients. In this report we describe the use of several computer programs, together with recently published strawberry gene sequences, to find over 2,000 new genes in the strawberry genome. This information is critical to further studies of strawberry molecular biology and will be highly useful for scientists and breeders who work on strawberry and related crops.

Technical Abstract: Fragaria vesca is a low-growing, small-fruited diploid strawberry species commonly called woodland strawberry. It is native to temperate regions of Eurasia and North America, and while it produces edible fruits, it is most useful as an experimental perennial plant system that can serve as a model for the agriculturally important Rosaceae family. A draft of the F. vesca genome sequence was published in 2011. The first-generation annotation (version 1.1) was developed using GeneMark-ES+, a self-training gene prediction tool that relies primarily on combining ab initio predictions with the mapping of high-confidence ESTs, in addition to mapping gene deserts from transposable elements. Based on over 25 different tissue transcriptomes, we have revised the F. vesca genome annotation, thereby providing several improvements over version 1.1. The new annotation, which was achieved using Maker, describes many more predicted protein-coding genes than the GeneMark-generated annotation currently hosted at the Genome Database for Rosaceae (http://www.rosaceae.org/). The new annotation also increases the total coding length and the number of coding regions found. The total number of gene predictions that do not overlap with the previous annotations is 2286, most of which were found to be homologous to other plant genes. We experimentally verified one of the new gene model predictions to validate our results. Using RNA-Seq transcriptome sequences from 25 diverse tissue types, the re-annotation pipeline improved the accuracy of existing annotations. It uncovered new genes, added exons to current genes, and extended or merged exons. This complete genome re-annotation will significantly benefit functional genomics studies of the strawberry and other members of the Rosaceae.
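As an illustration of what "gene predictions that do not overlap with the previous annotations" can mean computationally, the following is a minimal sketch and not the authors' pipeline. It assumes each gene model is reduced to a sequence name plus 1-based inclusive start and end coordinates, ignores strand, and treats any shared base as an overlap; the abstract does not state the exact criterion used. The `Gene` type, the `novel_predictions` function, and the example coordinates are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene model reduced to its genomic interval."""
    seqid: str  # scaffold or linkage group name
    start: int  # 1-based inclusive start
    end: int    # 1-based inclusive end
    name: str


def overlaps(a: Gene, b: Gene) -> bool:
    """True if the two intervals share at least one base on the same sequence."""
    return a.seqid == b.seqid and a.start <= b.end and b.start <= a.end


def novel_predictions(new: Iterable[Gene], old: Iterable[Gene]) -> List[Gene]:
    """Return new gene models that overlap no model in the old annotation.

    Assumption: strand is ignored and any shared base counts as overlap.
    """
    # Group old models by sequence so each new model is compared only
    # against models on the same scaffold.
    old_by_seq = defaultdict(list)
    for gene in old:
        old_by_seq[gene.seqid].append(gene)

    return [
        gene
        for gene in new
        if not any(overlaps(gene, prev) for prev in old_by_seq.get(gene.seqid, ()))
    ]


# Made-up coordinates for illustration only.
old_annotation = [Gene("LG1", 100, 500, "old1"), Gene("LG1", 900, 1200, "old2")]
new_annotation = [
    Gene("LG1", 450, 800, "newA"),  # overlaps old1
    Gene("LG1", 600, 850, "newB"),  # falls between old1 and old2
    Gene("LG2", 100, 300, "newC"),  # on a sequence with no old models
]

for gene in novel_predictions(new_annotation, old_annotation):
    print(gene.name)
```

The pairwise comparison within each sequence is quadratic in the worst case; for a whole-genome annotation, sorting the intervals or using an interval tree would be the more usual choice.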