Title: YeATS- a tool suite for analyzing RNA-seq derived transcriptome identifies a highly transcribed putative extensin in heartwood/sapwood transition zone in black walnut

Author
item CHAKRABORTY, SANDEEP - University Of California
item BRITTON, MONICA - University Of California
item WEGRZYN, JILL - University Of Connecticut
item BUTTERFIELD, TIMOTHY - University Of California
item MARTINEZ-GARCIA, PEDRO JOSÉ - University Of California
item REAGAN, RUSSELL - University Of California
item RAO, BASUTHKAR - The Tata Institute Of Fundamental Research
item LESLIE, CHUCK - University Of California
item Aradhya, Mallikarjuna
item NEALE, DAVID - University Of California
item WOESTE, KEITH - Purdue University
item DANDEKAR, ABHAYA - University Of California

Submitted to: F1000Research
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 10/30/2015
Publication Date: 11/6/2015
Publication URL: http://www.ars-grin.gov.dav
Citation: Chakraborty, S., Britton, M., Wegrzyn, J., Butterfield, T., Martinez-Garcia, P., Reagan, R.L., Rao, B.J., Leslie, C.A., Aradhya, M.K., Neale, D., Woeste, K., Dandekar, A.M. 2015. YeATS- a tool suite for analyzing RNA-seq derived transcriptome identifies a highly transcribed putative extensin in heartwood/sapwood transition zone in black walnut. F1000Research. 4:155. doi: 10.12688/f1000research.6617.2.

Interpretive Summary: The transcriptome provides a functional footprint of the genome by enumerating the molecular components of cells and tissues. High-throughput mRNA sequencing (RNA-seq) has revolutionized the field of transcript discovery. Here, we present YeATS (Yet Another Tool Suite for analyzing RNA-seq derived transcriptome), a methodology that replicates and improves on existing methodologies and implements a workflow for error estimation and correction, followed by genome annotation and transcript abundance estimation, for RNA-seq derived transcriptome sequences. A unique feature of YeATS is that it determines errors in the sequencing or transcript assembly process upfront by analyzing the open reading frames of transcripts. YeATS flags as erroneous those transcripts that have not been merged, that result in broken open reading frames, or that contain long repeats. We demonstrate the YeATS workflow on a representative sample of the transcriptome from tissue at the heartwood/sapwood transition zone in black walnut. A novel finding that emerged from our analysis was a highly abundant transcript with no known homologous genes (GenBank accession: KT023102). The amino acid composition of the longest open reading frame of this gene classifies it as a putative extensin. We also corroborated the transcriptional abundance of proline-rich proteins, dehydrins, senescence-associated proteins, and the DNAJ family of chaperone proteins. Thus, YeATS presents a workflow for analyzing RNA-seq data with several innovative features that differentiate it from existing software.
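
The summary describes analyzing the open reading frames of transcripts and inspecting the amino acid composition of the longest one. The following is a minimal Python sketch of that general idea, not the YeATS implementation: the function names (`longest_orf`, `composition`), the six-frame scan with the standard genetic code, and the toy sequence are all assumptions made for illustration. The actual tool may define and score open reading frames differently.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(seq):
    """Longest Met-initiated stretch without a stop codon across all six frames.

    Assumption for this sketch: a stretch that runs to the end of the
    transcript without a stop codon still counts as a candidate.
    """
    seq = seq.upper()
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best


def composition(protein):
    """Fraction of each amino acid in a protein string (empty dict if empty)."""
    if not protein:
        return {}
    counts = Counter(protein)
    return {aa: n / len(protein) for aa, n in counts.items()}


# Hypothetical toy transcript, not a sequence from the study.
toy_transcript = "ATG" + "CCT" * 5 + "TAA"
toy_orf = longest_orf(toy_transcript)
toy_fractions = composition(toy_orf)
```

In a sketch like this, a high proline fraction in `toy_fractions` would only be a hint worth following up; any cut-off used to call a sequence a putative extensin would have to come from the published method rather than from this example.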