Location: Endemic Poultry Viral Diseases Research
Title: Prediction of transcript isoforms in 19 chicken tissues by Oxford Nanopore long-read sequencing
|GUAN, DAILU - University Of California, Davis|
|HALSTED, MICHELLE - University Of California, Davis|
|ISLAS-TREJO, ALMA - University Of California, Davis|
|GOSZCZYNSKI, DANIEL - University Of California, Davis|
|ROSS, PABLO - University Of California, Davis|
|ZHOU, HUAIJUN - University Of California, Davis|
Submitted to: Frontiers in Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 8/30/2022
Publication Date: 9/30/2022
Citation: Guan, D., Halsted, M.M., Islas-Trejo, A.D., Goszczynski, D.E., Cheng, H.H., Ross, P.J., Zhou, H. 2022. Prediction of transcript isoforms in 19 chicken tissues by Oxford Nanopore long-read sequencing. Frontiers in Genetics. 13:997460. https://doi.org/10.3389/fgene.2022.997460.
Interpretive Summary: A major biological goal is to associate variation in the genome of an organism with phenotypic (trait) variation. The chicken, whose first genome assembly was released in 2004, is no exception. A key effort is to identify all the transcripts (RNAs) that are generated at each developmental age and in each tissue. In this submission, 19 different tissues from ARS experimental birds were sequenced using a new technology that reads more bases from each RNA molecule. By mapping these reads to the chicken genome and then applying computational analyses, over 74,000 transcripts were identified, approximately 40% of which were not previously known. This effort greatly increases the usefulness of the chicken genome, which will ultimately result in more accurate methods to breed and rear poultry.
Technical Abstract: To identify and annotate transcript isoforms in the chicken genome, we generated Nanopore long-read sequencing data from 68 samples that encompassed 19 diverse tissues collected from experimental adult male and female White Leghorn chickens. More than 23.8 million reads with a mean read length of 790 bases and an average quality of 18.2 were generated. The annotation and subsequent filtering resulted in the identification of 55,382 transcripts at 40,547 loci with a mean length of 1,700 bases. We predicted 30,967 coding transcripts at 19,461 loci and 16,495 lncRNA transcripts at 15,512 loci. Compared to existing reference annotations, we found that ~52% of annotated transcripts could be partially or fully matched, while ~47% were novel. Seventy percent of novel transcripts were potentially transcribed from lncRNA loci. Based on our annotation, we quantified transcript expression across tissues and found that two brain tissues (i.e., cerebellum and cortex) expressed the highest number of transcripts and loci. Furthermore, ~22% of the transcripts displayed tissue specificity, with the reproductive tissues (i.e., testis and ovary) exhibiting the most tissue-specific transcripts. Despite our wide sampling, ~20% of Ensembl reference loci were not detected. This suggests that deeper sequencing and additional samples that include different breeds, cell types, developmental stages, and physiological conditions are needed to fully annotate the chicken genome. The application of Nanopore sequencing in this study demonstrates the usefulness of long-read data in discovering additional novel loci (e.g., lncRNA loci) and resolving complex transcripts (e.g., the longest transcript for the TTN locus).
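The counts reported in the Technical Abstract can be combined into a few derived figures, such as the average number of transcripts per locus and the share of transcripts predicted as coding or lncRNA. The following is a minimal sketch of that arithmetic only. The dictionary keys and the function name `summarize` are illustrative assumptions and are not taken from the authors' pipeline; the numbers are the ones stated in the abstract.

```python
# Counts as reported in the Technical Abstract (key names are illustrative).
REPORTED = {
    "transcripts": 55_382,
    "loci": 40_547,
    "coding_transcripts": 30_967,
    "coding_loci": 19_461,
    "lncrna_transcripts": 16_495,
    "lncrna_loci": 15_512,
}


def summarize(counts):
    """Derive simple ratios from the reported transcript and locus counts."""
    total = counts["transcripts"]
    coding = counts["coding_transcripts"]
    lncrna = counts["lncrna_transcripts"]
    return {
        # Average number of annotated transcripts per locus, overall.
        "transcripts_per_locus": total / counts["loci"],
        # Fraction of all transcripts predicted as coding or as lncRNA.
        "coding_share": coding / total,
        "lncrna_share": lncrna / total,
        # Transcripts that fall into neither reported class.
        "other_transcripts": total - coding - lncrna,
        # Average transcripts per locus within each reported class.
        "coding_transcripts_per_locus": coding / counts["coding_loci"],
        "lncrna_transcripts_per_locus": lncrna / counts["lncrna_loci"],
    }


if __name__ == "__main__":
    for name, value in summarize(REPORTED).items():
        if isinstance(value, int):
            print(f"{name}: {value}")
        else:
            print(f"{name}: {value:.3f}")
```

Under these counts, coding loci carry more transcripts per locus on average than lncRNA loci, which is in line with the abstract's emphasis on long reads resolving complex transcripts.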