Location: Genetics and Animal Breeding
Title: Improved annotation of the domestic pig genome through integration of Iso-Seq and RNA-seq data
|BEIKI, HAMID - Iowa State University|
|LIU, HAIBO - Iowa State University|
|MANCHANDA, NANCY - Iowa State University|
|Nonneman, Danny - Dan|
|Smith, Timothy - Tim|
|REECY, JAMES - Iowa State University|
|TUGGLE, CHRISTOPHER - Iowa State University|
Submitted to: BMC Genomics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 4/17/2019
Publication Date: 5/7/2019
Citation: Beiki, H., Liu, H., Manchanda, N., Nonneman, D.J., Smith, T.P.L., Reecy, J., Tuggle, C. 2019. Improved annotation of the domestic pig genome through integration of Iso-Seq and RNA-seq data. BMC Genomics. 20:344-362. https://doi.org/10.1186/s12864-019-5709-y.
Interpretive Summary: Pork is the most widely consumed meat in the world. Domestic pigs are also closely related to humans in anatomy, genetics and physiology, and they serve as animal models in many fields of biomedical research. Sequencing and annotating the pig genome has been a major accomplishment toward improving swine production through molecular genetics; however, annotation and gene discovery are still incomplete. To improve the catalog of expressed genes and their isoforms, long-read sequencing was combined with higher-depth short-read sequencing for error correction on nine physiologically relevant tissues (brain, hypothalamus, liver, muscle, thymus, pituitary, small intestine, spleen and diaphragm). This work identified over 40,000 genes in total, including 10,000 novel genes and over 24,500 long non-coding RNAs that act as regulators of gene expression. Thousands of novel transcripts were identified and thousands of known gene borders were extended. The number of genes identified is consistent with other species, such as human, and these results will significantly improve the current pig genome annotations.
Technical Abstract: Background: Our understanding of the pig transcriptome is limited. RNA transcript diversity among nine tissues was assessed using poly(A) selected single-molecule long-read isoform sequencing (Iso-Seq) and Illumina RNA sequencing (RNA-seq) from a single White cross-bred pig. Results: Across tissues, a total of 67,746 unique transcripts were observed, including 60.5% predicted protein-coding, 36.2% long non-coding RNA and 3.3% nonsense-mediated decay transcripts. On average, 90% of the splice junctions were supported by RNA-seq within tissue. A large proportion (80%) of the transcripts were novel, mostly produced by known protein-coding genes (70%), while 17% corresponded to novel genes. On average, four transcripts per known gene (tpg) were identified; an increase over current EBI (1.9 tpg) and NCBI (2.9 tpg) annotations and closer to the number reported in the human genome (4.2 tpg). Our new pig genome annotation extended more than 6000 known gene borders (5' end extension, 3' end extension, or both) compared to EBI or NCBI annotations. We validated a large proportion of these extensions by independent pig poly(A) selected 3'-RNA-seq data, or human FANTOM5 Cap Analysis of Gene Expression data. Further, we detected 10,465 novel genes (81% non-coding) not reported in current pig genome annotations. More than 80% of these novel genes had transcripts detected in more than one tissue. In addition, more than 80% of novel intergenic genes with at least one transcript detected in liver tissue had H3K4me3 or H3K36me3 peaks mapping to their promoter and gene body, respectively, in independent liver chromatin immunoprecipitation data. Conclusions: These validated results show significant improvement over current pig genome annotations.