Location: Genetics and Animal Breeding
Title: Global analysis of transcription start sites in the new ovine reference genome (Oar rambouillet v1.0)
Author
SALAVATI, MAZDAK - Roslin Institute
CAULTON, ALEX - Agresearch
CLARK, RICHARD - University Of Edinburgh
GAZOVA, IVETA - Roslin Institute
Smith, Timothy
WORLEY, KIM - Baylor College Of Medicine
COCKETT, NOELLE - Utah State University
ARCHIBALD, ALAN - Roslin Institute
CLARKE, SHANNON - Agresearch
MURDOCH, BRENDA - University Of Idaho
CLARK, EMILY - Roslin Institute
Submitted to: Frontiers in Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 9/9/2020
Publication Date: 10/23/2020
Citation: Salavati, M., Caulton, A., Clark, R., Gazova, I., Smith, T.P.L., Worley, K.C., Cockett, N.E., Archibald, A.L., Clarke, S., Murdoch, B.M., Clark, E.L. 2020. Global analysis of transcription start sites in the new ovine reference genome (Oar rambouillet v1.0). Frontiers in Genetics. Article e580580. https://doi.org/10.3389/fgene.2020.580580.
DOI: https://doi.org/10.3389/fgene.2020.580580
Interpretive Summary: The Ovine "Functional Annotation of Animal Genomes" (FAANG) consortium aims to identify the regions of the genome that affect the expression of genes. Many of these regions, known as "control elements", are tied to segments of DNA close to the points where conversion of DNA to RNA begins for each gene, called "transcription start sites" (TSS). We therefore need to know where the TSS lie in the genome in order to correctly identify control elements. This work uses a technique called CAGE that "captures" the TSS by finding the ends of the RNA made from each gene and matching them to the genome. Our CAGE analysis identified nearly 30,000 high-confidence TSS in the ovine genome, providing a critical resource for identifying the genetic elements that control gene expression in sheep tissues.
Technical Abstract: The overall aim of the Ovine FAANG project is to provide a comprehensive annotation of the new highly contiguous sheep reference genome sequence (Oar rambouillet v1.0). Mapping of transcription start sites (TSS) is a key first step in understanding transcript regulation and diversity. Using 56 tissue samples collected from the reference ewe Benz2616, we have performed a global analysis of TSS and TSS-Enhancer clusters using Cap Analysis Gene Expression (CAGE) sequencing. CAGE measures RNA expression by 5’ cap-trapping and has been specifically designed to allow the characterization of TSS within promoters to single-nucleotide resolution.
We have adapted an analysis pipeline that uses TagDust2 for clean-up and trimming, Bowtie2 for mapping, CAGEfightR for clustering and the Integrative Genomics Viewer (IGV) for visualization. Mapping of CAGE tags indicated that the expression levels of CAGE tag clusters varied across tissues. Expression profiles across tissues were validated using corresponding polyA+ mRNA-Seq data from the same samples. After removal of CAGE tags with < 10 read counts, 39.3% of TSS overlapped with 5’ ends of transcripts, as annotated previously by NCBI. A further 14.7% mapped to within 50 bp of annotated promoter regions. Intersecting these predicted TSS regions with annotated promoter regions (±50 bp) revealed that 46% of the predicted TSS were ‘novel’ and previously un-annotated. Using whole genome bisulphite sequencing data from the same tissues, we were able to determine that a proportion of these ‘novel’ TSS were hypo-methylated (32.2%), indicating that they are likely to be reproducible rather than ‘noise’. This global analysis of TSS in sheep will significantly enhance the annotation of gene models in the new ovine reference assembly. Our analyses provide one of the highest resolution annotations of transcript regulation and diversity in a livestock species to date.
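The filtering and intersection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which uses CAGEfightR); the function names, the tuple layout `(chrom, strand, start, end, tag_count)`, the inclusive coordinate convention, and the use of single annotated 5’ end positions in place of promoter regions are all assumptions made for illustration. Only the read-count threshold of 10 and the 50 bp window come from the text.

```python
from bisect import bisect_left
from collections import Counter


def nearest_distance(start, end, positions):
    """Distance from the inclusive interval [start, end] to the nearest
    value in the sorted list `positions`; 0 if a value lies inside it,
    None if `positions` is empty."""
    if not positions:
        return None
    i = bisect_left(positions, start)
    best = None
    if i < len(positions):
        # First annotated position at or after the cluster start.
        p = positions[i]
        best = 0 if p <= end else p - end
    if i > 0:
        # Last annotated position before the cluster start.
        d = start - positions[i - 1]
        best = d if best is None else min(best, d)
    return best


def classify_tss(clusters, annotated, min_count=10, window=50):
    """Count TSS clusters by relation to annotated 5' ends.

    clusters:  iterable of (chrom, strand, start, end, tag_count)  [hypothetical layout]
    annotated: dict mapping (chrom, strand) -> sorted list of 5' end positions
    """
    counts = Counter()
    for chrom, strand, start, end, tag_count in clusters:
        if tag_count < min_count:
            continue  # drop clusters with fewer than min_count reads
        d = nearest_distance(start, end, annotated.get((chrom, strand), []))
        if d == 0:
            counts["overlapping"] += 1
        elif d is not None and d <= window:
            counts["within_window"] += 1
        else:
            counts["novel"] += 1
    return counts
```

Dividing each count by the total number of retained clusters would give proportions comparable in spirit to the overlapping, within-50 bp and ‘novel’ percentages reported above, although the published figures come from the authors' own tools and annotation.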