Project: USDA ARS

Research Project #433845, U.S. Meat Animal Research Center (Plains Area), Clay Center, Nebraska

Research Project: Identifying Genomic Solutions to Improve Efficiency of Swine Production

Location: Genetics and Animal Breeding

2019 Annual Report

Objectives
Objective 1: Utilize next-generation sequencing technologies to improve the contiguity of the swine genome assembly and better characterize genomic variation in pigs.
Subobjective 1.A: Utilize segregation analysis to improve the porcine genome assembly.
Subobjective 1.B: Develop more comprehensive gene models for the swine genome.
Subobjective 1.C: Develop an electronic warehouse of genomic variants that can be utilized by the swine genomics research community.
Objective 2: Develop genotyping products for commercial swine producers to increase the efficiency of swine production.
Subobjective 2.A: Identify predictive genetic markers for traits associated with production efficiency in commercial swine populations.
Subobjective 2.B: Develop strategies for inclusion of predictive markers in selection programs.

Approach
The principal goals of this project are to enhance our understanding of the biological processes important to swine production and to provide the U.S. swine industry with genetic tools that will ensure it remains the global leader in providing safe, nutritious, and economical pork products. The swine industry faces significant challenges, many of which revolve around the production and performance of feeder pigs. The environment in which females are housed is continually evolving, and the increasing cost of feed has resulted in continual shifts in the use of feedstuffs. Each new challenge requires an assessment of potential solutions. Genetic selection can be used to address many production issues: if DNA variants associated with changes in phenotype can be identified, marker-assisted selection can be implemented to expedite genetic progress. Predictive genetic markers must then be transferred to commercial entities that will rapidly evaluate and adopt them. Ongoing improvements to the porcine genome assembly, better gene annotations from model organisms, and enhanced bioinformatics technologies provide researchers with the tools needed to identify functional genetic variants. Objective 1 focuses on improving the porcine genome assembly and detecting polymorphisms from data generated by next-generation sequencing. Objective 2 will strive to transfer the results of Objective 1 effectively to producers. Our research will evaluate genetic markers based on their predicted effects on gene products in order to discover causal variants underlying phenotypic variation. This will lead to the development of the marker panels and economical genotyping platforms that are essential for industry applications.

Progress Report
As reported in FY2018, a total of 240 animals from our resource populations have been sequenced using short-read technologies to detect segregating genetic variation. As part of the development of a variant warehouse for commercial pigs (Objective 1C), copy number variable regions (CNVR) in the porcine genome were identified from whole-genome sequence data. Using a combination of split-read, paired-end mapping, and read-depth approaches, we identified a total of 3,538 CNVR, including 1,820 novel CNVR not reported in previous studies. The CNVR covered 0.94% of the porcine genome and overlapped 1,401 genes. Gene ontology analysis showed that CNV-overlapping genes were enriched for functions related to organism development and overlapped many known quantitative trait loci (QTL). Analysis of QTL previously identified in the U.S. Meat Animal Research Center (USMARC) herd showed that CNVR were significantly enriched for reproductive QTL, specifically for traits such as age at puberty and ovulation rate.
To develop more comprehensive porcine gene models (Objective 1B), the white blood cell transcriptome was sequenced. A total of 392 RNA-Seq libraries (196 animals at 2 time points) were constructed from white blood cells of pigs in a feed efficiency trial. Each library was paired-end sequenced on an Illumina NextSeq 500 instrument, generating approximately 17.5 billion sequence reads across the libraries. RNA-Seq libraries from the hypothalamus of 30 pigs with divergent feed efficiency phenotypes were also sequenced (Objective 1B), yielding over two billion 75-bp paired-end reads on an Illumina NextSeq instrument. Raw reads per sample ranged from 53.97 million to 84.51 million, with an average of 67.14 million. High-quality reads were mapped to the Sscrofa11.1 genome assembly with an average overall mapping rate of 96.91%.
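The report names split-read, paired-end mapping, and read-depth evidence for CNV detection but does not describe its pipeline. As a minimal sketch of the read-depth component only, the Python below flags fixed-size windows whose mean depth departs from the median window depth, merges adjacent flagged windows into regions, and computes the fraction of a genome those regions cover (the kind of calculation behind the 0.94% figure above). The function names, the window size, and the 1.5 and 0.5 depth-ratio cutoffs (roughly a one-copy gain or loss in a diploid) are illustrative assumptions, not the parameters used in this project.

```python
import numpy as np


def call_cnv_windows(depth, window_size, gain_ratio=1.5, loss_ratio=0.5):
    """Flag windows whose mean read depth departs from the median window depth.

    depth: read depth at each position along one chromosome (1-D array).
    Returns merged (start, end, state) intervals, state being "gain" or "loss".
    Hypothetical helper: real callers also use split-read and paired-end evidence.
    """
    depth = np.asarray(depth, dtype=float)
    n_windows = len(depth) // window_size
    # Mean depth per non-overlapping window; a trailing partial window is dropped.
    means = depth[: n_windows * window_size].reshape(n_windows, window_size).mean(axis=1)
    ratio = means / np.median(means)
    regions = []
    for i, r in enumerate(ratio):
        state = "gain" if r >= gain_ratio else "loss" if r <= loss_ratio else None
        if state is None:
            continue
        start, end = i * window_size, (i + 1) * window_size
        # Extend the previous region when this window is adjacent and has the same state.
        if regions and regions[-1][2] == state and regions[-1][1] == start:
            regions[-1] = (regions[-1][0], end, state)
        else:
            regions.append((start, end, state))
    return regions


def covered_fraction(intervals, genome_length):
    """Fraction of the genome covered by (start, end) intervals after merging overlaps.

    Assumes all intervals share one coordinate system (one chromosome or
    chromosome-offset coordinates).
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return sum(end - start for start, end in merged) / genome_length


# Toy chromosome: baseline depth 30x, one duplicated segment and one deleted segment.
depth = np.full(1000, 30.0)
depth[200:300] = 60.0
depth[600:650] = 5.0
regions = call_cnv_windows(depth, window_size=50)
print(regions)
print(covered_fraction([(s, e) for s, e, _ in regions], genome_length=1000))
```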
For the hypothalamus libraries, computing read counts for each gene and filtering out genes with low counts yielded a set of 17,036 expressed genes. Using a meta-analysis procedure, 103 genes were identified as significantly differentially expressed between feed efficiency phenotypes. Pathway analysis showed that differentially expressed genes were significantly enriched in 28 pathways, including G-protein signaling mediated by Tubby, Gαi signaling, and androgen signaling.
RNA-Seq technology can be divided into three subclasses according to the type of RNA sequenced: messenger RNA (mRNA), microRNA (miRNA), and total RNA. To date, the most popular type of RNA-Seq has been mRNA sequencing, which focuses on the expression of protein-coding genes. In recent years, however, the mRNA-centric view of the transcriptome has broadened to include noncoding regions of the genome. Total RNA-Seq was performed on libraries from three porcine tissues: 4 longissimus dorsi muscle samples, 4 liver samples, and 8 hypothalamus samples (Objective 1B). To evaluate the sequencing depth needed for total RNA-Seq transcriptome profiling, each library was randomly downsampled to a range of sequencing depths, and the transcriptome profiles obtained at the various depths were compared. Small coefficients of variation in expression values across technical replicates indicated that the sampling procedure was consistent and accurate. As expected, variability across re-sampling was higher for lowly expressed transcripts (i.e., those in the first and second quartiles) and at lower re-sampling depths. Saturation curves suggest that in all three tissues a depth of 80 million (M) reads is sufficient to capture most transcripts, because the number of additional transcripts identified becomes relatively small beyond 80 M reads.
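Randomly downsampling a library to a lower depth amounts to drawing reads without replacement, so the saturation analysis above can be sketched, under that assumption, with NumPy's multivariate hypergeometric sampler applied to per-transcript read counts. The helper names, the one-read detection cutoff, and working from counts rather than re-aligning subsampled reads are assumptions for illustration; the report does not say how its downsampling was implemented.

```python
import numpy as np


def downsample_counts(counts, target_depth, rng):
    """Subsample a library to `target_depth` reads without replacement.

    Drawing per-transcript counts from a multivariate hypergeometric distribution
    is equivalent to picking that many reads uniformly at random from the library.
    """
    counts = np.asarray(counts, dtype=np.int64)
    return rng.multivariate_hypergeometric(counts, target_depth)


def saturation_curve(counts, depths, min_reads=1, seed=0):
    """Number of transcripts detected (>= min_reads) at each downsampled depth."""
    rng = np.random.default_rng(seed)
    return [int((downsample_counts(counts, d, rng) >= min_reads).sum()) for d in depths]


def coefficient_of_variation(replicates):
    """Per-transcript CV across replicate downsamplings (rows = replicates)."""
    reps = np.asarray(replicates, dtype=float)
    mean = reps.mean(axis=0)
    sd = reps.std(axis=0, ddof=1)
    # Transcripts with zero mean expression get CV 0 instead of a division error.
    return np.divide(sd, mean, out=np.zeros_like(mean), where=mean > 0)


# Toy library: a few abundant transcripts and several rare ones.
counts = [50_000, 20_000, 5_000, 300, 40, 5, 2, 1, 0]
total = sum(counts)
print(saturation_curve(counts, [1_000, 10_000, total]))
```

Plotting detected transcripts against depth and locating where the curve flattens is how a threshold such as the 80 M reads reported above would be read off; a real analysis would use realistic depths and repeat each depth several times to obtain the technical-replicate CVs.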
For total RNA-Seq, saturation in the number of annotated transcripts identified also occurred at a depth of 80 M reads.
Collection of pubertal phenotypes and genotypes from gilts not cycling by 240 days of age has continued (Objective 2A). Currently there are nearly 1,400 records for non-cycling gilts and controls (667 delayed puberty and 608 behavioral anestrus gilts). Tissues for RNA-Seq analysis have been collected from Yorkshire- and Landrace-sired females. High-density genotyping of non-cycling gilts and cycling controls has been completed or is in progress for all animals with records. RNA sequencing of the amygdala and olfactory bulb from delayed puberty, behavioral anestrus, and appropriate control gilts has been completed (Objective 1B). RNA-Seq analysis has been completed for the olfactory bulb and is underway for the amygdala, and RNA-Seq libraries are being constructed from medial basal hypothalamus and hippocampal tissues. For the olfactory bulb (OB) analysis, libraries were constructed from gilts not showing estrus by 240 days and from normally cycling gilts. Non-cycling gilts were subdivided into delayed puberty (no ovulation or estrus) and behavioral anestrus (ovulation but no estrus). More than 17,500 genes were expressed in the OB at measurable levels. Differential gene expression (DGE) analysis showed only a small number of genes differentially expressed between non-cycling gilts and controls, whereas normally cycling gilts had over 2,000 genes differentially expressed depending on their stage of the estrous cycle.
Phenotypic data for grow-finish feed efficiency have been evaluated (Objective 2A). Records were screened for inconsistent measurements caused by equipment error, and a procedure to recover the information was developed. Briefly, all suspect data were removed for days on which a feeder was not performing accurately, and the missing information was then interpolated using a random regression model. Genotyping has also been completed for more than 3,000 phenotyped pigs in this study.
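The feeder-error recovery described above removes suspect feeder-days and then predicts the missing values from a model of each pig's intake over time. The sketch below is a deliberately simplified stand-in: it fits an ordinary least-squares polynomial in test day to one pig's trustworthy records with NumPy `polyfit`, whereas the random regression model used in the project would also pool information across pigs through random animal-specific coefficients. The function name, the quadratic degree, and the data layout are assumptions.

```python
import numpy as np


def fill_feed_intake(days, intake, feeder_ok, degree=2):
    """Replace intake on suspect feeder-days with values from a per-pig polynomial fit.

    days: test day of each record; intake: daily feed intake (kg);
    feeder_ok: False on days when the feeder was not recording accurately.
    Simplified stand-in for a random regression model (no pooling across pigs).
    """
    days = np.asarray(days, dtype=float)
    intake = np.asarray(intake, dtype=float)
    ok = np.asarray(feeder_ok, dtype=bool)
    if ok.sum() <= degree:
        raise ValueError("not enough trustworthy records to fit the curve")
    # Fit only on trustworthy records, then predict every day.
    coefs = np.polyfit(days[ok], intake[ok], deg=degree)
    fitted = np.polyval(coefs, days)
    return np.where(ok, intake, fitted)


days = np.arange(10)
intake = 1.5 + 0.1 * days       # smooth underlying intake curve
observed = intake.copy()
observed[[4, 7]] = 0.0          # feeder errors recorded as zero intake
flags = np.ones(10, dtype=bool)
flags[[4, 7]] = False
print(fill_feed_intake(days, observed, flags).round(3))
```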
Association analyses of the grow-finish feed efficiency data have been conducted for feed intake, and analyses of feed conversion are underway. Typically, in genome-wide association studies (GWAS), the number of genotyped individuals is 10- to 100-fold smaller than the number of genetic markers being tested. Marker selection is a statistical procedure often used to reduce the number of genetic markers in the analysis so that the statistical results can be interpreted. A novel marker selection methodology was developed (Objective 2A). In this method, gene expression data from RNA-Seq of multiple tissues are used as prior information to assign weights to single nucleotide polymorphisms (SNP), SNP are selected based on a weight threshold, and weighted hypothesis testing is used to conduct a GWAS. RNA-Seq libraries from hypothalamus, duodenum, ileum, and jejunum tissue of 30 pigs with divergent feed efficiency phenotypes were sequenced, and a three-way (gene × individual × tissue) clustering analysis was performed using constrained tensor decomposition, yielding 10 gene expression modules. Loading values from each gene module were used to assign weights to 49,691 commercial SNP markers, and SNP were selected using a weight threshold, resulting in 10 SNP sets ranging in size from 101 to 955 markers. A weighted GWAS for feed intake in 4,200 pigs was performed separately for each of the 10 SNP sets. A total of 36 unique significant SNP associations were identified across the 10 gene modules (SNP sets). For comparison, a standard unweighted GWAS using all 49,691 SNP identified only 2 significant SNP. Neither SNP from the unweighted analysis resided in known QTL for swine feed efficiency traits (feed intake, average daily gain, and feed conversion ratio), compared with 29 of the 36 (80.6%) from the weighted analyses, 9 of which resided in feed intake QTL. With the release of the SowPro90 genotyping platform (Objective 2B), we have genotyped 384 USMARC animals.
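For the marker selection method described above (Objective 2A), the report states that gene-module loading values were used to weight the SNP and that a threshold then defined each SNP set, but it does not say how a gene's loading was transferred to nearby markers. The sketch below assumes one simple rule: each SNP takes the largest absolute loading of any gene within a fixed flanking distance, SNP at or above the threshold are kept, and the kept weights are rescaled to average 1 so they can feed a weighted test. The `Gene` and `Snp` classes, the 50 kb flank, and the rescaling step are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    start: int
    end: int
    loading: float  # loading of this gene on one expression module


@dataclass
class Snp:
    name: str
    chrom: str
    pos: int


def snp_weights(snps, genes, flank=50_000):
    """Weight each SNP by the largest |loading| of any gene within `flank` bp (0.0 if none)."""
    weights = {}
    for snp in snps:
        nearby = [abs(g.loading) for g in genes
                  if g.chrom == snp.chrom and g.start - flank <= snp.pos <= g.end + flank]
        weights[snp.name] = max(nearby, default=0.0)
    return weights


def select_snps(weights, threshold):
    """Keep SNP with weight >= threshold and rescale the kept weights to average 1."""
    kept = {name: w for name, w in weights.items() if w >= threshold and w > 0}
    if not kept:
        return {}
    mean_w = sum(kept.values()) / len(kept)
    return {name: w / mean_w for name, w in kept.items()}


genes = [Gene("A", "1", 100_000, 110_000, 0.8), Gene("B", "1", 500_000, 510_000, -0.3)]
snps = [Snp("s1", "1", 105_000), Snp("s2", "1", 520_000),
        Snp("s3", "2", 105_000), Snp("s4", "1", 300_000)]
weights = snp_weights(snps, genes)
print(weights)
print(select_snps(weights, threshold=0.5))
```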
Several of the 384 USMARC animals genotyped on SowPro90 had also been genotyped on earlier platforms, and concordance between genotypes from the different platforms was high. To improve this genotyping product, we worked with our collaborators to remove uninformative assays and submitted additional marker sequences for inclusion on a revised product.
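Cross-platform concordance, as reported above for animals genotyped on both SowPro90 and earlier platforms, is commonly summarized as the fraction of markers called on both platforms whose genotypes agree. The minimal sketch below assumes genotypes are already coded as 0, 1, or 2 copies of the same allele on both platforms (that is, allele and strand coding have been harmonized) and that missing calls are stored as `None`; the function name is hypothetical.

```python
def genotype_concordance(calls_a, calls_b):
    """Compare one animal's genotypes from two platforms.

    calls_a, calls_b: dict mapping SNP name -> genotype (0, 1, or 2 copies of
    the same allele on both platforms) or None for a missing call.
    Returns (number of agreeing genotypes, number of SNP compared).
    """
    shared = [snp for snp in calls_a.keys() & calls_b.keys()
              if calls_a[snp] is not None and calls_b[snp] is not None]
    agree = sum(calls_a[snp] == calls_b[snp] for snp in shared)
    return agree, len(shared)


old_chip = {"s1": 0, "s2": 1, "s3": 2, "s4": None}
sowpro90 = {"s1": 0, "s2": 2, "s3": 2, "s4": 1, "s5": 0}
agree, compared = genotype_concordance(old_chip, sowpro90)
print(f"{agree}/{compared} = {agree / compared:.3f}")
```

SNP present on only one platform, or missing on either, are excluded from the denominator, so concordance reflects only markers both platforms actually called.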

Accomplishments
1. Weights derived from gene expression improve genetic marker association analysis. Marker selection is a statistical procedure that can be used in genome-wide association studies (GWAS) to reduce the number of genetic markers in the analysis so that the statistical results can be interpreted. ARS scientists at Clay Center, Nebraska, developed a methodology that uses prior information from gene expression to rank genomic regions and perform marker selection for GWAS, and demonstrated its utility for identifying genes associated with feed intake in swine. Gene expression data from four tissues of high and low feed efficiency pigs were used to select fewer than 1,000 markers from a set of approximately 50,000 commercially available markers. A GWAS using the new method identified 36 significant markers, in contrast to a traditional method that identified only 2. These results show that prioritizing genetic markers based on gene expression data across multiple tissues improves the power of association analysis by identifying critical markers that do not individually attain genome-wide significance.
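The report says the GWAS used weighted hypothesis testing but does not name the procedure. One common form is a weighted Bonferroni rule: with m tests and nonnegative weights averaging 1, hypothesis i is rejected when its p-value is at most alpha * w_i / m, which keeps the family-wise error rate at or below alpha while giving markers with strong prior support (here, gene-expression weights) a more lenient threshold. The sketch below implements that rule as an assumed stand-in; the function name and the toy p-values and weights are illustrative only.

```python
def weighted_bonferroni(pvalues, weights, alpha=0.05):
    """Indices of hypotheses rejected by a weighted Bonferroni procedure.

    Weights are rescaled to sum to m (average 1); H_i is rejected when
    p_i <= alpha * w_i / m. Because the thresholds sum to alpha, the
    family-wise error rate is controlled at level alpha.
    """
    m = len(pvalues)
    if m == 0 or len(weights) != m:
        raise ValueError("need one weight per p-value")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be nonnegative with a positive sum")
    total = sum(weights)
    scaled = [w * m / total for w in weights]
    return [i for i, (p, w) in enumerate(zip(pvalues, scaled)) if p <= alpha * w / m]


pvalues = [0.02, 0.001, 0.3, 0.01]
prior_weights = [2.0, 1.0, 0.5, 0.5]  # hypothetical expression-derived weights
print(weighted_bonferroni(pvalues, prior_weights))  # weighted thresholds
print(weighted_bonferroni(pvalues, [1.0] * 4))      # ordinary Bonferroni
```

The toy example shows the trade-off: a high prior weight lets the marker with p = 0.02 pass, while low-weight markers face a stricter threshold than under ordinary Bonferroni, so the total error budget is unchanged.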

2. SowPro90, a high-density genotyping platform for swine containing functional variants. High-density genotyping platforms in swine rely on evenly spaced single nucleotide polymorphisms (SNP) selected for their information content. Characterization of genetic variants in commercial pigs by ARS scientists has resulted in the identification of several thousand SNP presumed to significantly alter gene products. ARS scientists at Clay Center, Nebraska, in collaboration with University of Nebraska scientists, created a genotyping product for swine that contains more than 90,000 SNP, targets over 4,000 genes, and includes 676 loss-of-function variants. Gene targets were selected from association studies of reproductive traits and disease resistance. This genotyping product provides swine producers with a powerful new tool for marker-informed selection, based on functional variants, for traits of high priority to the pork industry.
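The contrast drawn above, between a backbone of evenly spaced, informative SNP and added functional variants, can be illustrated with a small panel-design sketch. Here information content is approximated by expected heterozygosity 2p(1 - p) from the minor allele frequency p, one SNP is kept per fixed-size genomic bin, and a separate list of functional variants (such as predicted loss-of-function SNP) is added regardless of spacing. The bin size, function name, and input format are assumptions; this is not the procedure used to design SowPro90.

```python
def design_panel(candidates, functional_ids, bin_size=30_000):
    """Combine an evenly spaced, informative SNP backbone with functional variants.

    candidates: iterable of (snp_id, chrom, pos, minor_allele_freq).
    functional_ids: SNP to include regardless of spacing (e.g. predicted
    loss-of-function variants in target genes).
    """
    best = {}
    for snp_id, chrom, pos, maf in candidates:
        key = (chrom, pos // bin_size)  # genomic bin, used to enforce even spacing
        het = 2 * maf * (1 - maf)       # expected heterozygosity as information content
        if key not in best or het > best[key][1]:
            best[key] = (snp_id, het)
    backbone = {snp_id for snp_id, _ in best.values()}
    return backbone | set(functional_ids)


candidates = [
    ("a", "1", 100, 0.10),
    ("b", "1", 200, 0.40),
    ("c", "1", 40_000, 0.05),
    ("d", "2", 100, 0.50),
]
print(sorted(design_panel(candidates, functional_ids={"lof1"})))
```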
