Location: Genetics and Animal Breeding
Title: Using SNP weights derived from gene expression modules to improve GWAS power for feed efficiency in pigs
Submitted to: Frontiers in Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 12/9/2019
Publication Date: 1/21/2020
Citation: Keel, B.N., Snelling, W.M., Lindholm-Perry, A.K., Oliver, W.T., Kuehn, L.A., Rohrer, G.A. 2020. Using SNP weights derived from gene expression modules to improve GWAS power for feed efficiency in pigs. Frontiers in Genetics. 10:1339. https://doi.org/10.3389/fgene.2019.01339.
Interpretive Summary: In a typical genome-wide association study, hundreds or thousands of individuals are genotyped and hundreds of thousands of genetic markers are tested. Because the markers vastly outnumber the individuals, statistical inference is difficult, which complicates both the analysis and the interpretation of the study. Marker selection is a statistical procedure often used to reduce the number of genetic markers in the analysis so that the results can be interpreted. ARS scientists have developed a methodology that uses prior information to rank genomic regions and select markers for genome-wide association studies. Gene expression data from four tissues of high and low feed efficiency pigs were used to select sets of fewer than 1,000 markers each from approximately 50,000 commercially available markers. A genome-wide association study conducted on these marker subsets found 36 markers significantly associated with feed intake, compared with only 2 markers identified in a standard association analysis using all 50,000 markers. Neither of the 2 markers from the standard analysis resided in known genomic regions related to swine feed efficiency (feed intake, average daily gain, and feed conversion ratio), whereas 29 of the 36 (80.6%) from the marker-selection analysis did. These results suggest that a considerable proportion of the heritability of feed intake is driven by many markers that individually do not attain genome-wide significance in a standard association analysis but can be identified after marker selection. Hence, the proposed procedure for prioritizing genetic markers based on gene expression data across multiple tissues provides a promising approach for improving the power of association analysis.
Technical Abstract: The "large p, small n" problem, in which the number of markers far exceeds the number of genotyped individuals, has posed a significant challenge in the analysis and interpretation of genome-wide association studies (GWAS). Using prior information to rank genomic regions and perform SNP selection could increase the power of GWAS. In this study, we propose using gene expression data from RNA-Seq of multiple tissues as prior information to assign weights to SNP, select SNP based on a weight threshold, and conduct a GWAS using weighted hypothesis testing. RNA-Seq libraries from hypothalamus, duodenum, ileum, and jejunum tissue of 30 pigs with divergent feed efficiency phenotypes were sequenced, and a three-way gene × individual × tissue clustering analysis was performed using constrained tensor decomposition, yielding 10 gene expression modules. Loading values from each gene module were used to assign weights to 49,691 commercial SNP markers, and SNP were selected using a weight threshold, resulting in 10 SNP sets ranging in size from 101 to 955 markers. Weighted GWAS for feed intake in 4,200 pigs was performed separately for each of the 10 SNP sets. A total of 36 unique significant SNP associations were identified across the 10 gene modules (SNP sets). For comparison, a standard unweighted GWAS using all 49,691 SNP was performed, and only 2 SNP were significant. Neither of the 2 SNP from the unweighted analysis resided in known QTL related to swine feed efficiency (feed intake, average daily gain, and feed conversion ratio), compared with 29 of the 36 (80.6%) from the weighted analyses, 9 of which resided in feed intake QTL. These results suggest that the heritability of feed intake is driven by many SNP that individually do not attain genome-wide significance in a standard GWAS. Hence, the proposed procedure for prioritizing SNP based on gene expression data across multiple tissues provides a promising approach for improving the power of GWAS.
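The three steps described in the abstract (weight SNP from module loadings, keep SNP above a weight threshold, then apply a weighted test) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not say how gene loadings are mapped onto SNP or which weighted test was used. Here each SNP is assumed to take the largest absolute loading of any gene within a hypothetical ±50 kb window, the threshold value is arbitrary, and the weighted test is swapped in as a weighted Bonferroni rule (reject SNP i when p_i ≤ α·w_i/m, with weights rescaled to average 1). All function names are hypothetical.

```python
import numpy as np


def snp_weights_from_loadings(snp_pos, gene_pos, gene_loading, window=50_000):
    """ASSUMED mapping: each SNP gets the largest |loading| among genes on the
    same chromosome within +/- `window` bp, or 0 if no gene is that close.
    Positions are (chromosome, base-pair) tuples."""
    weights = np.zeros(len(snp_pos))
    for i, (chrom, bp) in enumerate(snp_pos):
        for (g_chrom, g_bp), load in zip(gene_pos, gene_loading):
            if g_chrom == chrom and abs(g_bp - bp) <= window:
                weights[i] = max(weights[i], abs(load))
    return weights


def select_snps(weights, threshold):
    """Indices of SNPs whose weight meets the (hypothetical) threshold."""
    return np.flatnonzero(np.asarray(weights, dtype=float) >= threshold)


def weighted_bonferroni(pvalues, weights, alpha=0.05):
    """Weighted Bonferroni test: weights are rescaled to mean 1, then SNP i is
    rejected when p_i <= alpha * w_i / m. Returns a boolean rejection mask."""
    p = np.asarray(pvalues, dtype=float)
    w = np.asarray(weights, dtype=float)
    if p.shape != w.shape:
        raise ValueError("pvalues and weights must have the same shape")
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    m = p.size
    w = w * m / w.sum()  # rescale so the weights average exactly 1
    return p <= alpha * w / m


# Toy example with made-up positions, loadings, and p-values.
snps = [("1", 1_000), ("1", 200_000), ("2", 5_000)]
genes = [("1", 20_000), ("2", 40_000)]
w = snp_weights_from_loadings(snps, genes, [-0.8, 0.3])  # [0.8, 0.0, 0.3]
kept = select_snps(w, threshold=0.25)                    # SNPs 0 and 2
print(weighted_bonferroni([1e-4, 0.03], w[kept]))        # [ True False]
```

Rescaling the weights to average 1 is what keeps the family-wise error rate at or below α: summing α·w_i/m over the tested SNP gives at most α, so up-weighting SNP near high-loading genes is paid for by down-weighting the rest. With equal weights the rule reduces to the ordinary Bonferroni cut-off α/m.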