Research Project: Defining the Genetic Diversity and Structure of the Soybean Genome and Applications to Gene Discovery in Soybean, Wheat and Common Bean Germplasm

Location: Soybean Genomics & Improvement Laboratory

Title: Integration of least angle regression with empirical Bayes for multi-locus genome-wide association studies

ZHANG, JIN - Nanjing Agricultural University
FENG, JIAN-YING - Nanjing Agricultural University
WEN, YANG-JUN - Nanjing Agricultural University
NIU, YUAN - Nanjing Agricultural University
TAMBA, COX LWAKA - Nanjing Agricultural University
YUE, CHAO - Nanjing Agricultural University
Song, Qijian
ZHANG, YUAN-MING - Nanjing Agricultural University

Submitted to: Heredity
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 1/20/2017
Publication Date: 3/15/2017
Citation: Zhang, J., Feng, J., Wen, Y., Niu, Y., Tamba, C., Yue, C., Song, Q., Zhang, Y. 2017. Integration of least angle regression with empirical Bayes for multi-locus genome-wide association studies. Heredity. 118:517-524.

Interpretive Summary: In plants, genes control traits such as seed yield and seed quality. Genes are found on chromosomes at positions called loci, and genes are often associated with markers that indicate their positions. These loci can be identified by using statistics to evaluate markers in seedlings (offspring) derived from two parents. Unfortunately, the most common statistical methods are insufficient for analyzing many markers, many loci, and many traits at the same time. In this study, we developed an algorithm that identifies the markers on each chromosome that are associated with different traits and simultaneously estimates their effects on those traits. This approach was validated by analyzing datasets from simulation experiments and a dataset containing seven flowering traits in a model plant, Arabidopsis. This new method can perform a statistical analysis for multiple traits and should be useful to scientists and breeders in government, universities, or private companies who want to breed crops like soybean with better traits.

Technical Abstract: Multi-locus genome-wide association studies have become the state-of-the-art procedure for simultaneously identifying quantitative trait loci (QTL) associated with traits. However, implementing a multi-locus model is still difficult. In this study, we integrated least angle regression (LARS) with empirical Bayes to perform multi-locus genome-wide association studies. We proposed a model transformation algorithm that normalizes the error arising from the polygenic effect and the residual and controls for polygenic background. Subsequently, the markers on one chromosome were included in the model simultaneously, and the LARS algorithm was used to select the set of markers most significantly associated with the quantitative trait, while the markers on the other chromosomes of the genome were used to calculate the kinship matrix. The markers selected in the multi-locus model were then tested for association with the trait using empirical Bayes and a likelihood ratio test. Results from simulation studies showed that the new method with polygenic background control is more powerful in QTL detection, is more accurate in QTL effect estimation, has a lower false positive rate, and requires less computing time than BhGLM, EMMA, and the method without polygenic background control. Analysis of seven flowering-time-related traits in Arabidopsis thaliana confirmed that the new method is better than EMMA at detecting QTL.
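
The chromosome-wise split described in the abstract can be illustrated with a short sketch: markers on the focal chromosome are set aside as candidates for selection, and the markers on all other chromosomes are used to build the kinship matrix. This is only an illustration under stated assumptions, not the authors' implementation. The abstract does not say which kinship estimator was used, so the centered cross-product K = ZZ'/m below is an assumption, and the function names, the -1/1 genotype coding, and the toy data are all hypothetical.

```python
from typing import List, Sequence, Tuple


def split_by_chromosome(
    genotypes: Sequence[Sequence[float]],
    marker_chrom: Sequence[str],
    focal: str,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Split each individual's markers into focal-chromosome and background sets."""
    if any(len(row) != len(marker_chrom) for row in genotypes):
        raise ValueError("every genotype row needs one value per marker")
    focal_idx = [j for j, c in enumerate(marker_chrom) if c == focal]
    other_idx = [j for j, c in enumerate(marker_chrom) if c != focal]
    focal_part = [[row[j] for j in focal_idx] for row in genotypes]
    other_part = [[row[j] for j in other_idx] for row in genotypes]
    return focal_part, other_part


def kinship(markers: Sequence[Sequence[float]]) -> List[List[float]]:
    """Assumed estimator: K = Z Z^T / m, where Z is the column-centered marker matrix."""
    n = len(markers)
    m = len(markers[0]) if n else 0
    if m == 0:
        raise ValueError("need at least one background marker")
    # Center each marker column so that K reflects covariance between individuals.
    means = [sum(row[j] for row in markers) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in markers]
    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) / m for k in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: 4 individuals (rows), 5 markers (columns) coded -1/1.
genotypes = [
    [1, -1, 1, 1, -1],
    [1, 1, 1, -1, -1],
    [-1, 1, -1, 1, 1],
    [-1, -1, -1, -1, 1],
]
marker_chrom = ["1", "1", "2", "2", "3"]  # chromosome label of each marker

# Candidates come from chromosome "1"; kinship comes from chromosomes "2" and "3".
candidates, background = split_by_chromosome(genotypes, marker_chrom, "1")
K = kinship(background)
```

In a full analysis this split would be repeated with each chromosome in turn as the focal one, and the candidate markers would then go through the LARS selection and the empirical Bayes and likelihood ratio testing steps that the abstract describes, none of which are shown here.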