CHENG, JIAN - North Carolina State University
MALTECCA, CHRISTIAN - North Carolina State University
O'CONNELL, JEFFREY - University Of Maryland School Of Medicine
MA, LI - University Of Maryland
JIANG, JICAI - University Of Maryland
Submitted to: Journal of Dairy Science
Publication Type: Abstract Only
Publication Acceptance Date: 3/10/2022
Publication Date: 6/13/2022
Citation: Cheng, J., Maltecca, C., Van Raden, P.M., O'Connell, J.R., Ma, L., Jiang, J. 2022. SLEMM: Million-scale genomic best linear unbiased predictions with window-based SNP weighting [abstract]. Journal of Dairy Science. 105(Suppl. 1):19(abstr. 1048).
Technical Abstract: The volume of genomic data in animal breeding is growing rapidly. Using a large number of genotyped and phenotyped animals for genomic prediction is appealing yet challenging. The genomic best linear unbiased prediction (GBLUP) model and various SNP-based Bayesian alphabet models such as Bayes R remain widely used for genomic prediction. The Bayesian models are typically advantageous for traits influenced by genes of large effect. However, the Markov chain Monte Carlo (MCMC) sampling often used to fit Bayesian models is time-consuming. Here we present an alternative approach, set in a multi-step evaluation framework for million-scale genomic predictions, which we refer to as SSGP. Unlike MCMC-based methods, SSGP relies on an efficient implementation of the stochastic Lanczos algorithm for REML and BLUP. We further develop a window-based SNP weighting method to improve prediction accuracy. SSGP was compared with GBLUP and Bayes R in terms of prediction accuracy. Extensive data analyses, covering a spectrum of polygenic traits in multiple plant and animal species, show that SSGP had accuracies comparable to those of Bayes R (0.3% lower than Bayes R and 3% greater than GBLUP for animals; 2% greater than Bayes R and 0.4% lower than GBLUP for plants, where most traits are highly polygenic). SSGP was further applied to a large-scale Holstein cow data set (5 milk production traits from about 300K animals with 60K SNPs) from the Council on Dairy Cattle Breeding. Prediction accuracies with SSGP were consistently greater than those with Bayes R (0.1% to 2% greater) and GBLUP (0.3% to 1% greater). Simulation analyses show that SSGP can complete genomic predictions for 0.5M genotyped animals and 50K SNPs in ~0.4 hours with 9 GB of memory, whereas Bayes R used ~6.6 hours with 24.5 GB of memory. SSGP used ~5.5 hours and 63 GB of memory to predict for 3M animals, whereas Bayes R could handle no more than 0.5M animals in this case.
In short, SSGP paves the way for million-scale genomic predictions. A further comparison with single-step GBLUP will be carried out. SSGP is freely available at https://github.com/jiang18/ssgp.
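
The abstract does not spell out how the window-based SNP weighting is computed, so the following is only a minimal Python sketch of the general idea, not the SSGP implementation. It assumes a two-pass scheme: fit an unweighted SNP-BLUP, derive one weight per fixed-size window from the mean squared SNP effect in that window, then refit with those weights. The function names (`snp_blup`, `window_weights`), the window size, the ridge parameter `lam`, and the simulated data are all hypothetical, and a dense direct solve stands in for the stochastic Lanczos algorithm that SSGP uses at scale.

```python
import numpy as np


def snp_blup(Z, y, weights, lam):
    """Weighted SNP-BLUP solved in its n x n (GBLUP-equivalent) form.

    Z: (n, m) centered genotype matrix; weights: (m,) prior variance
    weights per SNP; lam: ratio of residual to SNP-effect variance
    (an assumed, fixed value here rather than a REML estimate).
    """
    n = Z.shape[0]
    yc = y - y.mean()                      # remove the overall mean
    ZD = Z * weights                       # scales column j by weights[j]
    K = ZD @ Z.T                           # Z D Z'
    alpha = np.linalg.solve(K + lam * np.eye(n), yc)
    return ZD.T @ alpha                    # SNP effects: D Z' alpha


def window_weights(effects, window_size):
    """One weight per window: mean squared effect, scaled to mean 1.

    This weighting rule is an illustrative assumption, not the rule
    used by SSGP.
    """
    m = effects.shape[0]
    w = np.empty(m)
    for start in range(0, m, window_size):
        stop = min(start + window_size, m)
        w[start:stop] = np.mean(effects[start:stop] ** 2)
    return w / w.mean()


# Small simulated example (hypothetical data, not from the abstract).
rng = np.random.default_rng(0)
n, m = 200, 500
Z = rng.binomial(2, 0.3, size=(n, m)).astype(float)
Z -= Z.mean(axis=0)                        # center each SNP column
true_beta = np.zeros(m)
true_beta[100:105] = 1.0                   # a few large-effect SNPs
y = Z @ true_beta + rng.normal(size=n)

lam = 50.0
beta0 = snp_blup(Z, y, np.ones(m), lam)    # pass 1: unweighted
weights = window_weights(beta0, window_size=20)
beta1 = snp_blup(Z, y, weights, lam)       # pass 2: window-weighted
```

With all weights equal to 1, `snp_blup` reduces to ordinary ridge regression on the SNPs, which is the SNP-effect form of GBLUP. The weighted refit gives windows with larger estimated effects a larger prior variance, which is the kind of adjustment that can help for traits with genes of large effect.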