Publication #208311

Title: Efficient estimation of breeding values from dense genomic data

Author: Vanraden, Paul

Submitted to: Journal of Dairy Science
Publication Type: Abstract Only
Publication Acceptance Date: 3/7/2007
Publication Date: 7/8/2007
Citation: Van Raden, P.M. 2007. Efficient estimation of breeding values from dense genomic data. Journal of Dairy Science. 90(Suppl. 1):374–375 (abstr. 414).

Technical Abstract: Genomic, phenotypic, and pedigree data can be combined to produce estimated breeding values (EBV) with higher reliability. If coefficient matrix Z includes genotypes for many loci and marker effects (u) are normally distributed with equal variance at each locus, then estimating u by mixed model equations and estimating EBV by selection index equations that include a genomic relationship matrix (G) are equivalent models. Matrix G is analogous to the traditional relationship matrix A and is obtained by subtracting allele frequencies from the coefficients of Z and then dividing the revised ZZ' by the number of marker effects (m). Equations that include either Z'Z or ZZ' are dense and can be solved by several methods, which were tested on simulated data. Off-diagonals count individuals that inherited two different alleles (in Z'Z) or alleles shared by two individuals (in ZZ'). Algorithms that first estimate marker effects using Z'Z and then sum them to obtain EBV are more efficient than those that use ZZ' unless m greatly exceeds the number of genotyped individuals (n). With direct inversion to obtain reliabilities, computing times increase in proportion to n cubed for EBV or m cubed for marker effects. With iteration to estimate u, computing times increase with the number of iterations (i) times m squared. The algorithm known as iteration on data reduces memory use, and a simple trick can increase speed: for each individual, its genotypes (left-hand sides) are multiplied by previous-round estimates, and this sum minus the diagonal coefficient is used to adjust right-hand sides, instead of summing off-diagonals times previous solutions again for each effect. Computing time is then linear in the number of effects in the model (not quadratic as in many previous algorithms) and linear in the total number of genotypes, increasing with i times n times m. More iterations and under-relaxation were required for convergence as m increased.
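The iteration-on-data trick described above can be illustrated with a minimal sketch. This is one reading of the abstract, not the author's program: it assumes phenotypes y are already adjusted for the mean and other fixed effects, that the equations being solved are (Z'Z + lam I) u = Z'y with lam the ratio of residual to marker variance, and that the update is a Jacobi step with under-relaxation. The function name iterate_on_data, the parameter names, and the toy data are hypothetical.

```python
def iterate_on_data(Z, y, lam, relax=0.5, max_rounds=5000, tol=1e-20):
    """Estimate marker effects u from (Z'Z + lam*I) u = Z'y without forming Z'Z.

    Z is a list of n genotype rows (m coefficients each), y the n phenotypes.
    Each round costs about n*m operations, so total time grows as i*n*m.
    """
    n, m = len(Z), len(Z[0])
    # Diagonal coefficients: sum of squared genotypes per marker, plus lam.
    diag = [sum(Z[i][j] ** 2 for i in range(n)) + lam for j in range(m)]
    u = [0.0] * m
    for _ in range(max_rounds):
        rhs = [0.0] * m
        for i in range(n):
            zi = Z[i]
            # One pass per individual: genotypes times previous-round estimates.
            fitted = sum(zi[j] * u[j] for j in range(m))
            for j in range(m):
                # Adjust y_i for every marker except j by adding marker j's
                # own contribution back, rather than re-summing off-diagonals.
                rhs[j] += zi[j] * (y[i] - fitted + zi[j] * u[j])
        change = 0.0
        for j in range(m):
            # Jacobi update, damped by the under-relaxation factor.
            step = relax * (rhs[j] / diag[j] - u[j])
            u[j] += step
            change += step * step
        if change < tol:
            break
    return u


# Toy data (hypothetical): 6 individuals, 4 markers coded -1, 0, 1.
Z = [[1, 0, -1, 1], [0, 1, 1, -1], [-1, 1, 0, 0],
     [1, -1, 0, 1], [0, 0, 1, -1], [-1, -1, -1, 0]]
y = [1.2, -0.3, 0.5, 0.9, -0.7, -1.1]
u_hat = iterate_on_data(Z, y, lam=1.0)
ebv = [sum(z * u for z, u in zip(row, u_hat)) for row in Z]  # EBV = Z u
```

Only the genotype rows, the diagonals, and two vectors of length m are held in memory. The relaxation factor of 0.5 is an arbitrary choice that happens to converge for this toy data; larger m would likely need a smaller value, in line with the abstract's remark on under-relaxation.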
The methods include only phenotypes (or daughter deviations) of genotyped individuals, but future algorithms ideally should also include phenotypes of un-genotyped individuals, perhaps by absorbing equations for marker effects into equations for EBV.
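The link between marker-effect equations and EBV equations that the abstract relies on can be checked numerically on toy data. The sketch below is illustrative only and rests on several assumptions not stated in the abstract: genotypes are coded as allele counts (0, 1, 2) and centered by subtracting twice the sample allele frequency, phenotypes are already adjusted for the mean, and lam is the ratio of residual to marker variance. G is formed as ZZ'/m as the abstract describes. All function names and data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def center(genotypes):
    """Subtract 2p from allele counts, with p estimated from the sample (assumption)."""
    n, m = len(genotypes), len(genotypes[0])
    p = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    return [[row[j] - 2.0 * p[j] for j in range(m)] for row in genotypes]


def genomic_relationship(Z):
    """G = ZZ'/m, with m the number of markers."""
    m = len(Z[0])
    return [[sum(a * b for a, b in zip(zi, zk)) / m for zk in Z] for zi in Z]


def ebv_from_markers(Z, y, lam):
    """Solve (Z'Z + lam*I) u = Z'y (m equations), then sum to EBV = Z u."""
    n, m = len(Z), len(Z[0])
    lhs = [[sum(Z[i][j] * Z[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(m)] for j in range(m)]
    rhs = [sum(Z[i][j] * y[i] for i in range(n)) for j in range(m)]
    u = solve(lhs, rhs)
    return [sum(z * uj for z, uj in zip(zi, u)) for zi in Z]


def ebv_from_G(Z, y, lam):
    """Selection index form (n equations): EBV = G (G + (lam/m) I)^-1 y."""
    n, m = len(Z), len(Z[0])
    G = genomic_relationship(Z)
    V = [[G[i][k] + (lam / m if i == k else 0.0) for k in range(n)] for i in range(n)]
    w = solve(V, y)
    return [sum(G[i][k] * w[k] for k in range(n)) for i in range(n)]


# Toy data (hypothetical): 5 individuals, 3 markers.
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1], [0, 0, 2]]
y = [0.4, -0.2, 0.1, 0.6, -0.9]
Z = center(genotypes)
a_markers = ebv_from_markers(Z, y, lam=2.0)
a_G = ebv_from_G(Z, y, lam=2.0)
```

The two routes give the same EBV up to rounding because Z(Z'Z + lam I)^-1 Z' equals ZZ'(ZZ' + lam I)^-1. The marker route solves m equations and the G route solves n, which is the trade-off between m and n that the abstract describes.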