Population Genetic Research in Support of Bioinformatic Methods to Predict Small Grain Field Performance
Plant, Soil and Nutrition Research
2012 Annual Report
1a. Objectives (from AD-416):
When specific marker alleles segregate together with alleles at loci that affect the phenotype, the marker can explain and predict phenotypic variation. Combinations of alleles at different markers (that is, haplotypes) may co-segregate more reliably with causal alleles than any single marker does, in which case haplotypes are more effective at prediction. In this context, we will determine the ranges of effective population size, age of the causal mutation, and marker density for which haplotype methods are superior to single marker methods. We will also assess whether the choice of haplotype block identification method affects the performance of QTL detection methods in real and simulated data.
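The difference between single-marker and haplotype co-segregation can be illustrated with a minimal sketch. It computes the standard linkage disequilibrium measure r^2 between a marker (or a haplotype indicator) and a causal locus. The function name `ld_r_squared` and the eight-gamete sample are hypothetical and chosen only for illustration; they are not taken from the project's data or software.

```python
def ld_r_squared(haplotypes, locus_a, locus_b):
    """Squared correlation (r^2) between two biallelic loci coded 0/1.

    haplotypes: list of tuples, one tuple of 0/1 alleles per gamete.
    Returns 0.0 if either locus is monomorphic in the sample.
    """
    n = len(haplotypes)
    p_a = sum(h[locus_a] for h in haplotypes) / n
    p_b = sum(h[locus_b] for h in haplotypes) / n
    p_ab = sum(h[locus_a] * h[locus_b] for h in haplotypes) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0
    return d * d / denom


# Hypothetical toy sample: columns are (marker 1, marker 2, causal locus).
# The causal allele (1) occurs only on the marker haplotype "1-0".
sample = (
    [(1, 0, 1)] * 2 + [(1, 1, 0)] * 2 + [(0, 0, 0)] * 2 + [(0, 1, 0)] * 2
)

# Append a fourth column: 1 if the gamete carries marker haplotype "1-0".
with_haplotype = [h + (int(h[0] == 1 and h[1] == 0),) for h in sample]

r2_marker1 = ld_r_squared(with_haplotype, 0, 2)
r2_marker2 = ld_r_squared(with_haplotype, 1, 2)
r2_haplotype = ld_r_squared(with_haplotype, 3, 2)

print(f"marker 1 vs causal:  r^2 = {r2_marker1:.3f}")
print(f"marker 2 vs causal:  r^2 = {r2_marker2:.3f}")
print(f"haplotype vs causal: r^2 = {r2_haplotype:.3f}")
```

In this constructed sample each marker alone is only partly associated with the causal allele, while the two-marker haplotype tracks it exactly. Real populations will generally show a less extreme contrast.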
Beyond identifying QTL, these marker data can be used to predict a genotype’s performance. We will evaluate analysis methods that use haplotypes for this purpose. Finally, to perform these analyses in practice, large amounts of DNA marker data are needed. We will take advantage of Cornell expertise to develop lower cost methods of obtaining marker data using sequencing.
1b. Approach (from AD-416):
Populations under a Wright-Fisher neutral model will be simulated using a standard coalescent approach with a range of effective population sizes thought to correspond to those of elite small grain crops in North America (Ne = 25 to 400). A polymorphism of the appropriate age (g = 25 to 400 generations) will be selected and effects will be attributed to its alleles. Four hundred individuals will be simulated in this way. These data will then be subjected to single marker and haplotype block analyses. Since the methods use different test statistics, they will be compared on the basis of detection power at fixed false discovery rates. Whole chromosomes will also be simulated and populated with one to several causal polymorphisms, simulating a locus that bears several mutations and generates an allelic series. Different haplotype block identification methods will be applied to the whole chromosome marker profile. Chromosomes will also be simulated in structured populations. Finally, these analyses will also be applied to real marker and phenotype data from the Barley Coordinated Agricultural Project.
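One way to compare methods whose test statistics are on different scales is sketched below. It is an illustration under stated assumptions, not the project's actual procedure: because the causal loci are known in simulated data, each method's scores can be ranked and the most permissive threshold found at which the empirical false discovery rate stays within a target. The function name `power_at_fdr`, the scores, and the truth labels are all hypothetical.

```python
def power_at_fdr(scores, is_causal, max_fdr):
    """Detection power at the most permissive threshold meeting an FDR cap.

    scores: test statistics, larger meaning stronger evidence.
    is_causal: booleans marking which tests are truly causal (known
        only because the data are simulated).
    max_fdr: largest acceptable empirical false discovery rate.
    Returns (power, threshold); threshold is None if no cutoff qualifies.
    """
    ranked = sorted(zip(scores, is_causal), key=lambda pair: -pair[0])
    n_true = sum(1 for flag in is_causal if flag)
    best_power, best_threshold = 0.0, None
    true_pos = 0
    for i, (score, causal) in enumerate(ranked):
        if causal:
            true_pos += 1
        # Tied scores must be declared together, so evaluate a cutoff
        # only at the last member of each group of ties.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        declared = i + 1
        fdr = (declared - true_pos) / declared
        if fdr <= max_fdr:
            best_power = true_pos / n_true if n_true else 0.0
            best_threshold = score
    return best_power, best_threshold


# Hypothetical scores from one method on six tested positions.
scores = [9.0, 8.0, 7.0, 6.0, 5.0, 4.0]
truth = [True, True, False, True, False, False]

power_25, cutoff_25 = power_at_fdr(scores, truth, max_fdr=0.25)
power_0, cutoff_0 = power_at_fdr(scores, truth, max_fdr=0.0)
print(power_25, cutoff_25)
print(power_0, cutoff_0)
```

Running the same function on the scores from a single marker analysis and from a haplotype block analysis, at the same `max_fdr`, gives power values that can be compared directly even though the underlying statistics differ.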
Similar approaches can be used to compare performance prediction models rather than QTL detection models. For marker development and scoring through sequencing, we will use subsets of lines from bi-parental populations in barley and wheat. These lines will be sequenced on Cornell machines and progeny sequence compared to parental sequence.
Work performed at Cornell in support of bioinformatics methods to predict small grain field performance has included two projects.
In the first, we developed multi-trait methods to predict performance using DNA marker data only. We found that information from traits with high heritability can improve prediction of correlated traits with low heritability. Improvement in the opposite direction (from traits of low heritability to traits of high heritability) was negligible. The improvement increased as the genetic correlation between traits increased, though it was evident even at low genetic correlation. We found that prediction could improve even when the two traits had not been measured on the same individuals. This finding opens the option of expanding the number of breeding lines evaluated by evaluating different traits on different lines. A revised manuscript on these results has been submitted.
In the second project, we are working directly with the collaborator’s breeding program as a test case for the use of genomic selection in a small, public sector breeding program. We are seeking to optimize the use of both phenotyping and genotyping resources in this program and will contrast the methods used before and after conversion of the program.