2011 Annual Report
1a. Objectives (from AD-416)
When specific marker alleles segregate together with alleles at loci that affect the phenotype, the marker can explain and predict phenotypic variation. Combinations of alleles at different markers (that is, haplotypes) may also co-segregate more reliably with causal alleles, in which case haplotypes are more effective at prediction. In this context, we will determine the ranges of effective population size, age of the causal mutation, and marker density for which haplotype methods are superior to single marker methods. We will also assess whether different haplotype block identification methods affect the performance of QTL detection methods differently in real and simulated data.
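The argument that a haplotype can track a causal allele more reliably than either of its component markers can be illustrated with a small worked example. The sketch below is only an illustration under stated assumptions: the chromosome counts are invented toy data (not project data), alleles are coded 0/1, phase is assumed known, and the function name `r_squared` is ours. The causal allele is placed only on the 1-1 marker background, so each single marker is only partly associated with it while the 1-1 haplotype matches it exactly.

```python
# Toy phased chromosomes as (marker1, marker2, causal) alleles coded 0/1.
# Hypothetical counts: the causal allele occurs only on the 1-1 marker background.
chromosomes = (
    [(1, 1, 1)] * 20
    + [(1, 0, 0)] * 30
    + [(0, 1, 0)] * 30
    + [(0, 0, 0)] * 20
)


def r_squared(x, y):
    """Squared correlation between two 0/1 allele indicator vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov * cov / (var_x * var_y)


marker1 = [c[0] for c in chromosomes]
marker2 = [c[1] for c in chromosomes]
causal = [c[2] for c in chromosomes]
# Indicator for carrying the 1-1 haplotype across the two markers.
hap_11 = [1 if (c[0], c[1]) == (1, 1) else 0 for c in chromosomes]

r2_marker1 = r_squared(marker1, causal)
r2_marker2 = r_squared(marker2, causal)
r2_haplotype = r_squared(hap_11, causal)

print(f"marker 1 vs causal:      r^2 = {r2_marker1:.2f}")
print(f"marker 2 vs causal:      r^2 = {r2_marker2:.2f}")
print(f"1-1 haplotype vs causal: r^2 = {r2_haplotype:.2f}")
```

In this constructed case each single marker has r^2 = 0.25 with the causal allele, while the haplotype indicator has r^2 = 1. Real populations will fall somewhere between these extremes, which is why the ranges of population size, mutation age, and marker density matter.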
Beyond identifying QTL, these marker data can be used to predict a genotype’s performance. We will evaluate analysis methods that use haplotypes for this purpose. Finally, to perform these analyses in practice, large amounts of DNA marker data are needed. We will take advantage of Cornell expertise to develop lower-cost methods of obtaining marker data using sequencing.
1b. Approach (from AD-416)
Populations under a Wright-Fisher neutral model will be simulated using a standard coalescent approach with a range of effective population sizes thought to correspond to those of elite small grain crops in North America (Ne = 25 to 400). A polymorphism of the appropriate age (g = 25 to 400 generations) will be selected and effects will be attributed to its alleles. Four hundred individuals will be simulated in this way. These data will then be subjected to single marker and haplotype block analyses. Since the methods use different test statistics, they will be compared on the basis of detection power at fixed false discovery rates. Whole chromosomes will also be simulated and populated with one to several causal polymorphisms, simulating a locus that bears several mutations and generates an allelic series. Different haplotype block identification methods will be applied to the whole chromosome marker profile. Chromosomes will also be simulated in structured populations. Finally, these analyses will also be applied to real marker and phenotype data from the Barley Coordinated Agricultural Project.
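Comparing methods at a fixed false discovery rate puts different test statistics on a common footing, because each method is judged by how many true causal loci it recovers once its error rate is held at the same level. The sketch below shows one way this could be computed for a single simulated data set. It assumes the Benjamini-Hochberg step-up procedure as the FDR control, which the report does not specify; the function names, p-values, and causal labels are all hypothetical.

```python
def benjamini_hochberg(p_values, q):
    """Return the set of test indices rejected by the BH step-up rule at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    # Find the largest rank k whose ordered p-value is <= q * k / m.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= q * rank / m:
            k_max = rank
    return set(order[:k_max])


def power_and_fdp(p_values, is_causal, q):
    """Detection power and realized false discovery proportion at FDR level q.

    is_causal marks which tests correspond to simulated causal loci, which is
    known only because the data are simulated.
    """
    rejected = benjamini_hochberg(p_values, q)
    true_hits = sum(1 for i in rejected if is_causal[i])
    n_causal = sum(is_causal)
    power = true_hits / n_causal if n_causal else 0.0
    fdp = (len(rejected) - true_hits) / len(rejected) if rejected else 0.0
    return power, fdp


# Hypothetical p-values from one method on one simulated data set.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20, 0.60, 0.74, 0.90]
is_causal = [True, True, False, True, True, False, False, False]

power, fdp = power_and_fdp(p_values, is_causal, q=0.10)
print(f"power = {power:.2f}, false discovery proportion = {fdp:.2f}")
```

Running the same calculation on the p-values from a single marker method and from a haplotype method, and averaging power over many simulated replicates, would give the kind of comparison described above.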
Similar approaches can be used to compare performance prediction models rather than QTL detection models. For marker development and scoring through sequencing, we will use subsets of lines from bi-parental populations in barley and wheat. These lines will be sequenced on Cornell machines, and progeny sequences will be compared with parental sequences.
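The comparison of progeny sequence with parental sequence can be sketched as follows. This is a simplified illustration, not the project's pipeline: it assumes the three sequences are already aligned to the same coordinates, that the lines are inbred (one base per site), and that 'N' marks a missing base. The function name `score_progeny` and the example sequences are hypothetical.

```python
def score_progeny(parent_a, parent_b, progeny):
    """Score a progeny line at sites where the two parents differ.

    Returns a list of (site_index, call), where call is 'A' (matches parent A),
    'B' (matches parent B), or '-' (missing or matching neither parent).
    """
    calls = []
    for i, (a, b, p) in enumerate(zip(parent_a, parent_b, progeny)):
        if a == b or "N" in (a, b):
            continue  # monomorphic or unknown in a parent: not an informative marker
        if p == a:
            calls.append((i, "A"))
        elif p == b:
            calls.append((i, "B"))
        else:
            calls.append((i, "-"))
    return calls


# Hypothetical aligned sequences for two parents and one progeny line.
parent_a = "ACGTTGCAT"
parent_b = "ACCTTACGA"
progeny = "ACCTTNCGT"

calls = score_progeny(parent_a, parent_b, progeny)
print(calls)
```

Only sites that are polymorphic between the parents become markers, so the number of usable markers depends on how much of each parent is covered by sequence reads.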
Work performed at Cornell in support of bioinformatics methods to predict small grain field performance has included two projects. In the first, simulation studies were performed to assess the power to identify associations between markers and traits when multiple markers are joined as a single "haplotype" predictor rather than analyzed individually. Simulations were designed to mimic a number of different population histories of the experimental lines being evaluated. We found that, except in the simplest (and unrealistic) population history case, the haplotype methods outperformed the single marker methods, though the difference was small. In the second project, we are developing multi-trait methods to predict performance using DNA marker data only. With those methods, we hope to leverage information from traits that are strongly affected by genotype and that are correlated with traits that are more strongly affected by environment.
Progress will be monitored by monthly meetings in addition to phone calls, emails and/or conference calls as needed.