Location: Plant Genetics Research
Title: Comparing different statistical models and multiple testing corrections for association mapping in soybean and maize
KALER, AVJINDER - University Of Arkansas
BEISSINGER, TIMOTHY - Georg August University
PURCELL, LARRY - University Of Arkansas
Submitted to: Frontiers in Plant Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 12/23/2019
Publication Date: 2/25/2020
Citation: Kaler, A.S., Gillman, J.D., Beissinger, T., Purcell, L.C. 2020. Comparing different statistical models and multiple testing corrections for association mapping in soybean and maize. Frontiers in Plant Science. 10:1794. https://doi.org/10.3389/fpls.2019.01794.
Interpretive Summary: Numerous methods to correlate phenotypic traits with genomic regions (or genes) have been proposed since the rediscovery of Mendel’s work in the early 20th century. One of the most widespread methods, association mapping, has been shown to be very effective at identifying common genes that control traits of interest. Despite its demonstrated utility, association mapping sometimes suffers from high rates of both false positive and false negative results. We used two large public genetic/phenotypic datasets to test a variety of association mapping methods in two plant species (corn and soybean) that differ in their mating systems (outcrossing versus inbreeding) and overall genetic diversity. We examined six quantitative traits, and this approach allowed us to identify one statistical method that performed far better than all other tested methods. Our results are of direct interest to researchers working on these traits. In addition, our studies serve to inform and direct ongoing efforts to map genes controlling other traits of interest, as well as ongoing plant breeding efforts for corn and soybean.
Technical Abstract: Association mapping (AM) is a powerful tool for fine mapping complex trait variation down to nucleotide sequences by exploiting historical recombination events. A major problem in AM is controlling false positives that can arise from population structure and family relatedness. False positives are often controlled by incorporating covariates for structure and kinship in mixed linear models (MLM). These MLM-based methods are single-locus models and can introduce false negatives due to overfitting of the model. In this study, eight different statistical models, ranging from single-locus to multilocus, were compared for AM for three traits differing in heritability in two crop species: soybean (Glycine max L.) and maize (Zea mays L.). Soybean and maize were chosen, in part, because their rates of linkage disequilibrium (LD) decay differ markedly, which can influence false positive and false negative rates. The fixed and random model circulating probability unification (FarmCPU) model performed better than the other models based on an analysis of Q-Q plots and on the identification of the known number of quantitative trait loci (QTLs) in a simulated data set. These results indicate that FarmCPU controls both false positives and false negatives. Six qualitative traits in soybean with known published genomic positions were also used to compare these models, and the results indicated that FarmCPU consistently identified a single highly significant SNP closest to these known published genes. Multiple comparison adjustments (Bonferroni, false discovery rate, and positive false discovery rate) were compared for these models using a simulated trait having 60% heritability and 20 QTLs. Multiple comparison adjustments were overly conservative for MLM, CMLM, ECMLM, and MLMM and did not find any significant markers; in contrast, the ANOVA, GLM, and SUPER models found an excessive number of markers, far more than 20 QTLs. The FarmCPU model, using the less conservative methods (false discovery rate and positive false discovery rate), identified 10 QTLs, which was closer to the simulated number of QTLs than the number found by other models.
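To illustrate why the choice of multiple comparison adjustment changes the number of markers declared significant, the following minimal Python sketch contrasts the Bonferroni correction with the Benjamini-Hochberg false discovery rate procedure. It is not the analysis pipeline used in the study: the function names, the significance level of 0.05, and the ten p-values are hypothetical and chosen only for illustration, and the positive false discovery rate is not implemented here.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag markers whose p-value is at or below alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag markers using the Benjamini-Hochberg step-up FDR procedure."""
    m = len(p_values)
    # Indices of the markers, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Every marker ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Hypothetical p-values for ten markers (illustration only, not study data).
p_values = [0.0001, 0.004, 0.009, 0.012, 0.02, 0.04, 0.2, 0.4, 0.6, 0.9]

n_bonferroni = sum(bonferroni(p_values))
n_fdr = sum(benjamini_hochberg(p_values))
print("Bonferroni:", n_bonferroni, "significant markers")
print("Benjamini-Hochberg FDR:", n_fdr, "significant markers")
```

On these made-up values the Bonferroni threshold admits fewer markers than the false discovery rate procedure, which mirrors the general pattern described above: the more conservative adjustment reduces false positives at the cost of missing true associations.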