Location: Dairy Forage Research
Title: Genome-wide association analysis based on multiple imputation with low-depth GBS data: application to biofuel traits in reed canarygrass
RAMSTEIN, GUILLAUME - UNIVERSITY OF WISCONSIN
LU, FEI - CORNELL UNIVERSITY - NEW YORK
LIPKA, ALEXANDER - UNIVERSITY OF ILLINOIS
COSTICH, DENISE - INTERNATIONAL MAIZE & WHEAT IMPROVEMENT CENTER (CIMMYT)
CHERNEY, JEROME - CORNELL UNIVERSITY - NEW YORK
Buckler, Edward - Ed
Submitted to: G3, Genes/Genomes/Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/10/2015
Publication Date: 5/1/2015
Publication URL: http://handle.nal.usda.gov/10113/62401
Citation: Ramstein, G., Lu, F., Lipka, A., Costich, D., Cherney, J., Casler, M.D., Buckler IV, E.S. 2015. Genome-wide association analysis based on multiple imputation with low-depth GBS data: application to biofuel traits in reed canarygrass. G3, Genes/Genomes/Genetics. 5(5):891-909.
Interpretive Summary: Reed canarygrass is a potentially important biofuel species, but very little is known about the genetic control of important biofuel traits. In this study, a modern genotyping method (genotyping-by-sequencing) was combined with a statistical approach known as multiple imputation to identify nine chromosomal regions of reed canarygrass associated with biofuel traits such as plant height, plant vigor, and biomass quality. This study provides one of the first applications of multiple imputation methods to identify significant associations between genetic markers and biomass traits, and it will be of value to scientists studying marker-trait associations in other minor species that lack a sequenced reference genome.
Technical Abstract: Genotyping-by-sequencing allows for large-scale genetic analyses in plant species with no reference genome, creating the challenge of sound inference in the presence of uncertain genotypes. Here we report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., Phalaris caesia Nees), a cool-season grass species that has great potential as a biofuel crop. Our study involved two linkage populations and a diverse panel of 590 reed canarygrass genotypes. Plants were genotyped for up to 5228 single-nucleotide-polymorphism markers and phenotyped for 39 field and quality traits. The genotypic markers were derived from low-depth sequencing and, as a result, had a very high proportion of missing data (83% on average). To soundly infer marker-trait associations, an approach known as multiple imputation (MI) was used: several imputes of the marker data were generated to reflect imputation uncertainty, and association tests were performed on marker effects combined across imputes. Nine significant markers were identified, three of which showed significant homology with the Brachypodium distachyon genome. Because no genetic/genomic map was available, imputes were generated through non-parametric models: classification trees or random forests. Bias due to imputation was assessed by comparing the results from MI to those obtained from non-missing data only (complete cases) or from ignoring imputation uncertainty. In general, MI showed good consistency with the complete-case analysis and adequate control over imputation uncertainty. MI increased the significance of marker effects only in rare cases, when the amount of missing data was moderate (no more than about 45%).
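The abstract does not state how marker effects from the separate imputes are combined into one test. A common choice in multiple imputation is Rubin's rules, and the sketch below assumes that choice; the study's exact procedure may differ. The function name `pool_rubin` and the example numbers are hypothetical, and the p-value uses a normal approximation rather than the t distribution with the returned degrees of freedom.

```python
import math
from statistics import NormalDist, fmean, variance


def pool_rubin(estimates, variances):
    """Hypothetical helper: pool one marker's effect across m imputes (Rubin's rules).

    estimates: per-impute marker-effect estimates Q_1..Q_m
    variances: per-impute squared standard errors U_1..U_m (assumed > 0)
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputes with matching variances")
    q_bar = fmean(estimates)        # pooled marker effect
    u_bar = fmean(variances)        # average within-impute variance
    b = variance(estimates)         # between-impute variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b     # total variance of the pooled effect
    # Share of total variance due to imputation uncertainty
    lam = (1 + 1 / m) * b / t
    # Rubin (1987) degrees of freedom; infinite when all imputes agree exactly
    df = math.inf if b == 0 else (m - 1) / lam ** 2
    z = q_bar / math.sqrt(t)
    # Two-sided p-value under a normal approximation (ignores finite df)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"estimate": q_bar, "total_var": t, "df": df, "lambda": lam, "p": p}


# Illustrative numbers only: five imputes of one marker's effect
res = pool_rubin([0.40, 0.55, 0.35, 0.50, 0.45], [0.01] * 5)
print(res)
```

Here the between-impute spread inflates the total variance from 0.01 to 0.0175, so roughly 43% of the uncertainty comes from imputation. Ignoring imputation uncertainty (using only the within-impute variance) would overstate significance, which is the bias the comparison in the abstract is designed to detect.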
In addition to providing insight into the genetic basis of important traits in reed canarygrass, this study presents one of the first applications of MI to genome-wide analyses and provides useful guidelines for conducting GWAS based on genotyping-by-sequencing data.