Location: Plant, Soil and Nutrition Research
Title: A maize practical haplotype graph leverages diverse NAM assemblies
|VALDES FRANCO, JOSE - Cornell University|
|GAGE, JOSEPH - Cornell University|
|JOHNSON, LYNN - Cornell University|
|MILLER, ZACHARY - Cornell University|
|Buckler, Edward - Ed|
|ROMAY, M. CINTA - Cornell University|
Submitted to: bioRxiv
Publication Type: Other
Publication Acceptance Date: 8/31/2020
Publication Date: 8/31/2020
Citation: Valdes Franco, J.A., Gage, J.L., Bradbury, P., Johnson, L.C., Miller, Z.R., Buckler IV, E.S., Romay, M. 2020. A maize practical haplotype graph leverages diverse NAM assemblies. bioRxiv. https://doi.org/10.1101/2020.08.31.268425.
Interpretive Summary: Maize is a highly diverse species with a complex genome. To better understand its genetic components, we need to leverage this diversity in an integrated way. We developed a genomic (haplotype) database that leverages the genomic information of over 25 maize inbred lines. Through a pipeline (the PHG) that uses inexpensive, shallow sequencing data, we can now accurately identify and store the genotypic components of thousands of maize samples in a very efficient and shareable form. This haplotype database and imputation pipeline will allow us to genotype more accurately and inexpensively, and to discover the specific genetic components that regulate many maize characteristics and its ability to grow in very different environmental conditions.
Technical Abstract: As a result of millions of years of transposon activity, multiple rounds of ancient polyploidization, and large populations that preserve diversity, maize has an extremely structurally diverse genome, evidenced by high-quality genome assemblies that capture substantial levels of both tropical and temperate diversity. We generated a pangenome representation (the Practical Haplotype Graph, PHG) of these assemblies in a database, representing the pangenome haplotype diversity and providing an initial estimate of structural diversity. We leveraged the pangenome to accurately impute haplotypes and genotypes of taxa using various kinds of sequence data, ranging from whole-genome sequencing (WGS) to extremely low-coverage genotyping-by-sequencing (GBS). We imputed the genotypes of the recombinant inbred lines of the NAM population with over 99% mean accuracy, while unrelated germplasm attained a mean imputation accuracy of 92% or 95% when using GBS or WGS data, respectively. Most of the imputation errors occur in haplotypes within European or tropical germplasm, which have yet to be represented in the maize PHG database. The PHG also stores the imputation data in a 30,000-fold more space-efficient manner than a standard genotype file, which is a key improvement when dealing with large-scale data.