Publication : USDA ARS

Research Project: Improving Crop Efficiency Using Genomic Diversity and Computational Modeling

Location: Plant, Soil and Nutrition Research

Title: A sorghum practical haplotype graph facilitates genome-wide imputation and cost effective genomic prediction

Author

	JENSEN, SARAH - Cornell University
	CHARLES, JEAN RIGAUD - Quisqueya University
	MULETA, KEBEDE - Kansas State University
	Bradbury, Peter
	CASSTEVENS, TERRY - Cornell University
	DESHPANDE, SANTOSH - International Crops Research Institute For Semi-Arid Tropics (ICRISAT) - India
	GORE, MICHAEL - Cornell University
	GUPTA, RAJEEV - International Crops Research Institute For Semi-Arid Tropics (ICRISAT) - India
	JOHNSON, LYNN - Cornell University
	LOZANO, ROBERTO - Cornell University
	MILLER, ZACHARY - Cornell University
	RAMU, PUNNA - Cornell University
	RATHORE, ABHISHEK - International Crops Research Institute For Semi-Arid Tropics (ICRISAT) - India
	UPADHYAYA, HARI - International Crops Research Institute For Semi-Arid Tropics (ICRISAT) - India
	VARSHNEY, RAJEEV - International Crops Research Institute For Semi-Arid Tropics (ICRISAT) - India
	MORRIS, GEOFFREY - Kansas State University
	PRESSOIR, GAEL - Quisqueya University
	Buckler, Edward
	RAMSTEIN, GUILLAUME - Cornell University

Submitted to: The Plant Genome
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 1/4/2020
Publication Date: 5/12/2020
Citation: Jensen, S., Charles, J., Muleta, K., Bradbury, P., Casstevens, T., Deshpande, S.P., Gore, M.A., Gupta, R., Johnson, L., Lozano, R., Miller, Z., Ramu, P., Rathore, A., Upadhyaya, H.D., Varshney, R., Morris, G.P., Pressoir, G., Buckler IV, E.S., Ramstein, G. 2020. A sorghum practical haplotype graph facilitates genome-wide imputation and cost effective genomic prediction. The Plant Genome. 13(1). Article e20009. https://doi.org/10.1002/tpg2.20009.
DOI: https://doi.org/10.1002/tpg2.20009

Interpretive Summary: Managing and utilizing genomic datasets is an essential part of plant breeding programs. As these datasets become increasingly large, however, breeders need tools to merge and store new and existing genome sequence data. Database tools can help manage large datasets and reduce the amount of new data that plant breeders need to produce for their breeding programs. For this study, we developed a genomic database for sorghum using a breeding tool called the Practical Haplotype Graph (PHG). We tested whether the sorghum PHG can merge sequence data from many individuals without losing important information about differences between plants. We also compared the PHG to existing tools to determine how well the PHG can predict genotypes for new individuals and found that the PHG performs better and requires less new input data than state-of-the-art methods. We found that the sorghum PHG could accurately predict genotypes from sequencing data covering only 1% of the genome, which can be produced for less than $10 per individual. Our results show that PHG is a useful research and breeding tool because it maintains information from a diverse group of individuals, stores genome sequence data in an accessible format, unifies genotypes derived from different genotyping methods, and provides a cost-effective option for genomic selection for any species.

Technical Abstract: Successful management and utilization of increasingly large genomic datasets is essential for breeding programs to increase genetic gain and accelerate cultivar development. To help with data management and storage, we developed a sorghum Practical Haplotype Graph (PHG) pangenome database that stores all identified haplotypes and variant information for a given set of individuals. We developed two PHGs in sorghum, one with 24 individuals and another with 374 individuals, that reflect the diversity across genic regions of the sorghum genome. Twenty-four founders of the Regional Biofuels Technical & Knowledge Center (CHIBAS) sorghum breeding program were sequenced at low coverage (0.01x) and processed through the PHG to identify genome-wide variants. The PHG called SNPs with only 5.9% error at 0.01x coverage, an accuracy only 3% lower than that achieved when calling SNPs from 8x coverage sequence. Additionally, 207 progeny from the CHIBAS genomic selection (GS) training population were sequenced and processed through the PHG. Missing genotypes in the progeny were imputed from the parental haplotypes available in the PHG and used for genomic prediction. Mean prediction accuracies with PHG SNP calls range from 0.57 to 0.73 for different traits and are similar to prediction accuracies obtained with genotyping-by-sequencing (GBS) or markers from sequencing targeted amplicons (rhAmpSeq). This study provides a proof of concept for using a sorghum PHG to call and impute SNPs from low-coverage sequence data and also shows that the PHG can unify genotype calls from different sequencing platforms. By reducing the amount of input sequence needed, the PHG has the potential to decrease the cost of genotyping for genomic selection, making GS more feasible and facilitating larger breeding populations that can capture maximum recombination. Our results demonstrate that the PHG is a useful research and breeding tool that can maintain variant information from a diverse group of taxa, store sequence data in a condensed but readily accessible format, unify genotypes from different genotyping methods, and provide a cost-effective option for genomic selection for any species.
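
The idea of imputing missing progeny genotypes from parental haplotypes can be illustrated with a toy sketch. This is not the PHG's actual algorithm or API. It is a deliberately simplified stand-in that, for a single genomic region, picks the parental haplotype with the fewest mismatches to the sparse observed calls and copies its alleles into the missing sites. The haplotype names, the alleles, and the function `impute_from_parents` are all hypothetical and invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical parental haplotypes for one genomic region (alleles at 6 SNP sites).
# These values are made up for illustration; they are not data from the study.
PARENT_HAPLOTYPES: Dict[str, List[str]] = {
    "parent_A": ["A", "C", "G", "T", "A", "C"],
    "parent_B": ["A", "T", "G", "C", "A", "G"],
    "parent_C": ["G", "T", "A", "C", "T", "G"],
}


def impute_from_parents(
    observed: List[Optional[str]],
    haplotypes: Dict[str, List[str]],
) -> Tuple[str, List[str]]:
    """Fill missing calls (None) using the best-matching parental haplotype.

    Simplified assumption: the whole region is inherited from one parent,
    and the best parent is the one with the fewest mismatches at the
    observed sites. Ties go to the first haplotype in dictionary order.
    """

    def mismatches(hap: List[str]) -> int:
        # Count disagreements only at sites where a call was observed.
        return sum(1 for o, h in zip(observed, hap) if o is not None and o != h)

    best = min(haplotypes, key=lambda name: mismatches(haplotypes[name]))
    # Keep observed calls; borrow the parental allele where the call is missing.
    imputed = [o if o is not None else h for o, h in zip(observed, haplotypes[best])]
    return best, imputed


# Sparse calls, as might result from very low coverage: 2 of 6 sites observed.
sparse_calls: List[Optional[str]] = [None, "T", None, None, "A", None]
best_parent, filled = impute_from_parents(sparse_calls, PARENT_HAPLOTYPES)
print(best_parent, filled)
```

In this toy case only two sites are observed, yet they are enough to single out one parental haplotype, which is the intuition behind recovering genome-wide genotypes from very low-coverage sequence when the parental haplotypes are already stored.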
