ARS Northeast Area, Robert W. Holley Center for Agriculture & Health, Ithaca, New York. Publication #386572

Research Project: Improving Crop Efficiency Using Genomic Diversity and Computational Modeling

Location: Plant, Soil and Nutrition Research

Title: Genome-wide imputation using the practical haplotype graph in the heterozygous crop cassava

LONG, EVAN - Cornell University
Bradbury, Peter
ROMAY, MARIA CINTA - Cornell University
Buckler, Edward - Ed
ROBBINS, KELLY - Cornell University

Submitted to: bioRxiv
Publication Type: Other
Publication Acceptance Date: 5/13/2021
Publication Date: 5/13/2021
Citation: Long, E.M., Bradbury, P., Romay, M., Buckler IV, E.S., Robbins, K.R. 2021. Genome-wide imputation using the practical haplotype graph in the heterozygous crop cassava. bioRxiv.

Interpretive Summary: For outbred crops such as cassava, it is difficult to obtain accurate genetic information from the sparse genotyping generally performed in agronomic settings, especially without large financial resources. This genetic information is needed to implement methods such as genomic prediction or genome-wide association efficiently, and these methods can help increase genetic gains in crop breeding. Accurate genotype imputation can produce this genetic information with limited resources, improving breeders' ability to leverage these genetic tools. We built and tested a genetic database, known as a Practical Haplotype Graph, for the root crop cassava. The database contains unique segments of genetic data, known as haplotypes, from many different cassava lines, and these can be used to predict whole-genome information for cassava breeding and research. Although this information is typically difficult to capture in complex outbred crops, we were able to leverage long segments of shared ancestry to populate the database. We showed that the method can be used effectively for applications such as genotype imputation and the prediction of plant traits. This resource may be a valuable tool for cassava breeders, and the method described in our work can be reimplemented in other outbred crops, serving as a model for outbred plant and animal species. The end goal of improved imputation is to allow breeders and scientists to increase sampling quantity and quality, thereby increasing the power of their studies. This can mean evaluating more offspring, leading to larger gains in breeding objectives. Improving breeding gains is necessary to improve both food availability and food security in a changing environment. Improved genotype information can also give researchers greater statistical power to locate functional and causative elements of the genome.

Technical Abstract: Genomic applications such as genomic selection and genome-wide association have become increasingly common since the advent of genome sequencing. Genotype imputation makes it possible to infer whole-genome information from limited input data, making large sampling for genomic applications more feasible, especially in non-model species where resources are less abundant. Imputation becomes more difficult in heterozygous species, where haplotypes must be phased. The Practical Haplotype Graph is a recently developed tool that can accurately impute genotypes using a reference panel of haplotypes. It is a haplotype database that implements a trellis graph to predict haplotypes from minimal input data: genotyping information is aligned to the database, and missing haplotypes are predicted from the most likely path through the graph. We showcase the ability of the Practical Haplotype Graph to impute genomic information in the highly heterozygous crop cassava (Manihot esculenta). To populate the graph, accurately phased haplotypes were sampled from runs of homozygosity across a diverse panel of individuals, an approach that proved more accurate than relying on computational phasing methods. At 1X input sequence coverage, the Practical Haplotype Graph achieves high concordance between predicted and true genotypes (R = 0.84), compared with the standard imputation tool Beagle (R = 0.69). The improvement in accuracy was especially visible in the prediction of rare and heterozygous alleles. We validate the Practical Haplotype Graph as an accurate imputation tool in the heterozygous crop cassava, showing its potential for application in heterozygous species.
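The idea of predicting missing haplotypes from the most likely path through a trellis can be illustrated with a generic Viterbi algorithm over a hidden Markov model. The sketch below is a simplified toy under stated assumptions and is not the Practical Haplotype Graph's own code. Each reference range is reduced to a single biallelic site. The same K hypothetical reference haplotypes (indexed 0..K-1) are present at every range. Sparse sequencing input is given as read counts per allele. The names `viterbi_path`, `impute`, `switch_prob`, and `error` are illustrative. Staying on the same haplotype between adjacent ranges is cheap, while switching (a recombination) is penalized. Ranges with no reads contribute no evidence, so their alleles are filled in from the chosen path. The sketch follows a single haploid path, whereas a heterozygous individual would require pairs of haplotypes, which it omits.

```python
import math
from typing import List, Tuple

# Hypothetical toy panel: panel[r][k] is the allele (0 or 1) carried by
# reference haplotype k at reference range r; every range has the same K >= 2 haplotypes.
Panel = List[List[int]]
# Observed reads per range: (reads supporting allele 0, reads supporting allele 1).
Counts = List[Tuple[int, int]]


def log_emission(allele: int, counts: Tuple[int, int], error: float) -> float:
    """Log-likelihood of the read counts if the true allele is `allele`."""
    n_match = counts[allele]
    n_mismatch = counts[1 - allele]
    return n_match * math.log(1.0 - error) + n_mismatch * math.log(error)


def viterbi_path(panel: Panel, counts: Counts,
                 switch_prob: float = 0.01, error: float = 0.01) -> List[int]:
    """Most likely sequence of reference haplotypes, one per range (haploid toy)."""
    k = len(panel[0])
    log_stay = math.log(1.0 - switch_prob)
    # Switching probability is spread evenly over the other K-1 haplotypes.
    log_switch = math.log(switch_prob / (k - 1))

    # score[j]: best log-probability of any path ending on haplotype j so far,
    # starting from a uniform prior over haplotypes.
    score = [-math.log(k) + log_emission(panel[0][j], counts[0], error)
             for j in range(k)]
    back: List[List[int]] = []  # back[r-1][j]: best predecessor of j at range r

    for r in range(1, len(panel)):
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(k):
            # Pick the predecessor that maximizes score plus transition cost.
            best_i, best = 0, -math.inf
            for i in range(k):
                cand = score[i] + (log_stay if i == j else log_switch)
                if cand > best:
                    best_i, best = i, cand
            new_score.append(best + log_emission(panel[r][j], counts[r], error))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace back from the best final haplotype.
    state = max(range(k), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def impute(panel: Panel, path: List[int]) -> List[int]:
    """Read the allele at every range, covered or not, off the chosen path."""
    return [panel[r][hap] for r, hap in enumerate(path)]


if __name__ == "__main__":
    panel = [
        [0, 1, 0],
        [0, 1, 1],
        [1, 0, 1],
        [0, 0, 1],
        [1, 1, 0],
        [0, 1, 1],
    ]
    # Sparse input: reads at ranges 0, 2 and 5 only; (0, 0) means no coverage.
    counts = [(1, 0), (0, 0), (0, 1), (0, 0), (0, 0), (0, 1)]
    path = viterbi_path(panel, counts)
    print(path)                 # [2, 2, 2, 2, 2, 2]
    print(impute(panel, path))  # [0, 1, 1, 1, 0, 1]
```

With reads at only three of six ranges, only haplotype 2 agrees with all observed reads without a switch, so the path stays on it and the three uncovered sites are imputed from it. Raising `switch_prob` makes the path change haplotypes more readily, which in this toy model corresponds to expecting more recombination between adjacent ranges. The default values here are arbitrary.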