Submitted to: Agronomy
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 4/15/2020
Publication Date: 4/22/2020
Citation: Reeves, P.A., Tetreault, H.M., Richards, C.M. 2020. Accessing functional genetic diversity in germplasm collections using bioinformatics. Agronomy. 10(4). Article e593. https://doi.org/10.3390/agronomy10040593.
Interpretive Summary: Searching for useful variation in large gene bank collections can be challenging, especially when accessions have not been characterized or evaluated phenotypically. This paper demonstrates how users of plant genetic resource collections might search for useful traits using sequence information alone. Our approach uses sets of single nucleotide markers that segregate together as haplotype blocks as the unit of variation. Pairing these polymorphisms with their genomic positions allows us to identify variation in genes that have been annotated and are part of a hierarchical gene ontology. We have coupled these data with a maximization algorithm to provide users with a small, focused subset of germplasm that maximizes diversity at a set of functionally defined loci. Whole genome resequencing projects can support targeted access to novel genetic diversity present in plant germplasm collections, accelerating molecular breeding for improved traits.
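The idea of assembling a small subset that maximizes haplotype diversity at chosen loci can be illustrated with a short sketch. The code below is not the maximization algorithm used in the paper; it is a simple greedy heuristic, offered only as an illustration. The function name `greedy_core_subset`, the accession names, and the haplotype labels are all hypothetical, and each accession is assumed to carry one haplotype per locus.

```python
from typing import Dict, List, Set, Tuple


def greedy_core_subset(haplotypes: Dict[str, List[str]], size: int) -> List[str]:
    """Greedily pick `size` accessions that capture the most distinct
    (locus, haplotype) pairs. A heuristic sketch, not an exact optimizer."""
    captured: Set[Tuple[int, str]] = set()
    chosen: List[str] = []
    remaining = sorted(haplotypes)  # sorted so ties break deterministically

    for _ in range(min(size, len(remaining))):
        # Gain = number of (locus, haplotype) pairs not yet in the subset.
        def gain(acc: str) -> int:
            pairs = {(i, h) for i, h in enumerate(haplotypes[acc])}
            return len(pairs - captured)

        best = max(remaining, key=gain)  # first maximal item wins ties
        chosen.append(best)
        captured |= {(i, h) for i, h in enumerate(haplotypes[best])}
        remaining.remove(best)

    return chosen


# Hypothetical toy data: four accessions scored at three haplotype blocks.
toy_accessions = {
    "acc1": ["A", "A", "B"],
    "acc2": ["A", "A", "B"],  # duplicate of acc1, adds no new diversity
    "acc3": ["B", "C", "B"],
    "acc4": ["C", "A", "A"],
}

toy_subset = greedy_core_subset(toy_accessions, 2)
```

In this toy case the redundant accession `acc2` is skipped in favor of an accession that contributes haplotypes not yet represented, which is the behavior a diversity-maximizing subset is meant to have.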
Technical Abstract: Efficient utilization of genetic variation in plant germplasm collections is impeded by large collection size, uneven characterization of traits, and unpredictable apportionment of allelic diversity among accessions. Distributing compact subsets of the complete collection that contain maximum allelic diversity at functional loci of interest could streamline conventional and precision breeding. Using Arabidopsis, Populus and sorghum, we show that genomewide single nucleotide polymorphism data permit the capture of 3- to 78-fold more haplotypic diversity in subsets than geographic or environmental data, which are commonly used surrogate predictors of genetic diversity. Using landrace sorghum, we demonstrate three bioinformatic approaches to access functional genetic diversity. First, using a candidate gene approach, we assembled subsets that maximized haplotypic diversity at 135 putative lignin biosynthetic loci, potentially useful for biomass breeding programs. Second, we used a keyword search against the Gene Ontology to identify 1040 regulatory loci and assembled subsets capturing genomewide regulatory gene diversity, a general source of phenotypic variation. Third, we used a machine learning approach to rank semantic similarity between Gene Ontology term definitions and the textual content of scientific publications on crop adaptation to climate, a complex breeding objective. We identified 505 sorghum loci whose documented function is semantically related to climate adaptation concepts. The assembled subsets could be used to address climatic pressures on sorghum production. To face impending agricultural challenges and foster rapid extraction and use of novel genetic diversity resident in germplasm collections, whole genome resequencing efforts should be prioritized.
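The second approach, selecting loci by keyword search against Gene Ontology annotations, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the gene identifiers, the annotation strings, the keyword `regulat`, and the function name `loci_matching_keywords` are all hypothetical and are not taken from the paper, which does not list its search terms here.

```python
from typing import Dict, Iterable, List


def loci_matching_keywords(
    annotations: Dict[str, List[str]], keywords: Iterable[str]
) -> List[str]:
    """Return gene IDs whose GO term text contains any keyword.
    Matching is a case-insensitive substring test."""
    kws = [k.lower() for k in keywords]
    return sorted(
        gene
        for gene, terms in annotations.items()
        if any(k in term.lower() for term in terms for k in kws)
    )


# Hypothetical toy annotations: gene ID -> GO term names attached to it.
toy_annotations = {
    "gene_1": ["regulation of transcription, DNA-templated"],
    "gene_2": ["lignin biosynthetic process"],
    "gene_3": ["Positive regulation of gene expression", "response to heat"],
    "gene_4": [],
}

toy_regulatory_loci = loci_matching_keywords(toy_annotations, ["regulat"])
```

The resulting list of loci would then define the positions at which haplotypic diversity is maximized when assembling a subset.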