Research Project: Genetic and Physiological Mechanisms Underlying Complex Agronomic Traits in Grain Crops

Location: Plant Genetics Research

Title: Trait association and prediction through integrative K-mer analysis

Author
He, Cheng - Kansas State University
Washburn, Jacob
Hao, Yangfan - Kansas State University
Zhang, Zhiwu - Washington State University
Yang, Jinliang - University of Nebraska
Liu, Sanzhen - Kansas State University

Submitted to: bioRxiv
Publication Type: Pre-print Publication
Publication Acceptance Date: 11/19/2021
Publication Date: 11/19/2021
Citation: He, C., Washburn, J.D., Hao, Y., Zhang, Z., Yang, J., Liu, S. 2021. Trait association and prediction through integrative K-mer analysis. bioRxiv. https://doi.org/10.1101/2021.11.17.468725.
DOI: https://doi.org/10.1101/2021.11.17.468725

Interpretive Summary: Genome-wide association studies (GWAS) and genomic prediction (GP) are popular and effective methods for identifying genes that potentially contribute to a trait and for predicting how different individuals will manifest that trait. Both methods traditionally require mapping DNA sequences to a sequenced reference genome. This mapping process is error-prone and depends on the existence and quality of a reference genome. An alternative approach was developed and tested that uses k-mers, short DNA sequence fragments of length k, directly, without a mapping step. This approach was shown to be complementary to traditional methods and, in some cases, more accurate.
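To make the idea of a k-mer concrete, the following minimal Python sketch counts k-mers directly from raw reads, with no reference genome involved. It is an illustration only, not the authors' pipeline: the function names (`reverse_complement`, `canonical`, `count_kmers`) are hypothetical, and the choices to merge each k-mer with its reverse complement and to skip k-mers containing ambiguous bases are assumptions commonly made in k-mer counting, not details taken from this summary.

```python
from collections import Counter

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement.

    Assumption: reads can come from either strand, so both orientations
    are treated as the same k-mer.
    """
    rc = reverse_complement(kmer)
    return kmer if kmer <= rc else rc


def count_kmers(reads, k):
    """Count canonical k-mers across an iterable of read strings.

    K-mers containing anything other than A, C, G, or T (for example N)
    are skipped rather than guessed at.
    """
    counts = Counter()
    valid = set("ACGT")
    for read in reads:
        read = read.upper()
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= valid:
                counts[canonical(kmer)] += 1
    return counts


if __name__ == "__main__":
    print(count_kmers(["ACGTA", "ttacg"], 3))
```

Applying such a counter to each sample's reads yields one k-mer profile per sample, and those profiles can then be compared across individuals without mapping to a reference.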

Technical Abstract: Genome-wide association study with single nucleotide polymorphisms (SNPs) has been widely used to explore genetic controls of phenotypic traits. Here we employed an approach based on k-mers, short substrings from sequencing reads. Using maize cob and kernel color traits, we demonstrated that k-mer GWAS can identify associated k-mers from known loci. Co-expression analysis of kernel color associated k-mers and pathway genes directly found k-mers from causal genes. Analyzing complex traits of kernel oil and leaf angle resulted in associated k-mers from known and candidate genes. Evolution analysis revealed most k-mers positively correlated with kernel oil were under purifying selection in maize populations, while most k-mers for upright leaf angle were positively selected. In addition, phenotypic prediction of flowering time using k-mer data showed a similar prediction accuracy to the SNP method. Collectively, our results demonstrated that the k-mer can be a bridging element for data integration and functional gene discovery.
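As a rough illustration of what associating a k-mer with a trait can mean, the sketch below computes a plain Pearson correlation between one k-mer's presence or absence (coded 1 or 0) across samples and a quantitative phenotype. This is a toy stand-in, not the method used in the study: real association analyses typically use statistical models that account for population structure and multiple testing. The function name `presence_trait_correlation` and the example values are hypothetical.

```python
from math import sqrt


def presence_trait_correlation(presence, trait):
    """Pearson correlation between 0/1 k-mer presence and a numeric trait.

    presence: list of 0/1 values, one per sample (1 = k-mer observed).
    trait:    list of phenotype values in the same sample order.
    Returns 0.0 when either input has no variation, since the
    correlation is undefined in that case (a simplifying assumption).
    """
    if len(presence) != len(trait) or len(presence) < 2:
        raise ValueError("inputs must have equal length of at least 2")

    n = len(presence)
    mean_p = sum(presence) / n
    mean_t = sum(trait) / n

    # Sums of cross-products and squared deviations.
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(presence, trait))
    var_p = sum((p - mean_p) ** 2 for p in presence)
    var_t = sum((t - mean_t) ** 2 for t in trait)

    if var_p == 0 or var_t == 0:
        return 0.0
    return cov / sqrt(var_p * var_t)


if __name__ == "__main__":
    # Made-up example: the k-mer is present only in the high-trait samples.
    print(presence_trait_correlation([1, 1, 0, 0], [10.0, 10.0, 2.0, 2.0]))
```

Repeating such a test for every k-mer and ranking the results gives a crude picture of which k-mers track the trait. Any serious analysis would replace the correlation with a properly controlled model.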