Publication #401976

Research Project: MaizeGDB: Enabling Access to Basic, Translational, and Applied Research Information

Location: Corn Insects and Crop Genetics Research

Title: Maize Feature Store (MFS): A centralized resource to manage and analyze curated maize multi-omics features for machine learning applications

Author
SEN, SHATABDI - Iowa State University
Woodhouse, Margaret
Portwood, John
Andorf, Carson

Submitted to: Maize Annual Meetings
Publication Type: Abstract Only
Publication Acceptance Date: 2/10/2023
Publication Date: 3/16/2023
Citation: Sen, S., Woodhouse, M.H., Portwood II, J.L., Andorf, C.M. 2023. Maize Feature Store (MFS): A centralized resource to manage and analyze curated maize multi-omics features for machine learning applications. Maize Annual Meetings. 69.

Interpretive Summary: N/A

Technical Abstract: Big-data analysis of the complex data associated with maize genomes can accelerate genetic research and the improvement of agronomic traits. As a result, efforts to integrate diverse datasets and extract meaning from these measurements have increased. Machine learning models are powerful tools for gaining knowledge from large and complex datasets, but they must be trained on high-quality features to succeed. Currently, no resource hosts maize multi-omics datasets together with end-to-end tools for evaluating features and linking them to target gene annotations. Our work presents the Maize Feature Store (MFS), a versatile application that combines features built on complex data to facilitate exploration, modeling, and analysis. Feature stores allow researchers to rapidly deploy machine learning applications by managing and providing access to frequently used features. We populated the MFS for the maize reference genome with over 14,000 gene-based features derived from published genomic, transcriptomic, epigenomic, variomic, and proteomic datasets. Using the MFS, we created an accurate pan-genome classification model with an AUC-ROC score of 0.85. The MFS is publicly available through the Maize Genetics and Genomics Database (MaizeGDB).
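
Illustrative sketch: The abstract relies on two ideas, a feature store that serves gene-based features on request and an AUC-ROC score used to evaluate a classifier. The Python sketch below illustrates both in miniature. It is not the MFS interface or the authors' model: the gene IDs, feature names, feature values, labels, and the helper names `get_features` and `auc_roc` are all hypothetical and exist only for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical in-memory "feature store": placeholder gene IDs mapped to
# made-up feature values. None of these come from the MFS.
FEATURES: Dict[str, Dict[str, float]] = {
    "gene_A": {"expression": 8.2, "methylation": 0.10},
    "gene_B": {"expression": 1.1, "methylation": 0.75},
    "gene_C": {"expression": 6.4, "methylation": 0.20},
    "gene_D": {"expression": 0.7, "methylation": 0.60},
}


def get_features(gene_ids: Sequence[str], names: Sequence[str]) -> List[List[float]]:
    """Return one row per gene, with columns in the order given by `names`."""
    return [[FEATURES[g][n] for n in names] for g in gene_ids]


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a randomly chosen positive example
    scores higher than a randomly chosen negative one (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Usage with made-up labels: the raw expression value stands in for a
# classifier score, purely to exercise the two helpers.
genes = ["gene_A", "gene_B", "gene_C", "gene_D"]
matrix = get_features(genes, ["expression", "methylation"])
labels = [1, 0, 1, 1]
scores = [row[0] for row in matrix]
print(auc_roc(labels, scores))
```

An AUC-ROC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes, which gives context for the 0.85 reported in the abstract. The pairwise formulation is used here because it is short and dependency-free; it is quadratic in the number of examples, so a rank-based computation would be a more practical choice for thousands of genes.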