
Research Project: Gene Discovery and Designing Soybeans for Food, Feed, and Industrial Applications

Location: Plant Genetics Research

Title: The allele catalog tool: a web-based interactive tool for allele discovery and analysis

CHAN, YEN ON - University Of Missouri
DIETZ, NICHOLAS - University Of Missouri
ZENG, SHUAI - University Of Missouri
WANG, JUEXIN - University Of Missouri
Flint-Garcia, Sherry
SALAZAR-VIDAL, NANCY - University Of California, Davis
SKRABISOVA, MARIA - Palacky University
Bilyeu, Kristin
JOSHI, TRUPTI - University Of Missouri System

Submitted to: BMC Genomics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 1/31/2023
Publication Date: 3/10/2023
Citation: Chan, Y., Dietz, N., Zeng, S., Wang, J., Flint-Garcia, S.A., Salazar-Vidal, N.M., Skrabisova, M., Bilyeu, K.D., Joshi, T. 2023. The allele catalog tool: a web-based interactive tool for allele discovery and analysis. BMC Genomics. 24: Article 107.

Interpretive Summary: Generating genomic sequence data for large numbers of accessions is now feasible for many agriculturally relevant species. However, the analysis of those big data sets has been descriptive, noncomprehensive, and static. The objective of this research was to design and develop a gene- and accession-based interactive bioinformatics tool utilizing genomic sequence data from large accession sets. The Allele Catalog Tool is an online resource for soybean, maize, and the model plant Arabidopsis research that empowers users to explore the data in a gene-based standardized format. The results are rendered with summary accession information along with details of the gene information. Detailed meta information is also available for all accessions. The results are downloadable for additional analysis offline. The impact of this work is the ability to conduct biological investigations on previously generated data and therefore connect genotypes to phenotypes for an initial set of agriculturally important species.

Technical Abstract:

Background: Advances in sequencing technologies have made a plethora of whole-genome re-sequenced (WGRS) data publicly available. However, research utilizing the WGRS data without further configuration is nearly impossible. To solve this problem, our research group has developed an interactive Allele Catalog Tool to enable researchers to explore the coding region allelic variation present in over 1,000 re-sequenced accessions each for soybean, Arabidopsis, and maize.

Results: The Allele Catalog Tool was designed originally with soybean genomic data and resources. The Allele Catalog datasets were generated using our variant calling pipeline (SnakyVC) and the Allele Catalog pipeline (AlleleCatalog). The variant calling pipeline was developed to process raw sequencing reads in parallel and generate Variant Call Format (VCF) files. The Allele Catalog pipeline takes VCF files, performs imputation and functional effect prediction, and assembles alleles for each gene to generate curated Allele Catalog datasets. Both pipelines were used to generate the data panels (VCF files and Allele Catalog files), in which the accessions of the WGRS datasets were collected from various sources, currently representing over 1,000 diverse accessions each for soybean, Arabidopsis, and maize. The main features of the Allele Catalog Tool include data query, visualization of results, categorical filtering, and download functions. Queries are performed from user input, and results are presented in tabular format, with summary results by categorical description and genotype results of the alleles for each gene. The categorical information is specific to each species; additionally, available detailed meta-information is provided in modal popups. The genotypic information contains the variant positions, reference or alternate genotypes, the functional effect classes, and the amino-acid changes of each accession. The results can also be downloaded for other research purposes.

Conclusions: The Allele Catalog Tool is a web-based tool that currently supports three species: soybean, Arabidopsis, and maize. The Soybean Allele Catalog Tool is hosted on the SoyKB website, while the Allele Catalog Tool for Arabidopsis and maize is hosted on the KBCommons website. Researchers can use this tool to connect variant alleles of genes with meta-information of species.
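
The idea of assembling alleles for each gene and summarizing them across accessions can be illustrated with a minimal Python sketch. This is not the AlleleCatalog pipeline itself: the record layout, the accession and gene names, and the functions `assemble_alleles` and `summarize` are all hypothetical, and VCF parsing, imputation, and functional effect prediction are omitted. It assumes each accession already has a genotype call at every variant position of the gene.

```python
from collections import defaultdict

# Hypothetical, simplified genotype calls: (accession, gene, position, genotype).
# In practice these would come from imputed, annotated VCF files.
CALLS = [
    ("Acc1", "GeneA", 101, "A"),
    ("Acc1", "GeneA", 250, "G"),
    ("Acc2", "GeneA", 101, "A"),
    ("Acc2", "GeneA", 250, "T"),
    ("Acc3", "GeneA", 101, "A"),
    ("Acc3", "GeneA", 250, "G"),
]


def assemble_alleles(calls):
    """Group accessions by their combined genotypes across a gene's variant positions."""
    # Collect the genotype at each position for every (gene, accession) pair.
    per_accession = defaultdict(dict)
    for accession, gene, position, genotype in calls:
        per_accession[(gene, accession)][position] = genotype

    # An "allele" here is the ordered tuple of (position, genotype) pairs.
    catalog = defaultdict(lambda: defaultdict(list))
    for (gene, accession), by_position in per_accession.items():
        allele = tuple(sorted(by_position.items()))
        catalog[gene][allele].append(accession)
    return catalog


def summarize(catalog, gene):
    """Count how many accessions carry each allele of the given gene."""
    return {allele: len(accessions) for allele, accessions in catalog[gene].items()}


catalog = assemble_alleles(CALLS)
summary = summarize(catalog, "GeneA")
for allele, count in summary.items():
    print(allele, count)
```

In this toy input, Acc1 and Acc3 share one allele of the hypothetical GeneA and Acc2 carries another. A real catalog would additionally attach categorical metadata (for example, a species-specific accession classification) to each accession so that the counts can be broken down by category.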