Research Project: Gene Discovery and Designing Soybeans for Food, Feed, and Industrial Applications

Location: Plant Genetics Research

Title: AccuCalc: A python package for accuracy calculation in GWAS

BIOVÁ, JANA - Palacky University
DIETZ, NICHOLAS - University of Missouri
CHAN, YEN ON - University of Missouri
JOSHI, TRUPTI - University of Missouri
Bilyeu, Kristin
ŠKRABIŠOVÁ, MÁRIA - Palacky University

Submitted to: Genes
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 12/27/2022
Publication Date: 1/1/2023
Citation: Biová, J., Dietz, N., Chan, Y., Joshi, T., Bilyeu, K.D., Škrabišová, M. 2023. AccuCalc: A python package for accuracy calculation in GWAS. Genes. 14(1). Article 123.

Interpretive Summary: Connecting genes to phenotypes is a key area of agricultural research across crop and livestock species. The methodology for statistically associating genomic regions with traits has expanded with new and more affordable genotyping capabilities, including the generation of whole-genome resequencing data. However, identifying the actual causative mutations in candidate genes has remained challenging despite improved Big Data resources. The objective of this research was to build a bioinformatics tool that calculates the Accuracy component (direct correspondence between genotype and phenotype) of a post-association analysis strategy, regardless of the input species. The tool saves analyses in a user-friendly format and visualizes the results with the input data accentuated by Accuracy. This work provides new resources downstream of the association analysis step to help connect genes to phenotypes.
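The idea of Accuracy as direct correspondence can be illustrated with a minimal Python sketch. The function `accuracy` below is hypothetical and is not AccuCalc's API. It assumes biallelic, homozygous genotype calls coded 0 (reference allele) and 1 (alternative allele), a binary phenotype coded 1 for the phenotype of interest and 0 otherwise, and that missing calls are excluded from the denominator. Under those assumptions, Accuracy is the percentage of called accessions whose allele matches their phenotype class.

```python
from typing import Optional, Sequence


def accuracy(genotypes: Sequence[Optional[int]], phenotypes: Sequence[int]) -> float:
    """Percentage of accessions whose genotype directly corresponds to the phenotype.

    Hypothetical helper (not AccuCalc's API). Assumed coding:
      genotypes:  0 = reference allele, 1 = alternative allele, None = missing call
      phenotypes: 1 = phenotype of interest, 0 = otherwise
    A match is phenotype 1 with the alternative allele, or phenotype 0 with the
    reference allele. Missing calls are skipped (an assumption).
    """
    if len(genotypes) != len(phenotypes):
        raise ValueError("genotypes and phenotypes must have the same length")
    called = [(g, p) for g, p in zip(genotypes, phenotypes) if g is not None]
    if not called:
        return 0.0  # no informative accessions for this variant
    matches = sum(1 for g, p in called if g == p)
    return 100.0 * matches / len(called)
```

For example, `accuracy([1, 1, 0, 0, None], [1, 1, 0, 1, 0])` returns `75.0`: the missing call is skipped and three of the four remaining accessions match. A variant whose alleles perfectly separate the two phenotype classes scores 100.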

Technical Abstract: The genome-wide association study (GWAS) is a popular genomic approach that identifies genomic regions associated with a phenotype and thus aims to discover the causative mutations (CMs) in the genes underlying that phenotype. However, GWAS discoveries are limited by many factors and typically identify associated genomic regions without the ability to further compare the viability of candidate genes and actual CMs. Therefore, the current methodology is limited in CM identification. In our recent work, we presented a novel approach to an empowered “GWAS to Genes” strategy that we named Synthetic phenotype to causative mutation (SP2CM). We established this strategy to identify CMs in soybean genes and developed a web-based tool for accuracy calculation (AccuTool) for a reference panel of soybean accessions. Here, we describe the further development of this tool, which we named AccuCalc, to extend its use to other species. We enhanced the tool for analyzing datasets in which a rare phenotype has a low-frequency distribution by automating the formatting of the synthetic phenotype, and we added another accuracy-based GWAS evaluation criterion to the accuracy calculation. We designed AccuCalc as a Python package for GWAS data analysis of any species, taking user-defined Variant Call Format (vcf) or HapMap (hmp) files as input. AccuCalc saves analysis outputs in user-friendly tab-delimited formats and also visualizes the GWAS results as Manhattan plots accentuated by accuracy. Because it is implemented in Python and publicly available, AccuCalc can conveniently be used to apply the SP2CM strategy to any species.
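The workflow described above (synthetic phenotype, per-variant accuracy, tab-delimited output) can be sketched with the Python standard library alone. Everything here is an illustrative assumption rather than AccuCalc's implementation: the function names `synthetic_phenotype`, `parse_gt`, `rank_variants`, and `to_tsv`, the 1/0 synthetic-phenotype coding, the treatment of heterozygous and missing calls as uninformative, and the output columns. Only vcf input is parsed; HapMap parsing, the additional evaluation criterion, and Manhattan plotting are omitted. The accession names and chromosome IDs in the toy data are made up.

```python
import csv
import io
import re
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical helpers illustrating an SP2CM-style workflow; not AccuCalc's API.


def synthetic_phenotype(samples: Iterable[str], rare: Set[str]) -> Dict[str, int]:
    """Assumed binary synthetic phenotype: 1 for accessions with the rare phenotype, else 0."""
    return {s: 1 if s in rare else 0 for s in samples}


def parse_gt(sample_field: str, gt_index: int) -> Optional[int]:
    """Return 0 or 1 for homozygous REF or ALT calls; None for missing or heterozygous."""
    parts = sample_field.split(":")
    if gt_index >= len(parts):
        return None
    alleles = re.split(r"[/|]", parts[gt_index])  # handles unphased (/) and phased (|)
    if len(set(alleles)) == 1 and alleles[0] in ("0", "1"):
        return int(alleles[0])
    return None


def rank_variants(vcf_text: str, rare: Set[str]) -> List[dict]:
    """Compute Accuracy (% of called accessions whose allele matches the SP) per variant."""
    samples: List[str] = []
    sp: Dict[str, int] = {}
    rows: List[dict] = []
    for line in io.StringIO(vcf_text):
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and VCF meta-information lines
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample columns follow the FORMAT column
            sp = synthetic_phenotype(samples, rare)
            continue
        gt_index = cols[8].split(":").index("GT")
        pairs = []
        for sample, field in zip(samples, cols[9:]):
            g = parse_gt(field, gt_index)
            if g is not None:
                pairs.append((g, sp[sample]))
        acc = 100.0 * sum(g == p for g, p in pairs) / len(pairs) if pairs else 0.0
        rows.append({"CHROM": cols[0], "POS": int(cols[1]), "REF": cols[3],
                     "ALT": cols[4], "Accuracy": round(acc, 2)})
    rows.sort(key=lambda r: r["Accuracy"], reverse=True)  # best candidates first
    return rows


def to_tsv(rows: List[dict]) -> str:
    """Serialize ranked variants as tab-delimited text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["CHROM", "POS", "REF", "ALT", "Accuracy"],
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Toy VCF with four made-up accessions; A1 is the only one with the rare phenotype.
TOY_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tA1\tA2\tA3\tA4",
    "Gm01\t1000\t.\tA\tG\t.\tPASS\t.\tGT\t1/1\t0/0\t0/0\t0/0",
    "Gm01\t2000\t.\tC\tT\t.\tPASS\t.\tGT\t1/1\t1/1\t0/0\t./.",
])

if __name__ == "__main__":
    print(to_tsv(rank_variants(TOY_VCF, rare={"A1"})), end="")
```

Running the script prints a header row, then the variant at position 1000 with Accuracy 100.0, then the variant at position 2000 with 66.67: the missing call for A4 is skipped, and A2 carries the alternative allele without the rare phenotype. Sorting by Accuracy mirrors the idea of accentuating the strongest genotype-phenotype correspondence, though the real tool's ranking and criteria may differ.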