Submitted to: Gigascience
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 2/26/2018
Publication Date: 3/4/2018
Citation: Geib, S.M., Hall, B., Derego, T., Sim, S.B. 2018. Genome annotation generator: a simple tool for generating and correcting WGS annotation tables for NCBI submission. Gigascience. 7(4):1-5. https://doi.org/10.1093/gigascience/giy018.
Interpretive Summary: Making genomic data publicly available has become both necessary and overwhelming now that such data are being generated at a breakneck pace. We have provided a simple computer program that helps prepare genomic data for submission to the NCBI Submission Portal. This program is easy to install and requires few dependencies. Its primary use is transforming a draft genome assembly and gene annotation set into the NCBI annotation table (.tbl) format, but it also includes other features, such as filtering and transferring annotations. The program requires only very basic command-line skills and will enable genomic researchers to make their data publicly available on NCBI.
Technical Abstract: One of the most overlooked yet critical components of a whole genome sequencing project is the submission and curation of the data in a genomic repository, most commonly NCBI. While large genome centers and genome groups have developed software tools for post-annotation assembly filtering, annotation, and conversion into NCBI's annotation table format, these tools typically require back-end setup, a connection to an SQL database, and/or some knowledge of programming (Perl, Python) to implement. With whole genome sequencing becoming commonplace, genome sequencing projects are moving away from the genome centers and into the ecology or biology lab, where far fewer resources are available to support genome assembly curation. To fill this gap, we developed software to assess, filter, and transfer annotations, and to convert a draft genome assembly and annotation set into the NCBI annotation table (.tbl) format, facilitating submission to the NCBI Genome Assembly database. This software has no dependencies, is compatible across platforms, and uses a simple command-line interface to perform a variety of simple and complex post-analysis, pre-NCBI-submission WGS project tasks. The Genome Annotation Generator is a consistent and user-friendly bioinformatics tool for generating a .tbl file that conforms to the requirements of the NCBI submission pipeline. It achieves the goal of providing a publicly available tool that facilitates the submission of annotated genome assemblies to NCBI, and it is useful for any individual researcher or research group wishing to submit a genome assembly of their study system to NCBI.
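For readers unfamiliar with the .tbl format mentioned above, the following is a minimal sketch of how annotations might be rendered as an NCBI five-column, tab-delimited feature table. It is not the Genome Annotation Generator's own code or interface. The function name `feature_table`, the dictionary layout of each feature, and the example sequence ID and locus tag are all hypothetical, chosen only for illustration. Intervals are assumed to be 1-based (low, high) pairs listed in ascending order.

```python
def feature_table(seq_id, features):
    """Render features as NCBI five-column feature table text.

    Hypothetical helper for illustration; not part of any published tool.
    Each feature is a dict with "key", "strand", "intervals" and,
    optionally, "qualifiers" (a list of (name, value) pairs).
    """
    lines = [f">Feature {seq_id}"]
    for feat in features:
        intervals = feat["intervals"]
        if feat["strand"] == "-":
            # Minus-strand features are written with start > stop,
            # listing the intervals in reverse order.
            intervals = [(high, low) for low, high in reversed(intervals)]
        for i, (start, stop) in enumerate(intervals):
            if i == 0:
                # Only the first interval line carries the feature key.
                lines.append(f"{start}\t{stop}\t{feat['key']}")
            else:
                lines.append(f"{start}\t{stop}")
        for name, value in feat.get("qualifiers", []):
            # Qualifier lines leave the first three columns empty.
            lines.append(f"\t\t\t{name}\t{value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [
        {
            "key": "gene",
            "strand": "+",
            "intervals": [(1, 100)],
            "qualifiers": [("locus_tag", "ABC_0001")],  # made-up locus tag
        }
    ]
    print(feature_table("contig1", example), end="")
```

Run as a script, this prints a `>Feature contig1` header line, then the gene's coordinates and feature key, then its `locus_tag` qualifier on an indented line. A real submission involves considerably more validation than this sketch performs.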