Publication : USDA ARS

Publication #284445

Title: An overview of the BioCreative 2012 workshop track III: Interactive text mining task

Author

	ARIGHI, CECILIA - University Of Delaware
	CARTERETTE, BEN - University Of Delaware
	KRALLINGER, MARTIN - Spanish National Cancer Research Centre
	WILBUR, JOHN - National Institutes Of Health (NIH)
	FEY, PETRA - Northwestern University
	DODSON, ROBERT - Northwestern University
	COOPER, LAUREL - Oregon State University
	VAN SLYKE, CERI - University Of Oregon
	DAHDUL, WASILA - University Of South Dakota
	MABEE, PAULA - University Of South Dakota
	Schaeffer, Mary

Submitted to: Database: The Journal of Biological Databases and Curation
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/28/2012
Publication Date: 1/17/2013
Citation: Arighi, C.N., Carterette, B., Krallinger, M., Wilbur, J.W., Fey, P., Dodson, R., Cooper, L., Van Slyke, C.E., Dahdul, W., Mabee, P., Schaeffer, M.L., et al. 2013. An overview of the BioCreative 2012 workshop track III: Interactive text mining task. Database: The Journal of Biological Databases and Curation. 2012:1-18. Available: http://database.oxfordjournals.org/content/2013/bas056.full.

Interpretive Summary: Manual curation is the major bottleneck in adding experimentally proven gene functions to the online genome databases that serve as key resources for deciphering the ever-increasing number of genomic blueprints for crops and farm animals. This report describes a collaborative effort to reduce that bottleneck. Text mining tool developers and database curators met to test the performance of new tools and to discuss improvements that would speed the transfer of data from the literature to any genome database. The collaboration involves people knowledgeable in both medically and agriculturally relevant species. The goal is to facilitate tools that will use these data to infer and/or confirm the genes responsible for complex traits important to both medicine and agriculture.

Technical Abstract: An important question is how to make use of text mining to enhance the biocuration workflow. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. In some cases the curation effort already makes use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here we report on an effort to bring together a number of text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in the formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that a set of systems were able to improve the efficiency of curation by speeding up the curation task significantly (~1.7- to 2.5-fold) over manual curation. Some of the systems were able to improve annotation accuracy when compared to the performance on the manually curated set. For inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the biocurator’s expertise in the given curation task, the inherent difficulty of the curation, and failure to follow annotation guidelines. The user survey analysis highlights how important task completion is to the biocurators’ overall experience of a system, regardless of the system’s high score on design, learnability, and usability. We will use this information to plan for a more intense study of these issues in the coming year at the BioCreative IV Workshop.

Plant Genetics Research: Columbia, MO