
Research Project: Next-Generation Approaches for Monitoring and Management of Stored Product Insects

Location: Stored Product Insect and Engineering Research

Title: Numerical Signature Dataset of Curculionidae and Tenebrionidae Beetle Fragments for ML Identification

Author
Serfa Juan, Ronnie - Oak Ridge Institute for Science and Education (ORISE)
Gerken, Alison

Submitted to: Scientific Data
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/11/2025
Publication Date: 12/12/2025
Citation: Serfa Juan, R.O., Gerken, A.R. 2025. Numerical Signature Dataset of Curculionidae and Tenebrionidae Beetle Fragments for ML Identification. Scientific Data. https://doi.org/10.1038/s41597-025-06309-6.
DOI: https://doi.org/10.1038/s41597-025-06309-6

Interpretive Summary: More than 50 species of stored product insect pests infest and damage raw grains and human and animal food products, causing significant losses. Correctly identifying the species involved can help pinpoint the source of an infestation and guide pest control tactics that limit damage, but identification is time-consuming and relies on taxonomic expertise, which is not always readily available. Moreover, insect bodies are often not intact and only body parts are found, making species identification difficult even for trained experts. To overcome these challenges, we extracted image features from fragments of six species of stored product insects that artificial intelligence can use to classify the fragments taxonomically, focusing on diagnostic anatomical features such as antennae, elytra, thoraxes, snouts, and head aspect ratios. The image features include estimates of evenness, texture, edges, surface reflection, and color contrast of the fragments, and they are being released publicly as a dataset for training AI models. Beyond accurate species identification, these data can also be used to conclusively identify invasive stored product insects, which can help prevent their establishment in areas where they have not yet been introduced.

Technical Abstract: This data descriptor presents a curated dataset of numerical signature descriptors derived from fragment images of six economically significant stored-product beetle species from the families Curculionidae (Sitophilus zeamais, Sitophilus oryzae, Sitophilus granarius) and Tenebrionidae (Tribolium castaneum, Tribolium confusum, Latheticus oryzae). Anatomical fragments, including antennae, elytra, thorax, snout (Curculionidae), and head aspect ratio (Tenebrionidae), were imaged using digital microscopy and processed with standardized image acquisition and segmentation techniques. From each image, four statistical descriptors (skewness, kurtosis, entropy, and standard deviation) were extracted, forming compact numerical signatures that capture fragment-level texture and morphological variation. These descriptors are designed to support artificial intelligence and machine learning workflows for automated classification in entomological diagnostics and post-harvest pest detection. The dataset includes 3,423 fragment images, each linked to a numerical signature vector and labeled by species and anatomical region, with accompanying metadata. It is publicly available through the USDA-ARS Ag Data Commons (DOI: 10.15482/USDA.ADC/29066444) under a CC-BY license, promoting reuse, benchmarking, and collaborative research in entomology, computer vision, and precision agriculture.
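
As an illustration of what a four-value numerical signature might look like, the sketch below computes skewness, kurtosis, entropy, and standard deviation from the pixel intensities of a grayscale image. The abstract does not give the exact formulas, so several choices here are assumptions: population (not sample) moments, excess kurtosis (normal distribution = 0), Shannon entropy in bits over the histogram of distinct intensity values, and zero skewness and kurtosis for a perfectly flat image. The function name `numerical_signature` and the nested-list image format are hypothetical and are not taken from the dataset or its accompanying code.

```python
import math
from collections import Counter


def numerical_signature(pixels):
    """Compute a hypothetical 4-value signature from a 2D list of intensities.

    Assumptions (not confirmed by the source): population moments,
    excess kurtosis, Shannon entropy in bits over the intensity histogram.
    """
    values = [v for row in pixels for v in row]  # flatten rows into one list
    n = len(values)
    if n == 0:
        raise ValueError("image has no pixels")

    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)

    if std == 0:
        # Shape statistics are undefined for a flat image; report 0 by convention.
        skewness = 0.0
        kurtosis = 0.0
    else:
        skewness = sum(((v - mean) / std) ** 3 for v in values) / n
        kurtosis = sum(((v - mean) / std) ** 4 for v in values) / n - 3.0

    # Shannon entropy of the empirical intensity distribution, in bits.
    counts = Counter(values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {
        "skewness": skewness,
        "kurtosis": kurtosis,
        "entropy": entropy,
        "std": std,
    }


if __name__ == "__main__":
    # Tiny made-up 8-bit "image"; real inputs would be segmented fragment images.
    demo = [
        [12, 40, 40, 200],
        [12, 12, 90, 200],
        [40, 90, 90, 255],
    ]
    for name, value in numerical_signature(demo).items():
        print(f"{name}: {value:.4f}")
```

In a classification workflow, each image's four values would form one feature vector alongside its species and anatomical-region labels. Whether the published descriptors were computed over the whole image or only the segmented fragment pixels is not stated in the abstract.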