Submitted to: AgriEngineering
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 5/19/2020
Publication Date: 5/22/2020
Citation: Pelletier, M.G., Holt, G.A., Wanjura, J.D. 2020. Plastic contamination image dataset for deep learning model development and training. AgriEngineering. 2(2):317-321. https://doi.org/10.3390/agriengineering2020021.
Interpretive Summary: Plastic contamination is significantly damaging the reputation of U.S. cotton in the international market. To protect that reputation, removing plastic contamination from cotton is now a top priority for the U.S. cotton industry. This report covers the development of an image dataset that can be used to improve the modern detection systems needed to drive and control plastic contamination removal machinery. The image dataset is included with this report as an open-source resource to help advance the science and lower the barriers to development.
Technical Abstract: The removal of plastic contamination from cotton lint is a top priority for the U.S. cotton industry. One of the main sources of plastic contamination in marketable cotton bales is the plastic used to wrap cotton modules on cotton harvesters. To help mitigate plastic contamination at the gin, automatic inspection systems are needed to detect the plastic and control removal systems. Because of significant cost constraints in the U.S. cotton ginning industry, low-cost color cameras have been successfully adopted for detecting plastic contamination. However, plastics similar in color to the background are difficult to detect with traditional machine learning algorithms. Hence, current designs cannot remove all of the plastic, and better detection methods are still needed. Recent advances in deep-learning convolutional neural networks (CNNs) show promise for enabling low-cost color cameras to detect objects of interest against a background of similar color. They do this by mimicking the human visual detection system, which focuses on differences in texture rather than color as the primary detection paradigm. The key to leveraging CNNs is the development of the extensive image datasets required to train them. One impediment to this methodology is that each image in these large datasets must be annotated with bounding boxes that surround each object of interest. Because this requirement is labor intensive, such image datasets have significant value. This report details the included image dataset as well as the system design used to collect the images. To acquire the image dataset, a prototype detection system was developed and then deployed in a commercial cotton gin, where images were collected for the duration of the 2018-2019 ginning season.
A discussion of the system's observed impact on reducing plastic contamination at the commercial gin, using traditional color-based machine learning algorithms, is also included.
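
To make the bounding-box annotation requirement concrete, the following is a minimal Python sketch of how one annotated object might be represented and converted to the normalized center format that many detection training tools accept. The class name, field names, the "plastic" label, and the image size are all hypothetical; the abstract does not specify the annotation format used in the published dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """One annotated object, in pixel coordinates (hypothetical layout)."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def __post_init__(self):
        # Reject degenerate boxes so bad annotations fail early.
        if self.x_max <= self.x_min or self.y_max <= self.y_min:
            raise ValueError("box must have positive width and height")

    def to_normalized(self, image_width, image_height):
        """Return (x_center, y_center, width, height) as fractions of image size."""
        width = self.x_max - self.x_min
        height = self.y_max - self.y_min
        return (
            (self.x_min + width / 2) / image_width,
            (self.y_min + height / 2) / image_height,
            width / image_width,
            height / image_height,
        )


# Assumed example values: a 640x480 image with one annotated piece of plastic.
box = BoundingBox("plastic", 100, 50, 300, 150)
normalized = box.to_normalized(640, 480)
```

Normalizing by image size keeps the annotation valid if the images are later resized for training, which is one reason this representation is common.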