CHEN, J - University Of Missouri
GREER, S - University Of Missouri
|Sudduth, Kenneth - Ken|
Submitted to: Transactions of the ASABE
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 4/21/2018
Publication Date: 8/26/2018
Citation: Sadler, E.J., Nash, P.R., Drummond, S.T., Chen, J., Greer, S.T., Sudduth, K.A. 2018. Generalizable, extensible methods to implement QA/QC Tests on environmental data. Transactions of the ASABE. 61(4):1193-1198. doi:10.13031/trans.12641.
Interpretive Summary: Environmental data is needed to inform producers' operational and strategic choices, to enable simulation modeling, and to inform public policy at all levels of government. The data must therefore be of high quality. A critical component of quality assurance and quality control (QA/QC) is screening the data for missing values, outliers, and changes that are too abrupt or too slow to be real. Most QC screens are programmed with if-then-else statements that arrive at conclusions about the data. Many situations call for more tests than can easily be programmed that way. To avoid this limit, two alternative methods were developed: one usable for up to about 10 tests, and one usable for many more tests. Both use truth tables, borrowed from integrated circuit design in electrical engineering, to express and implement True/False logic. The usual method and our two alternatives are illustrated with a simple example of an air temperature measurement from a local weather station. Both alternative methods were shown to be flexible and could be used for a number of different cases without modification. The flexibility of these methods should increase the power of QA/QC programs for environmental or other time series data, helping to ensure high-quality data for producers, scientists, and policy makers.
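As a rough illustration of the conventional if-then-else screen described above, the following Python sketch flags a single air temperature reading. The function name, the flag names, and the threshold values are hypothetical and chosen only for this example; they are not taken from the paper.

```python
import math

# Hypothetical thresholds for illustration only (not from the paper).
TA_MIN, TA_MAX = -40.0, 50.0  # plausible range for air temperature, deg C
MAX_STEP = 5.0                # largest believable change between readings, deg C


def screen_ta_if_else(ta, previous_ta):
    """Return a QC flag for one reading using chained if-then-else tests."""
    if ta is None or math.isnan(ta):
        return "missing"
    elif ta < TA_MIN or ta > TA_MAX:
        return "out_of_range"
    elif previous_ta is not None and abs(ta - previous_ta) > MAX_STEP:
        return "too_abrupt"
    else:
        return "good"


print(screen_ta_if_else(21.5, 21.0))
print(screen_ta_if_else(75.0, 21.0))
```

With three tests the chain is still readable, but every added test or sensor adds branches, which is the scaling problem the two alternative methods are meant to address.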
Technical Abstract: As environmental data is increasingly recognized as important, attention is focused on the quality of the data. A critical component of quality assurance and quality control (QA/QC) is screening the data for missing values, outliers, and changes that are too abrupt or too slow to be real. These tests can be run on stations with one or more sensors, and most implementations code the combinational logic of the tests in if-then-else constructs to arrive at outcomes or actions depending on the inputs. For very simple sets of tests, usually limited to a single sensor, coding is not difficult, but complexity rises rapidly as the number of tests increases. Even for very common weather stations, redundant sensors or inter-sensor comparisons require enough tests to make if-then-else code very cumbersome. To ease such constraints and to make code easier to modify and document, two alternative methods were developed without if-then-else logic: one scalable to approximately 10 tests, and one scalable to many more tests. The objectives of this paper are to describe the methods and to demonstrate their performance using realistic tests with existing data. Both use truth tables to express and implement logic: the first expands a truth table with n tests to all 2^n possible combinations and uses the result as a lookup table, and the second matches test patterns against the truth table in its original, unexpanded form. The usual method and our two alternatives are illustrated with a simple example of a measurement of air temperature, Ta. Relative advantages and disadvantages of each are discussed. Scalability of the methods is illustrated using measurements from a local weather station. Both were shown to be scalable, but with differences in overhead and first-run time required. Given the test results and either the expanded truth table (for the lookup method) or the original truth table (for the pattern-match method), the core code was general. The flexibility of these methods should increase the power of QA/QC programs for environmental or other time series data.
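The two truth-table approaches can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the test names, the action names, the use of None as a "don't care" entry, and the first-match-wins rule are all hypothetical choices made for the example.

```python
from itertools import product

# Hypothetical tests on an air temperature (Ta) reading, in column order.
TESTS = ("missing", "out_of_range", "too_abrupt")

# Unexpanded truth table: one row per rule. None means "don't care"
# (an assumed convention). Rows are checked in order; first match wins.
COMPACT_TABLE = [
    ((True, None, None), "flag_missing"),
    ((False, True, None), "flag_range"),
    ((False, False, True), "flag_step"),
    ((False, False, False), "accept"),
]


def pattern_match(results, table=COMPACT_TABLE):
    """Pattern-match method: compare test results to each unexpanded row."""
    for pattern, action in table:
        if all(p is None or p == r for p, r in zip(pattern, results)):
            return action
    raise ValueError(f"no truth-table row matches {results!r}")


def encode(results):
    """Pack a tuple of True/False results into an integer index."""
    index = 0
    for r in results:
        index = (index << 1) | int(bool(r))
    return index


def expand_table(table=COMPACT_TABLE, n_tests=len(TESTS)):
    """Lookup method, setup step: expand n tests to all 2**n combinations."""
    lookup = [None] * (2 ** n_tests)
    for combo in product((False, True), repeat=n_tests):
        lookup[encode(combo)] = pattern_match(combo, table)
    return lookup


LOOKUP = expand_table()  # one-time cost that doubles with each added test


def lookup_action(results, lookup=LOOKUP):
    """Lookup method, run step: a single index into the expanded table."""
    return lookup[encode(results)]


print(len(LOOKUP))
print(pattern_match((False, True, False)), lookup_action((False, True, False)))
```

In this sketch, changing the logic means editing rows of `COMPACT_TABLE` rather than the functions. The expanded table doubles in size with every added test, which is consistent with the lookup method being described as practical only up to about 10 tests, while the pattern-match version works directly on the compact rows.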