Bock, C.H.
Gottwald, T.R.
Parker, P.E.
Ferrandino, F.
Welham, S.
Submitted to: Phytopathology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 8/14/2009
Publication Date: 10/1/2010
Citation: Bock, C.H., Gottwald, T.R., Parker, P.E., Ferrandino, F., Welham, S. 2010. Some consequences of using the Horsfall-Barratt scale for hypothesis testing. Phytopathology. 100:1030-1041.
Interpretive Summary: Disease severity is assessed in various ways, including using disease scales that can have various structures. Whether disease is measured using a scale or by a direct estimate of percent leaf area infected, the data can be used to compare treatments statistically against a null hypothesis. In this study, nearest percent estimates (NPEs) of disease severity were compared to an oft-used scale, the Horsfall-Barratt (H-B) scale, to explore whether there was an effect of assessment method on hypothesis testing. Simulation modeling was used to compare the two approaches. The simulations showed that the standard deviations of the H-B scale data deviated from those of visual raters, particularly in the range 20 to 50% severity, over which H-B scale grade intervals are widest. In comparing treatments, NPE data had a higher probability to reject the null hypothesis (H0) when H0 was false, although greater sample size increased the probability to reject H0 for both methods. The H-B scale required up to a 50% greater sample size to attain the same probability to reject the null hypothesis as NPEs when H0 was false. This suggests an increase in sample size can resolve the variability caused by inaccurate estimates due to the H-B scale, and perhaps other scales. As expected, various population characteristics influenced the probability to reject H0, including the difference between the two severity distribution means, their variability, and the ability of the raters. Inaccurate raters showed a similar probability to reject H0 when H0 was false using either assessment method, but the ability of average and accurate raters to assess disease was impaired by using the scale.
Accurate raters had, on average, better resolving power for estimating disease compared to that offered by the H-B scale. There are situations where using a disease scale results in relatively imprecise data that can detract from the analysis and lead to incorrect conclusions.
Technical Abstract: Comparing treatment effects by hypothesis testing is a common practice in plant pathology. Nearest percent estimates (NPEs) of disease severity were compared to Horsfall-Barratt (H-B) scale data to explore whether there was an effect of assessment method on hypothesis testing. A simulation model based on field-collected data using leaves with disease severity from 0 to 60% was used: the relationship between NPEs and true severity was linear; a hyperbolic function described the relationship between the standard deviation of the rater mean NPE and true disease; and a lognormal distribution was assumed to describe the frequency of NPEs of specific true disease severities by raters. Results of the simulation showed standard deviations of mean NPEs were consistently similar to the original rater standard deviation from the field-collected data; however, the standard deviations of the H-B scale data deviated from the original rater standard deviation, particularly in the range 20 to 50% severity, over which H-B scale grade intervals are widest, and it is thus over this range that differences in hypothesis testing are most likely to occur. To explore this, two normally distributed, hypothetical severity populations were compared using a t-test with NPEs and H-B midpoint data. NPE data had a higher probability to reject H0 when H0 was false, but greater sample size increased the probability to reject H0 for both methods, with the H-B scale data requiring up to a 50% greater sample size to attain the same probability to reject the null hypothesis as NPEs when H0 was false.
This suggests an increase in sample size can resolve the variability caused by inaccurate estimates due to H-B scale midpoint conversions. As expected, various population characteristics influenced the probability to reject H0, including the difference between the two severity distribution means, their variability, and the ability of the raters. Inaccurate raters showed a similar probability to reject H0 when H0 was false using either assessment method, but average and accurate raters had a greater probability to reject H0 when H0 was false using NPEs compared to H-B scale data. Accurate raters had, on average, better resolving power for estimating disease compared to that offered by the H-B scale, and so the resulting sample variability was more representative of the population when sample size was limiting. There are various circumstances under which H-B scale data have a greater risk of failing to reject H0 when H0 is false (a Type II error) compared to NPEs.
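The midpoint conversion and power comparison described in the abstract can be sketched in Python. This is a minimal illustration, not the authors' model: the grade boundaries are the standard H-B values, but the z-approximation to the t-test, the population parameters, and the run counts are assumptions chosen for the example.

```python
import random
import statistics

# Standard Horsfall-Barratt grade boundaries (%) and the midpoints
# of the intervals between them (0% and 100% are their own grades).
HB_BOUNDS = [0, 3, 6, 12, 25, 50, 75, 88, 94, 97, 100]
HB_MIDPOINTS = [1.5, 4.5, 9.0, 18.5, 37.5, 62.5, 81.5, 91.0, 95.5, 98.5]

def hb_midpoint(severity):
    """Map a percent severity to the midpoint of its H-B interval."""
    if severity <= 0:
        return 0.0
    if severity >= 100:
        return 100.0
    for lo, hi, mid in zip(HB_BOUNDS, HB_BOUNDS[1:], HB_MIDPOINTS):
        if lo < severity <= hi:
            return mid

def reject_h0(a, b, crit=1.96):
    """Two-sample z-approximation to the t-test (adequate for n >= 30)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return abs(z) > crit

def power(mean_a, mean_b, sd, n, use_hb, runs=2000, seed=1):
    """Estimated probability of rejecting H0 for two normal severity populations."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        a = [min(100.0, max(0.0, rng.gauss(mean_a, sd))) for _ in range(n)]
        b = [min(100.0, max(0.0, rng.gauss(mean_b, sd))) for _ in range(n)]
        if use_hb:  # coarsen the ratings to H-B interval midpoints
            a = [hb_midpoint(x) for x in a]
            b = [hb_midpoint(x) for x in b]
        hits += reject_h0(a, b)
    return hits / runs

# Illustrative populations in the 20-50% range, where H-B intervals are widest.
print("NPE power:", power(30, 35, 8, 30, use_hb=False))
print("H-B power:", power(30, 35, 8, 30, use_hb=True))
```

Because the 25-50% grade spans 25 percentage points, every sampled severity in that range collapses to 37.5, which is the mechanism behind the reduced probability of rejecting a false H0 reported for H-B data in this severity range.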