Bock, C.H.   
Gottwald, T.R.
Parker, P.E.   
Ferrandino, F.   
Welham, S.  
Submitted to: Phytopathology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: August 14, 2009
Publication Date: October 1, 2010
Citation: Bock, C.H., Gottwald, T.R., Parker, P.E., Ferrandino, F., Welham, S. 2010. Some consequences of using the Horsfall-Barratt scale for hypothesis testing. Phytopathology. 100:1030-1041.

Interpretive Summary: Disease severity is assessed in various ways, including disease scales that can have various structures. Whether disease is measured using a scale or by a direct estimate of the percent leaf area infected, the data can be used to compare treatments statistically against a null hypothesis. In this study, nearest percent estimates (NPEs) of disease severity were compared to an oft-used scale, the Horsfall-Barratt (H-B) scale, to explore whether the assessment method affects hypothesis testing. Simulation modeling was used to compare the two approaches. The simulations showed that the standard deviations of the H-B scale data deviated from those of visual raters, particularly in the range 20 to 50% severity, over which H-B scale grade intervals are widest. In comparing treatments, NPE data had a higher probability of rejecting the null hypothesis (H0) when H0 was false, although a greater sample size increased the probability of rejecting H0 for both methods. The H-B scale required up to a 50% greater sample size to attain the same probability of rejecting the null hypothesis as NPEs when H0 was false. This suggests that an increase in sample size can compensate for the variability caused by inaccurate estimates due to the H-B scale, and perhaps other scales. As expected, various population characteristics influenced the probability of rejecting H0, including the difference between the two severity distribution means, their variability, and the ability of the raters. Inaccurate raters showed a similar probability of rejecting H0 when H0 was false using either assessment method, but the ability of average and accurate raters to assess disease was impaired by using the scale.
Accurate raters had, on average, better resolving power for estimating disease than that offered by the H-B scale. There are situations in which using a disease scale results in relatively imprecise data that can detract from the analysis and lead to incorrect conclusions.

Technical Abstract: Comparing treatment effects by hypothesis testing is a common practice in plant pathology. Nearest percent estimates (NPEs) of disease severity were compared to Horsfall-Barratt (H-B) scale data to explore whether the assessment method affects hypothesis testing. A simulation model based on field-collected data from leaves with disease severity of 0 to 60% was used: the relationship between NPEs and true severity was linear; a hyperbolic function described the relationship between the standard deviation of the rater mean NPE and true disease severity; and a lognormal distribution was assumed to describe the frequency of rater NPEs at specific true disease severities. Results of the simulation showed that the standard deviations of mean NPEs were consistently similar to the original rater standard deviations from the field-collected data; however, the standard deviations of the H-B scale data deviated from the original rater standard deviations, particularly in the range 20 to 50% severity, over which H-B scale grade intervals are widest, and it is thus over this range that differences in hypothesis testing are most likely to occur. To explore this, two normally distributed, hypothetical severity populations were compared using a t-test with NPEs and with H-B midpoint data. NPE data had a higher probability of rejecting H0 when H0 was false, but a greater sample size increased the probability of rejecting H0 for both methods, with the H-B scale data requiring up to a 50% greater sample size to attain the same probability of rejecting the null hypothesis as NPEs when H0 was false.
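The simulation approach described above can be sketched in a small script. This is a hypothetical illustration, not the authors' code: the Horsfall-Barratt grade boundaries are the standard published ones, but the lognormal error parameter, population means, and sample size are assumed for demonstration only.

```python
import numpy as np
from scipy import stats

# Standard Horsfall-Barratt grade boundaries (% leaf area infected).
# A severity estimate is recorded as a grade, then converted back to the
# grade's midpoint for analysis.
HB_BOUNDS = [0, 3, 6, 12, 25, 50, 75, 88, 94, 97, 100]

def hb_midpoint(severity):
    """Map a percent severity to its H-B grade midpoint (0% is grade 1)."""
    if severity <= 0:
        return 0.0
    for lo, hi in zip(HB_BOUNDS[:-1], HB_BOUNDS[1:]):
        if severity <= hi:
            return (lo + hi) / 2.0
    return 100.0

def rater_npe(true_severity, rng, sigma=0.25):
    """Hypothetical rater model: a lognormal nearest-percent estimate
    centred on the true severity (sigma is an assumed shape parameter,
    not a value fitted in the study)."""
    est = true_severity * rng.lognormal(mean=-sigma**2 / 2, sigma=sigma)
    return float(min(est, 100.0))

rng = np.random.default_rng(1)
n = 30
# Two hypothetical treatment populations in the 20-50% severity range,
# where H-B grade intervals are widest.
true_a = rng.normal(30, 5, n).clip(0, 100)
true_b = rng.normal(38, 5, n).clip(0, 100)

npe_a = np.array([rater_npe(s, rng) for s in true_a])
npe_b = np.array([rater_npe(s, rng) for s in true_b])
hb_a = np.array([hb_midpoint(s) for s in npe_a])
hb_b = np.array([hb_midpoint(s) for s in npe_b])

# Compare the two treatments with a two-sample t-test under each method.
p_npe = stats.ttest_ind(npe_a, npe_b).pvalue
p_hb = stats.ttest_ind(hb_a, hb_b).pvalue
print(f"NPE p-value: {p_npe:.4f}   H-B midpoint p-value: {p_hb:.4f}")
```

Because both population means fall inside the wide 25-50% grade, many H-B midpoint values collapse to 37.5%, which is the mechanism behind the reduced resolving power discussed above.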
This suggests that an increase in sample size can compensate for the variability caused by inaccurate estimates due to H-B scale midpoint conversions. As expected, various population characteristics influenced the probability of rejecting H0, including the difference between the two severity distribution means, their variability, and the ability of the raters. Inaccurate raters showed a similar probability of rejecting H0 when H0 was false using either assessment method, but average and accurate raters had a greater probability of rejecting H0 when H0 was false using NPEs compared to H-B scale data. Accurate raters had, on average, better resolving power for estimating disease than that offered by the H-B scale, and so the resulting sample variability was more representative of the population when sample size was limiting. There are various circumstances under which H-B scale data carry a greater risk of failing to reject H0 when H0 is false (a Type II error) compared to NPEs.
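The sample-size effect can be illustrated with a Monte Carlo power estimate under assumed conditions: direct percent data (rater error omitted for simplicity) versus the same data discretized to H-B midpoints before the t-test. The population means, standard deviation, and replication count here are hypothetical, chosen to place most observations in the wide 25-50% grade interval.

```python
import numpy as np
from scipy import stats

# Standard Horsfall-Barratt grade boundaries (% leaf area infected).
HB_BOUNDS = [0, 3, 6, 12, 25, 50, 75, 88, 94, 97, 100]

def hb_midpoint(severity):
    """Map a percent severity to its H-B grade midpoint."""
    if severity <= 0:
        return 0.0
    for lo, hi in zip(HB_BOUNDS[:-1], HB_BOUNDS[1:]):
        if severity <= hi:
            return (lo + hi) / 2.0
    return 100.0

def reject_rate(n, use_hb, reps=2000, alpha=0.05, seed=0):
    """Monte Carlo estimate of the probability of rejecting H0 for two
    hypothetical severity populations (means 30% and 38%, SD 8%), with
    or without conversion to H-B midpoints before a two-sample t-test."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        a = rng.normal(30, 8, n).clip(0, 100)
        b = rng.normal(38, 8, n).clip(0, 100)
        if use_hb:
            a = np.array([hb_midpoint(x) for x in a])
            b = np.array([hb_midpoint(x) for x in b])
        if stats.ttest_ind(a, b).pvalue < alpha:
            rejections += 1
    return rejections / reps

for n in (10, 20, 30):
    print(f"n={n:2d}  power(percent data)={reject_rate(n, False):.2f}  "
          f"power(H-B midpoints)={reject_rate(n, True):.2f}")
```

In this sketch, power rises with sample size for both methods, while the midpoint conversion lowers power at any given n, so a larger sample is needed to reach the same probability of rejecting a false H0, consistent with the pattern the abstract reports.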
