Location: Livestock and Range Research Laboratory
Title: A Bayesian approach for analysis of ordered categorical responses subject to misclassification
Author:
Ling, Ashley - UNIVERSITY OF GEORGIA
Aggrey, Samuel - UNIVERSITY OF GEORGIA
Rekaya, Romdhane - UNIVERSITY OF GEORGIA
Submitted to: PLoS One
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 12/4/2018
Publication Date: 12/13/2018
Citation: Ling, A., Hay, E.A., Aggrey, S., Rekaya, R. 2018. A Bayesian approach for analysis of ordered categorical responses subject to misclassification. PLoS One. 13(12):e0208433. https://doi.org/10.1371/journal.pone.0208433
DOI: https://doi.org/10.1371/journal.pone.0208433
Interpretive Summary: Errors in the measurement and recording of data, especially discrete data in animal agriculture, are a prevalent issue that can affect the results of a statistical analysis by reducing statistical power and biasing inference. Given these negative effects of misclassification, a statistical model for correcting multinomial responses within a hierarchical Bayesian framework was developed and evaluated using real and simulated data. With the proposed approach, a significant reduction in bias and an increase in accuracy ranging from 11% to 17% were observed. Furthermore, the majority of the misclassified observations in the simulated data were identified with a substantially higher probability.
Technical Abstract: Ordinal categorical responses are frequently collected in survey studies, human medicine, and animal and plant improvement programs, among other fields. Errors in this type of data are neither rare nor easy to detect. These errors tend to bias inference and reduce statistical power, and ultimately reduce the efficiency of the decision-making process. In contrast to the binary situation, where misclassification occurs between two response classes, noise in ordinal categorical data is more complex because of the larger number of categories and the diversity and asymmetry of the errors. Although several approaches have been presented for dealing with misclassification in binary data, only a limited number of practical methods have been proposed for analyzing noisy categorical responses. A latent variable model implemented within a Bayesian framework was proposed for analyzing ordinal categorical data subject to misclassification and was evaluated using simulated and real datasets. The simulated scenario consisted of a discrete response with three categories and a symmetric error rate of 5% between any two classes. The real data consisted of calving ease records of beef cows. In both the real and simulated data, ignoring misclassification resulted in substantial bias in the estimates of genetic parameters and reduced the accuracy of predicted genetic breeding values. With the proposed approach, a significant reduction in bias and an increase in accuracy ranging from 11% to 17% were observed. Furthermore, the majority of the misclassified observations in the simulated data were identified with a substantially higher probability. Although the extension to traits with more categories and to asymmetric misclassification between adjacent classes is straightforward, it could be computationally costly. For traits with higher heritability, the performance of the methodology would be expected to improve.
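To make the simulated scenario concrete, the following Python sketch generates a three-category ordinal response from a latent variable and then corrupts it with symmetric misclassification. It is an illustration only, not the authors' simulation code. The threshold values, the sample size, the standard normal latent variable, and the function names `simulate_ordinal` and `misclassify` are all assumptions. The sketch also assumes one particular reading of the 5% error rate: each record is miscoded with probability 0.05 and, when miscoded, is assigned to one of the other two categories with equal probability.

```python
import numpy as np


def simulate_ordinal(n, thresholds, rng):
    """Draw latent values and cut them into ordered categories 0..K-1.

    Assumption: the latent variable is standard normal; the abstract does
    not give the actual simulation design.
    """
    liability = rng.standard_normal(n)
    # np.digitize returns, for each value, the index of the interval
    # between thresholds that contains it (0 below the first threshold).
    return np.digitize(liability, thresholds)


def misclassify(y, n_categories, error_rate, rng):
    """Miscode each record with probability error_rate.

    A miscoded record moves to one of the other categories, each with
    equal probability (assumed meaning of "symmetric").
    """
    y_obs = y.copy()
    flip = rng.random(y.size) < error_rate
    # An offset in 1..K-1 taken modulo K always lands on a different category.
    offset = rng.integers(1, n_categories, size=int(flip.sum()))
    y_obs[flip] = (y[flip] + offset) % n_categories
    return y_obs, flip


rng = np.random.default_rng(2018)
thresholds = np.array([-0.5, 0.5])  # hypothetical cut points for 3 categories
y_true = simulate_ordinal(10_000, thresholds, rng)
y_obs, flipped = misclassify(y_true, n_categories=3, error_rate=0.05, rng=rng)

# Share of records whose observed category differs from the true one.
observed_error = float(np.mean(y_true != y_obs))
print(f"observed misclassification rate: {observed_error:.3f}")
```

Fitting a model to `y_obs` while treating it as error-free corresponds to the "ignoring misclassification" case described above. An analysis that accounts for misclassification would instead treat the indicator stored in `flipped` as unknown and infer it from the data.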