Location: Environmentally Integrated Dairy Management Research
Title: Pose estimation based on keypoints and monocular depth estimation for predicting cattle body weight and hip height
Author
MENEZES, GUILHERME - University Of Wisconsin
SEITZ, ALYSSA - University Of Wisconsin
CASELLA, ENRICO - Pennsylvania State University
MONTES GONZALES, MARIA - University Of Wisconsin
NEGREIRO, ARIANA - University Of Wisconsin
HIGAKI, SHOGO - University Of Wisconsin
BRESOLIN, TIAGO - University Of Illinois Urbana-Champaign
ROSA, GUILHERME - University Of Wisconsin
Akins, Matthew
DOREA, JOAO - University Of Wisconsin
Submitted to: Journal of Animal Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 2/9/2026
Publication Date: N/A
Citation: N/A

Interpretive Summary: In precision livestock farming, tracking animal growth is important for feeding animals more accurately, improving breeding decisions, and predicting farm income. Body weight (BW) and hip height (HH) are important metrics used to evaluate growth. However, taking these measurements by hand is time-consuming and difficult to do consistently on farms. In this study, keypoints placed on the animal body were used to estimate BW and HH. The models predicted BW and HH with an R2 of 0.90 and 0.77, and prediction errors of 6.27% and 2.51%, respectively. These results were similar to those from 3D cameras, but the present method used 2D sensors, which are simpler and more affordable for farms to adopt.

Technical Abstract: Accurate and automated estimation of body weight (BW) and hip height (HH) is crucial for monitoring animal growth and optimizing management decisions, but these traits are not systematically measured under farm conditions because manual procedures are labor-intensive. Computer vision systems (CVS), using pose estimation techniques, offer a promising approach to automatically evaluate body biometrics and generate features that can be used to study growth and development. Therefore, this study aimed to (1) develop a predictive model for BW and HH based on features extracted from body pose keypoints and (2) compare the results of models using keypoint-derived features with those using features from depth images. A total of 395 top-down view videos from 94 beef-on-dairy crossbred cattle, collected across two different diets in four experimental blocks, were captured using depth and infrared sensors. BW was recorded using an electronic scale, and HH was manually measured using a measuring stick.
The keypoint detection model used in the first approach was trained to detect a total of seven anatomical landmarks for morphometric feature extraction using infrared 2D images. Two strategies were applied: using all seven keypoints (Strategy 1) and using six keypoints from the rump to investigate the accuracy of BW and HH predictions when only the rump is visible (Strategy 2). To address the second objective, depth images were used to extract features such as volume, area, circularity, eccentricity, and height along the back and widths. Three models were trained using the extracted features: Random Forest (RF), Partial Least Squares Regression (PLS), and Support Vector Regression (SVM). Model performance was assessed using leave-one-block-out cross-validation. The SVM model trained with Strategy 1 achieved an R2 of 0.90 and a Root Mean Square Error (RMSE) of 32.0 kg, which corresponds to 6.3% of the average BW. The models trained with keypoints from the rump (i.e., Strategy 2) achieved similar results. Using features extracted from depth images, an SVM model achieved an R2 of 0.95 and an RMSE of 24.6 kg, which corresponds to 4.6% of the average observed BW. For HH prediction, the RF model trained with Strategy 1 achieved an R2 of 0.76 and an RMSE of 3.0 cm, which corresponds to 2.31% of the average HH. Models trained using features extracted from depth images achieved similar results. Our findings demonstrate that keypoint-based approaches accurately predict BW and HH, performing similarly to models using features from 3D images.
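The evaluation described in the abstract (keypoint-derived features, leave-one-block-out cross-validation, and RMSE reported as a percentage of the mean observed value) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the abstract does not say which morphometric features were computed, so pairwise keypoint distances are an assumption, and a one-feature least-squares line stands in for the RF, PLS, and SVM models. All function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(keypoints):
    """Euclidean distance between every pair of (x, y) keypoints.

    Assumption: pairwise distances are one plausible kind of
    morphometric feature; the abstract does not list the actual ones.
    """
    return [math.dist(a, b) for a, b in combinations(keypoints, 2)]


def leave_one_block_out(blocks):
    """Yield (held_out_block, train_idx, test_idx) once per distinct block."""
    for held_out in sorted(set(blocks)):
        train = [i for i, b in enumerate(blocks) if b != held_out]
        test = [i for i, b in enumerate(blocks) if b == held_out]
        yield held_out, train, test


def fit_line(x, y):
    """Least-squares fit of y = a + b * x (stand-in for RF / PLS / SVM)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def rmse_percent_of_mean(y_true, y_pred):
    """RMSE expressed as a percentage of the mean observed value."""
    return 100.0 * rmse(y_true, y_pred) / (sum(y_true) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_validate(x, y, blocks):
    """Predict each block with a model fitted on all other blocks,
    then score the pooled out-of-block predictions."""
    preds = [0.0] * len(y)
    for _, train, test in leave_one_block_out(blocks):
        intercept, slope = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = intercept + slope * x[i]
    return {
        "r2": r_squared(y, preds),
        "rmse": rmse(y, preds),
        "rmse_pct": rmse_percent_of_mean(y, preds),
    }


# Toy, made-up data: one feature per animal and four blocks.
demo_x = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 170.0]
demo_y = [2.0 * v + 100.0 for v in demo_x]  # exactly linear on purpose
demo_blocks = [1, 1, 2, 2, 3, 3, 4, 4]
demo_result = cross_validate(demo_x, demo_y, demo_blocks)
print(demo_result)
```

Holding out a whole experimental block at a time keeps animals recorded under the same conditions out of the training data for that fold, which gives a stricter estimate than random splits. Dividing RMSE by the mean observed value is what lets errors in kilograms and centimeters be compared as percentages.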
