LocalizeSORT is a localization-aware multi-object tracking framework for agricultural yield estimation. It extends the DeepSORT and StrongSORT tracking pipelines with a world-coordinate (WC) association metric for stationary objects (for example, fruits and crops), improving identity consistency and counting accuracy.
The implementation in this repository is built on top of the BoxMOT-style tracking stack and includes adaptations for WC-aware association.
Title: LOCALIZESORT: LOCALIZATION-BASED STATIONARY OBJECT TRACKING IN PRECISION AGRICULTURE
https://dx.doi.org/10.2139/ssrn.4829514
Venue: ASME IMECE-INDIA2026
Authors: Srihari Vemuru, Kumar Ankit, Debasish Ghose, Shishir Kolathaya
LocalizeSORT adds localization cues to tracking-by-detection:
- Object detection is performed per frame (YOLO-family detectors).
- World coordinates are estimated per detection using one of:
  - Sensor-based localization (SLoc): RGB-D frames, camera intrinsics, and odometry transforms.
  - Reconstruction-based localization (RLoc): 3D reconstruction from RGB frames.
- Detection-to-track association uses appearance, motion, and WC proximity.
This design targets stationary agricultural objects, whose world positions should remain nearly constant across viewpoints.
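As an illustration of sensor-based localization (SLoc), the following minimal sketch back-projects a detection's box center into world coordinates with a pinhole camera model. It is not the repository's implementation and rests on stated assumptions: the depth image is aligned with the RGB frame, the box center stands in for the object's pixel location, and odometry supplies a 4x4 camera-to-world transform. The function name `bbox_center_to_world` is hypothetical.

```python
import numpy as np


def bbox_center_to_world(box, depth_m, K, T_world_cam):
    """Back-project a bounding-box center to world coordinates (hypothetical helper).

    box: (x1, y1, x2, y2) in pixels.
    depth_m: depth at the box center in meters (assumed read from an aligned depth image).
    K: 3x3 pinhole intrinsics matrix.
    T_world_cam: 4x4 homogeneous camera-to-world transform (assumed from odometry).
    """
    x1, y1, x2, y2 = box
    u = 0.5 * (x1 + x2)
    v = 0.5 * (y1 + y2)
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Pinhole back-projection of pixel (u, v) at the given depth into the camera frame
    p_cam = np.array([(u - cx) * depth_m / fx,
                      (v - cy) * depth_m / fy,
                      depth_m,
                      1.0])
    # Express the point in the world frame using the camera pose
    p_world = T_world_cam @ p_cam
    return p_world[:3]
```

For example, with `K = [[600, 0, 320], [0, 600, 240], [0, 0, 1]]` and a camera translated 1 m along the world x-axis, a box centered on the principal point at 2 m depth maps to the world point `(1, 0, 2)`.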
This repository provides:

- A WC-aware StrongSORT variant in `boxmot/trackers/strongsort/strong_sort.py`.
- Tracker-level WC handling in `boxmot/trackers/strongsort/sort/tracker_wc.py`.
- An example script, `run.py`, for running tracking with detections and WC inputs.
- A ready-to-use tracking stack in `boxmot/` with multiple tracker baselines.
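The WC proximity term can be pictured as a gate plus an extra cost in the detection-to-track assignment. The following is a hypothetical sketch, not the logic in `tracker_wc.py`: it blends an appearance distance with the Euclidean WC distance, rejects pairs farther apart than `wc_gate`, and solves the assignment with SciPy's `linear_sum_assignment`. The motion (Kalman) gating that StrongSORT also applies is omitted, and `associate`, `wc_gate`, and `wc_weight` are illustrative names and values.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def associate(track_wcs, det_wcs, appearance_cost, wc_gate=0.15, wc_weight=0.5):
    """Match detections to tracks using appearance plus world-coordinate distance.

    track_wcs: (T, 3) last known world position of each track.
    det_wcs: (D, 3) world position of each detection.
    appearance_cost: (T, D) appearance distances, assumed to lie in [0, 1].
    wc_gate: maximum WC distance (same units as the WCs) for a valid match.
    """
    track_wcs = np.asarray(track_wcs, dtype=float)
    det_wcs = np.asarray(det_wcs, dtype=float)
    # Euclidean distance between every track and every detection in world coordinates
    wc_dist = np.linalg.norm(track_wcs[:, None, :] - det_wcs[None, :, :], axis=-1)
    # Scale WC distance by the gate so both terms are on a comparable scale
    cost = (1.0 - wc_weight) * np.asarray(appearance_cost) + wc_weight * (wc_dist / wc_gate)
    # A stationary object should not move far, so forbid distant pairs
    large = 1e6
    cost = np.where(wc_dist > wc_gate, large, cost)
    rows, cols = linear_sum_assignment(cost)
    matches = [(int(r), int(c)) for r, c in zip(rows, cols) if cost[r, c] < large]
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    unmatched_tracks = [t for t in range(len(track_wcs)) if t not in matched_t]
    unmatched_dets = [d for d in range(len(det_wcs)) if d not in matched_d]
    return matches, unmatched_tracks, unmatched_dets
```

A hard WC gate suits stationary targets: a large jump in estimated world position is strong evidence of a different object even when two fruits look alike.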
Installation:

```bash
git clone <your-localizesort-repo-url>
cd LocalizeSORT
python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt
pip install -e .
```

To run LocalizeSORT on your own data, edit the following paths in `run.py`:

- `video_path`: directory containing the ordered frame images
- `detections_path`: per-frame 2D detections
- `wcs_path`: per-frame world coordinates
- `save_results_file`: output text file
Then run:
```bash
python run.py
```

The BoxMOT tracking script can also be run directly on a video or image folder:

```bash
python examples/track.py \
  --yolo-model yolov8n.pt \
  --tracking-method strongsort \
  --source <video-or-folder> \
  --save --save-mot
```

Input formats:

- `detections.txt` expects one line per frame.
  - Each line contains zero or more detections separated by `;`.
  - Each detection uses comma-separated `x1,y1,x2,y2` values.
  - An empty line means there are no detections in that frame.
- `wcs.txt` follows the same per-frame structure, with each item given as `wx,wy,wz`.
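A loader for these two files might look like the following sketch. It assumes two things not stated above: values are plain decimal numbers, and the i-th WC on a line belongs to the i-th detection on the same line of `detections.txt`. The function names are hypothetical, and `run.py` may parse these files differently.

```python
from pathlib import Path


def parse_per_frame(text, dims):
    """Parse per-frame text: one line per frame, items separated by ';',
    values within an item separated by ','. An empty line yields no items."""
    frames = []
    for line in text.splitlines():
        items = []
        for item in line.split(";"):
            item = item.strip()
            if not item:
                continue  # empty line (no detections) or a trailing ';'
            values = tuple(float(v) for v in item.split(","))
            if len(values) != dims:
                raise ValueError(f"expected {dims} values, got {len(values)}: {item!r}")
            items.append(values)
        frames.append(items)
    return frames


def load_per_frame(path, dims):
    """Read a detections file (dims=4) or a WC file (dims=3) from disk."""
    return parse_per_frame(Path(path).read_text(), dims)


def pair_frames(det_frames, wc_frames):
    """Attach each detection box to its world coordinate, frame by frame.

    Assumption: the i-th WC on a line belongs to the i-th detection on that line.
    """
    if len(det_frames) != len(wc_frames):
        raise ValueError("detections and WCs cover a different number of frames")
    paired = []
    for idx, (dets, wcs) in enumerate(zip(det_frames, wc_frames)):
        if len(dets) != len(wcs):
            raise ValueError(f"frame {idx}: {len(dets)} boxes but {len(wcs)} WCs")
        paired.append(list(zip(dets, wcs)))
    return paired
```

Typical use: `pair_frames(load_per_frame("detections.txt", 4), load_per_frame("wcs.txt", 3))`, which yields one list of `(box, wc)` pairs per frame.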
Evaluation datasets:

| Dataset | Video | Frames | Localization | Tracking Challenges | Wind | Weather | Crop Stage |
|---|---|---|---|---|---|---|---|
| LettuceMOT | straight | 402 | RLoc | no reID, no latency | 9 mph | Cloudy | Rosette stage |
| LettuceMOT | B&F | 540 | RLoc | reID | 9 mph | Cloudy | Rosette stage |
| LettuceMOT | O&I | 1121 | RLoc | reID | 9 mph | Cloudy | Rosette stage |
| LFSD Dataset | normal | 1535 | SLoc | no reID, no latency | 7 mph | Clear | Heading stage |
| LFSD Dataset | missing frames | 1551 | SLoc | latency | 7 mph | Clear | Heading stage |
| LFSD Dataset | occlusion | 1535 | SLoc | reID, latency | 7 mph | Clear | Heading stage |
| Mango Dataset | tree1 | 790 | RLoc | reID | 15 mph | Clear | Fruit development |
| Mango Dataset | tree2 | 788 | RLoc | reID | 15 mph | Clear | Fruit development |
| Mango Dataset | tree3 | 570 | RLoc | reID | 15 mph | Clear | Fruit development |
Tracking results on LettuceMOT and the LFSD Dataset. Metrics: MOTA (%), HOTA (%), HOTA0 (%), and number of ID switches (IDsw).

| Dataset | Video | Model | MOTA | HOTA | HOTA0 | IDsw |
|---|---|---|---|---|---|---|
| LettuceMOT | straight | DeepSORT | 62.3 | 63.3 | 84.9 | 4 |
| LettuceMOT | straight | StrongSORT | 94.4 | 90.0 | 96.3 | 0 |
| LettuceMOT | straight | LocalizeSORT | 94.4 | 90.0 | 96.3 | 0 |
| LettuceMOT | B&F | DeepSORT | 74.9 | 55.2 | 68.4 | 43 |
| LettuceMOT | B&F | StrongSORT | 95.0 | 70.9 | 73.9 | 43 |
| LettuceMOT | B&F | LocalizeSORT | 96.6 | 93.4 | 98.5 | 0 |
| LettuceMOT | O&I | DeepSORT | 63.3 | 40.9 | 53.5 | 350 |
| LettuceMOT | O&I | StrongSORT | 92.8 | 67.8 | 74.4 | 29 |
| LettuceMOT | O&I | LocalizeSORT | 93.2 | 88.2 | 97.8 | 1 |
| LFSD Dataset | normal | DeepSORT | 67.4 | 41.2 | 76.1 | 319 |
| LFSD Dataset | normal | StrongSORT | 87.5 | 87.6 | 87.6 | 0 |
| LFSD Dataset | normal | LocalizeSORT | 89.6 | 89.6 | 89.6 | 0 |
| LFSD Dataset | missing frames | DeepSORT | 43.6 | 29.3 | 51.4 | 1779 |
| LFSD Dataset | missing frames | StrongSORT | 87.6 | 85.5 | 85.5 | 67 |
| LFSD Dataset | missing frames | LocalizeSORT | 89.5 | 89.5 | 89.5 | 0 |
| LFSD Dataset | occlusion | DeepSORT | 5.0 | 9.3 | 17.9 | 3417 |
| LFSD Dataset | occlusion | StrongSORT | 79.3 | 61.5 | 61.5 | 746 |
| LFSD Dataset | occlusion | LocalizeSORT | 86.3 | 86.3 | 86.3 | 40 |
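For reference, MOTA in the standard CLEAR-MOT definition combines missed objects (FN), false positives (FP), and ID switches (IDSW) relative to the number of ground-truth objects (GT), each summed over all frames, so fewer ID switches raise MOTA directly. The helper below is an illustrative sketch of that formula only; the evaluation toolkit used for the table is not specified here, and the counts in the example are made up.

```python
def mota(num_fn, num_fp, num_idsw, num_gt):
    """CLEAR-MOT accuracy in percent: 100 * (1 - (FN + FP + IDSW) / GT).

    All counts are totals over the whole sequence. The result can be negative
    when the number of errors exceeds the number of ground-truth objects.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 100.0 * (1.0 - (num_fn + num_fp + num_idsw) / num_gt)


# Hypothetical counts, not taken from the paper
print(round(mota(num_fn=50, num_fp=30, num_idsw=4, num_gt=1000), 1))  # prints 91.6
```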
Mango Dataset fruit counts per tree (ground truth vs. the count produced by each tracker):

| Video | Ground Truth | DeepSORT | StrongSORT | LocalizeSORT |
|---|---|---|---|---|
| Tree 1 | 40 | 258 | 68 | 55 |
| Tree 2 | 43 | 191 | 66 | 45 |
| Tree 3 | 123 | 263 | 217 | 187 |
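To read the counting table as an error rate, the signed relative counting error `(count - GT) / GT` can be computed directly from the numbers above; positive values mean over-counting. This is a small illustrative calculation, not a metric reported in the paper.

```python
# Counts copied from the Mango Dataset table above
counts = {
    "Tree 1": {"gt": 40, "DeepSORT": 258, "StrongSORT": 68, "LocalizeSORT": 55},
    "Tree 2": {"gt": 43, "DeepSORT": 191, "StrongSORT": 66, "LocalizeSORT": 45},
    "Tree 3": {"gt": 123, "DeepSORT": 263, "StrongSORT": 217, "LocalizeSORT": 187},
}


def relative_count_error(pred, gt):
    """Signed relative counting error; positive means over-counting."""
    return (pred - gt) / gt


for tree, row in counts.items():
    gt = row["gt"]
    errors = {m: relative_count_error(c, gt) for m, c in row.items() if m != "gt"}
    print(tree, {m: f"{e:+.1%}" for m, e in errors.items()})
```

For Tree 1 this gives +37.5% for LocalizeSORT, +70.0% for StrongSORT, and +545.0% for DeepSORT.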
If you use this repository or build upon LocalizeSORT, please cite the paper and this codebase.
```bibtex
@inproceedings{vemuru2026localizesort,
  title={LOCALIZESORT: Localization-Based Stationary Object Tracking in Precision Agriculture},
  author={Vemuru, Srihari and Ankit, Kumar and Ghose, Debasish and Kolathaya, Shishir},
  booktitle={ASME IMECE-INDIA2026},
  year={2026}
}
```

This repository is distributed under AGPL-3.0 (see `LICENSE`).

