Chinedu Innocent Nwoye, Kareem Elgohary, Anvita Srinivas, Fauzan Zaid, Joël L. Lavanchy, and Nicolas Padoy
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, ${\color{lightgreen}CVPR \space 2025}$
CholecTrack20 is a surgical video dataset for tool tracking in laparoscopic cholecystectomy, featuring 20 fully annotated videos. It provides detailed labels for multi-class multi-tool tracking, with trajectories defined from three perspectives: tool visibility within the camera scope, intracorporeal movement within the patient's body, and the life-long intraoperative trajectory of each tool. Annotations cover spatial bounding-box coordinates, tool class, operator identity, surgical phase, visual conditions (occlusion, bleeding, smoke), and more for seven tool classes: grasper, bipolar, hook, scissors, clipper, irrigator, and specimen bag. Labels are provided at 1 frame per second across 35K frames, totaling 65K tool instance labels. The official splits allocate 10 videos for training, 2 for validation, and 8 for testing.
The novel CholecTrack20 dataset consists of 20 videos of laparoscopic procedures that have been fully annotated with detailed labels for multi-class multi-tool tracking. The ground truth annotations specify tool identities and trajectories, facilitating training and evaluation of tracking algorithms.
The dataset provides track identities across 3 perspectives of track definition:
- visibility trajectory of a tool within the camera scope,
- intracorporeal trajectory of a tool while within a patient's body, and
- life-long intraoperative trajectory of a tool.
Intraoperative tracking not only re-identifies tools out of camera view (OOCV) as done in intracorporeal tracking but also maintains their trajectory when out of body (OOB).
In the CholecTrack20 dataset, OOB is detected and annotated in one of three ways: visually observing the tool exit the trocar, inferring it from another tool entering through the same trocar, or noting that the initial tool releases its grasp while out of camera focus. By considering all three perspectives in the CholecTrack20 dataset, we present a multi-perspective strategy that seeks to mitigate the biases, identity mismatches, and fragmentation that can arise from learning solely from a single viewpoint.
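To make the three perspectives concrete, here is a small illustrative sketch (the ID values are hypothetical, not taken from the dataset) of how a single grasper would be re-identified under each track definition:

```python
# Hypothetical example: one physical grasper across three events.
# Visibility IDs restart each time the tool re-enters the camera view;
# the intracorporeal ID survives out-of-camera-view (OOCV) gaps but not
# out-of-body (OOB) removals; the intraoperative ID survives both.
grasper_track = [
    {"event": "first appearance",          "visibility_id": 1, "intracorporeal_id": 1, "intraoperative_id": 1},
    {"event": "re-enters view after OOCV", "visibility_id": 2, "intracorporeal_id": 1, "intraoperative_id": 1},
    {"event": "re-inserted after OOB",     "visibility_id": 3, "intracorporeal_id": 2, "intraoperative_id": 1},
]
```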
The raw data comprises anonymized endoscopic video of laparoscopic cholecystectomy procedures. Annotations include surgical tool information for the entire video sequences as well as information about the surgical conditions surrounding the tools. The dataset provides detailed labels for each tool, such as spatial bounding-box coordinates, class identity, track identities, operator identity, and phase identity, as well as frame-level visual conditions such as occlusion, bleeding, and smoke status, among others.
The annotated tool categories are grasper, bipolar, hook, scissors, clipper, irrigator and specimen bag. The annotated tool operators are main surgeon left hand (MSLH), main surgeon right hand (MSRH), assistant surgeon right hand (ASRH) and null operator (NULL).
The table below shows the complete list of label attributes in the CholecTrack20 dataset, including attributes introduced in this dataset and those inherited from existing datasets built on the same source recordings.
The annotations are provided at 1 frame per second (FPS) consisting of 35K frames and 65K instance tool labels. Raw videos, recorded at 25 FPS, are provided for inference.
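If you need to regenerate 1 FPS frames from a raw video yourself, a minimal sketch with OpenCV follows (the video path and output directory are placeholders):

```python
import os

import cv2

os.makedirs("frames", exist_ok=True)
cap = cv2.VideoCapture("CholecTrack20/testing/VID111/VID111.mp4")  # placeholder path
fps = int(round(cap.get(cv2.CAP_PROP_FPS)))  # 25 for the raw videos
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if idx % fps == 0:  # keep one frame per second to match the 1 FPS labels
        cv2.imwrite(f"frames/{idx // fps:06d}.png", frame)
    idx += 1
cap.release()
```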
The dataset is a single zip file organized into three sub-directories for the data splits, as illustrated in the left figure below. Each split contains a directory per video sequence, which in turn contains the following items: (1) a raw anonymized .mp4 video file recorded at 25 FPS (test set), (2) a folder of 1 FPS sampled frames from the video that correspond to the labels, and (3) a JavaScript Object Notation (.json) file for the labels.
The JSON file is structured as a dictionary of frame records with the frame IDs as keys, as illustrated in the right figure. The number of records per frame corresponds to the number of tools within the frame. Each record, belonging to a particular tool, is tagged with all associated labels as dictionary attributes (e.g., tool_bbox: [120.0, 132.0, 23.0, 65.0], category: 3, operator: 3, intraoperative_track_id: 7, etc.).
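A minimal sketch of reading one video's labels, assuming each frame ID maps to a list of per-tool records as described above (the file path is a placeholder):

```python
import json

with open("CholecTrack20/training/VID01/VID01.json") as f:
    labels = json.load(f)  # dict: frame ID -> list of per-tool records

for frame_id, tools in labels.items():
    for tool in tools:  # one record per tool visible in the frame
        x, y, w, h = tool["tool_bbox"]  # spatial bounding box
        category = tool["category"]     # tool class index
        track_id = tool["intraoperative_track_id"]
```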
DetEval - Custom code for tool detection evaluation, built on the COCO API for Average Precision (AP) metrics: code
TrackEval - Adaptation of TrackEval to include the CholecTrack20 benchmark. The metric library is built on the widely used CLEAR MOT, Identity, VACE, Track mAP, J & F, ID Euclidean, and HOTA metrics. You can either pull from the original TrackEval repo or clone our adaptation: code
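For detection, DetEval follows the standard COCO evaluation protocol; below is a minimal sketch of that generic protocol with pycocotools (the file names are placeholders, and this is not the released script itself):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("cholectrack20_test_gt.json")        # COCO-format ground truth
coco_dt = coco_gt.loadRes("model_detections.json")  # your model's detections

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # reports AP across IoU thresholds and categories
```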
- Benchmark of widely used and SOTA models on tool detection.
- Mean AP results reported across detection thresholds, tool categories, and surgical visual challenges.
| Detection model | AP@.5 | AP@.75 | AP@[.5:.95] | Grasper | Bipolar | Hook | Scissors | Clipper | Irrigator | Bag | Bleeding | Blur | Smoke | Crowded | Occluded | Reflection | Foul Lens | Trocar | Speed (FPS) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Faster-RCNN | 56.0 | 38.1 | 34.6 | 53.5 | 65.0 | 80.1 | 60.9 | 70.1 | 26.8 | 31.8 | 57.9 | 41.0 | 54.5 | 43.5 | 55.0 | 46.9 | 41.2 | 35.7 | 7.6 |
| Cascade-RCNN | 51.7 | 39.0 | 34.7 | 52.0 | 58.9 | 79.7 | 45.7 | 44.9 | 23.7 | 17.9 | 53.9 | 39.0 | 48.1 | 39.5 | 46.4 | 29.1 | 33.7 | 33.7 | 7.0 |
| CenterNet | 53.0 | 39.5 | 35.0 | 60.2 | 61.4 | 86.4 | 56.3 | 68.0 | 25.8 | 10.2 | 58.0 | 42.1 | 50.2 | 36.7 | 51.7 | 46.0 | 35.8 | 30.8 | 33.8 |
| FCOS | 43.5 | 31.5 | 28.1 | 51.2 | 44.3 | 74.7 | 49.2 | 54.2 | 21.9 | 7.2 | 47.8 | 40.6 | 51.5 | 15.1 | 40.8 | 42.7 | 29.7 | 17.6 | 7.7 |
| SSD | 61.9 | 37.8 | 36.1 | 75.2 | 62.2 | 91.6 | 63.4 | 72.9 | 22.5 | 40.8 | 64.5 | 49.3 | 58.3 | 57.5 | 62.4 | 53.9 | 47.7 | 42.6 | 30.9 |
| PAA | 64.5 | 44.9 | 41.1 | 69.6 | 79.0 | 89.2 | 68.7 | 74.2 | 37.6 | 28.9 | 67.1 | 55.6 | 65.0 | 55.0 | 64.6 | 56.0 | 51.2 | 47.5 | 8.5 |
| Def-DETR | 58.4 | 42.0 | 38.3 | 60.6 | 66.5 | 83.8 | 61.9 | 72.0 | 39.9 | 23.8 | 62.4 | 42.7 | 58.6 | 37.1 | 57.4 | 43.9 | 41.5 | 47.4 | 10.2 |
| Swin-T | 62.3 | 44.3 | 40.2 | 63.3 | 64.8 | 83.0 | 80.2 | 77.2 | 38.0 | 26.8 | 63.5 | 53.8 | 62.8 | 35.3 | 61.1 | 66.2 | 55.2 | 45.7 | 9.8 |
| YOLOX | 64.7 | 48.9 | 44.2 | 69.6 | 72.2 | 89.4 | 75.4 | 79.1 | 37.3 | 27.1 | 68.2 | 55.6 | 66.0 | 45.9 | 64.2 | 52.5 | 58.1 | 43.1 | 23.6 |
| YOLOv7 | 80.6 | 62.0 | 56.1 | 90.5 | 86.4 | 96.0 | 82.3 | 89.3 | 49.1 | 66.2 | 80.2 | 61.2 | 80.1 | 79.5 | 82.1 | 65.6 | 71.2 | 66.7 | 20.6 |
| YOLOv8 | 79.1 | 62.4 | 55.6 | 87.9 | 84.5 | 96.2 | 80.0 | 87.2 | 48.4 | 65.0 | 77.1 | 58.3 | 74.4 | 76.2 | 80.4 | 70.3 | 57.4 | 62.9 | 29.0 |
| YOLOv9 | 80.2 | 62.6 | 56.5 | 88.5 | 87.6 | 96.0 | 79.3 | 87.1 | 50.1 | 67.7 | 78.1 | 54.0 | 78.2 | 78.6 | 81.1 | 65.3 | 63.4 | 63.1 | 23.7 |
| YOLOv10 | 80.1 | 62.1 | 55.8 | 87.6 | 86.6 | 96.0 | 81.9 | 89.0 | 53.8 | 61.3 | 77.8 | 61.9 | 78.7 | 77.5 | 81.2 | 66.7 | 59.3 | 65.4 | 28.6 |
Leaderboard available on Papers with Code.
- Benchmark of multi-perspective multi-tool tracking results @ 25 FPS.
- Evaluated across multiple metrics: HOTA, CLEAR MOT, Identity, Count, and efficiency metrics.
- Models assessed across 3 tracking perspectives.
| Model | HOTA↑ | DetA↑ | LocA↑ | AssA↑ | MOTA↑ | MOTP↑ | MT↑ | PT↓ | ML↓ | IDF1↑ | IDSW↓ | Frag↓ | #Dets | #IDs | FPS↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **Intraoperative Trajectory** (ground-truth counts: #Dets = 29994, #IDs = 70) |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| OCSORT | 14.6 | 52.7 | 86.7 | 4.1 | 49.2 | 85.0 | 24 | 32 | 14 | 9.5 | 2921 | 2731 | 21936 | 3336 | 10.2 |
| FairMOT | 5.8 | 25.8 | 75.9 | 1.3 | 5.0 | 73.9 | 3 | 24 | 43 | 4.3 | 4227 | 1924 | 15252 | 4456 | 14.2 |
| TransTrack | 7.4 | 31.5 | 84.4 | 1.7 | 4.2 | 82.9 | 9 | 36 | 25 | 4.2 | 4757 | 1899 | 21640 | 4079 | 6.7 |
| ByteTrack | 15.8 | 70.6 | 85.7 | 3.6 | 67.0 | 84.0 | 54 | 12 | 2 | 9.5 | 4648 | 2429 | 28440 | 5383 | 16.4 |
| Bot-SORT | 17.4 | 70.7 | 85.4 | 4.4 | 69.6 | 83.7 | 58 | 11 | 1 | 10.2 | 3907 | 2376 | 29302 | 4501 | 8.7 |
| SMILETrack | 15.9 | 71.0 | 85.5 | 3.7 | 66.4 | 83.8 | 55 | 13 | 2 | 9.2 | 4968 | 2369 | 28821 | 5761 | 11.2 |
| **Intracorporeal Trajectory** (ground-truth counts: #Dets = 29994, #IDs = 247) |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| OCSORT | 23.7 | 51.4 | 86.5 | 11.0 | 47.1 | 84.8 | 115 | 87 | 45 | 18.1 | 2953 | 2796 | 21797 | 3526 | 10.2 |
| FairMOT | 7.5 | 19.7 | 76.1 | 2.9 | 5.4 | 74.0 | 19 | 60 | 168 | 6.0 | 2890 | 1496 | 11287 | 3962 | 14.2 |
| TransTrack | 13.1 | 31.5 | 84.4 | 5.5 | 4.6 | 82.9 | 80 | 79 | 88 | 8.7 | 4648 | 1791 | 21640 | 4079 | 6.7 |
| ByteTrack | 24.7 | 70.6 | 85.7 | 8.7 | 67.4 | 84.0 | 176 | 48 | 23 | 16.9 | 4515 | 2290 | 28440 | 5383 | 16.4 |
| Bot-SORT | 27.0 | 70.7 | 85.4 | 10.4 | 70.0 | 83.7 | 188 | 38 | 21 | 18.9 | 3771 | 2238 | 29300 | 4501 | 8.7 |
| SMILETrack | 24.9 | 66.7 | 85.5 | 8.9 | 66.7 | 83.8 | 186 | 39 | 22 | 16.9 | 4868 | 2232 | 28820 | 5779 | 11.2 |
| **Visibility Trajectory** (ground-truth counts: #Dets = 29994, #IDs = 916) |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| SORT | 17.4 | 39.5 | 85.2 | 7.8 | 21.4 | 83.3 | 139 | 399 | 378 | 13.4 | 6619 | 2138 | 16595 | 8844 | 19.5 |
| OCSORT | 37.0 | 52.6 | 86.5 | 26.2 | 50.2 | 84.8 | 300 | 371 | 245 | 35.9 | 2317 | 2260 | 22197 | 3587 | 10.2 |
| FairMOT | 15.3 | 25.0 | 75.8 | 9.5 | 7.1 | 73.7 | 58 | 218 | 640 | 14.4 | 3140 | 1574 | 15338 | 4875 | 14.2 |
| TransTrack | 19.2 | 31.6 | 84.4 | 11.8 | 5.8 | 82.9 | 224 | 280 | 412 | 16.1 | 4273 | 1403 | 21640 | 4079 | 6.7 |
| ByteTrack | 41.5 | 70.7 | 85.7 | 24.8 | 69.3 | 84.0 | 591 | 217 | 108 | 36.8 | 3930 | 1704 | 28440 | 5383 | 16.4 |
| Bot-SORT | 44.7 | 70.8 | 85.5 | 28.7 | 72.0 | 83.7 | 638 | 184 | 94 | 41.4 | 3183 | 1638 | 29300 | 4505 | 8.7 |
| SMILETrack | 41.3 | 71.0 | 85.6 | 24.4 | 68.9 | 83.8 | 619 | 192 | 105 | 36.5 | 4227 | 1641 | 28821 | 5752 | 11.2 |
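For reference, the headline scores follow their standard definitions, with HOTA averaged over localization thresholds $\alpha$:

$$\text{MOTA} = 1 - \frac{\text{FN} + \text{FP} + \text{IDSW}}{\text{GT}}, \qquad \text{IDF1} = \frac{2\,\text{IDTP}}{2\,\text{IDTP} + \text{IDFP} + \text{IDFN}}, \qquad \text{HOTA}_{\alpha} = \sqrt{\text{DetA}_{\alpha} \cdot \text{AssA}_{\alpha}}$$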
Leaderboard available on Papers with Code.
- SurgiTrack, a new SOTA method on the dataset, is published here.
Steps to obtain the dataset:
- Read the Data Use Agreement (DUA).
- The dataset is released under the CC-BY-NC-SA 4.0 license.
- Complete the dataset request form to receive the download access key; it will be needed in the next step, so keep it safe! The form accepts responses from 25 March 2025.
- Visit the data portal at Synapse.org to download the dataset.
Use conversion.py to convert the JSON annotations to your preferred format (e.g., MOTChallenge, COCO, TAO), so that you can use the corresponding dataloaders.
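For instance, a minimal MOTChallenge-style export in the spirit of conversion.py (field names follow the JSON description above; the paths, the confidence placeholder, and the 1-based frame indexing are assumptions):

```python
import json

with open("CholecTrack20/training/VID01/VID01.json") as f:
    labels = json.load(f)

with open("VID01_mot.txt", "w") as out:
    # assumes frame IDs are numeric strings, sorted in temporal order
    for frame_id, tools in sorted(labels.items(), key=lambda kv: int(kv[0])):
        for tool in tools:
            x, y, w, h = tool["tool_bbox"]
            # MOTChallenge row: frame, id, bb_left, bb_top, bb_width, bb_height, conf, x, y, z
            out.write(f"{int(frame_id) + 1},{tool['intraoperative_track_id']},"
                      f"{x},{y},{w},{h},1,-1,-1,-1\n")
```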
The dataset originates from the CAMMA research group at the University of Strasbourg, France, and shares content with Cholec80 and CholecT50, which are among the largest public surgical video datasets used in surgical workflow analysis. As a result, there are overlaps with these datasets and with other cholecystectomy datasets sourced from the same medical center.
To maintain consistency and facilitate identification of overlapping videos, we preserved the video identities (e.g., VID01, VID02, VID12, VID111, etc.) in our dataset. Note that the prefix "VID" in the video filenames may be written as "Video" in other datasets. The figure below illustrates the videos and labels of CholecTrack20 that overlap with other cholecystectomy datasets. Researchers are encouraged to consider these overlaps when pre-training their models on related cholecystectomy datasets.
This work was supported by French state funds managed within the Plan Investissements d’Avenir by the ANR under references: National AI Chair AI4ORSafety [ANR-20-CHIA-0029-01], DeepSurg [ANR-16-CE33-0009], IHU Strasbourg [ANR-10-IAHU-02] and by BPI France under references: project CONDOR, project 5G-OR. Joël L. Lavanchy received funding by the Swiss National Science Foundation (P500PM_206724, P5R5PM_217663). This work was granted access to the servers/HPC resources managed by CAMMA, IHU Strasbourg, Unistra Mesocentre, and GENCI-IDRIS [Grant 2021-AD011011638R3, 2021-AD011011638R4].
Parts of the metric evaluation code are borrowed from TrackEval and cocoapi. Thanks for their excellent work!
- Conference
@InProceedings{nwoye2023cholectrack20,
author = {Nwoye, Chinedu Innocent and Elgohary, Kareem and Srinivas, Anvita and Zaid, Fauzan and Lavanchy, Joël L. and Padoy, Nicolas},
title = {CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2025},
month = {June}
}
- arXiv
@misc{nwoye2023cholectrack20,
title={CholecTrack20: A Dataset for Multi-Class Multiple Tool Tracking in Laparoscopic Surgery},
author={Chinedu Innocent Nwoye and Kareem Elgohary and Anvita Srinivas and Fauzan Zaid and Joël L. Lavanchy and Nicolas Padoy},
year={2023},
eprint={2312.07352},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
CholecTrack20 - An endoscopic video dataset for multi-class multi-tool tracking, with trajectories defined across three different perspectives on the temporal duration of a tool track: (a) intraoperative, (b) intracorporeal, and (c) visibility.
We welcome contributions of new metrics, new supported benchmarks, and any other new features or code improvements. Send a PR, an email, or open an issue detailing what you'd like to add or change to begin a conversation.











