CAMMA-public/cholectrack20


CholecTrack20

A Multi-Perspective Tracking Dataset for Surgical Tools

Chinedu Innocent Nwoye, Kareem Elgohary, Anvita Srinivas, Fauzan Zaid, Joël L. Lavanchy, and Nicolas Padoy

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025)


CVPR Paper | Read on arXiv | Supplementary Material



Abstract

CholecTrack20 is a surgical video dataset focused on laparoscopic cholecystectomy and designed for surgical tool tracking, featuring 20 annotated videos. The dataset includes detailed labels for multi-class multi-tool tracking, providing trajectories from three perspectives: tool visibility within the camera scope, intracorporeal movement within the patient's body, and the lifelong intraoperative trajectory of each tool. Annotations cover spatial coordinates, tool class, operator identity, surgical phase, visual conditions (occlusion, bleeding, smoke), and more, for the following tools: grasper, bipolar, hook, scissors, clipper, irrigator, and specimen bag. Annotations are provided at 1 frame per second (FPS), totaling 35K frames and 65K tool instance labels. The dataset uses official splits, allocating 10 videos for training, 2 for validation, and 8 for testing.




Contents

The novel CholecTrack20 dataset consists of 20 videos of laparoscopic procedures that have been fully annotated with detailed labels for multi-class multi-tool tracking. The ground truth annotations specify tool identities and trajectories, facilitating training and evaluation of tracking algorithms.



Multi-Perspective Trajectory

The dataset provides track identities across three perspectives of track definition:

  1. visibility trajectory of a tool within the camera scope,
  2. intracorporeal trajectory of a tool while within a patient's body, and
  3. lifelong intraoperative trajectory of a tool.



Intraoperative tracking not only re-identifies tools that go out of camera view (OOCV), as intracorporeal tracking does, but also maintains their trajectories while they are out of body (OOB).

In the CholecTrack20 dataset, an OOB event is annotated in one of three ways: by visually observing the tool exit through the trocar, by inferring it from another tool entering through the same trocar, or by noting that the initial tool releases its grasp while out of camera view. By considering all three perspectives in the CholecTrack20 dataset, we present a multi-perspective strategy that seeks to mitigate the biases, identity mismatches, and fragmentation that can arise from learning from a single viewpoint alone.
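How the three perspectives relate can be sketched in a few lines of Python. This is a minimal illustration, not the dataset's annotation procedure: it assumes one tool's intraoperative track is given as the label-frame indices where the tool is visible, that any gap between consecutive indices means the tool left the camera view, and that OOB exits are supplied as a hypothetical list of frame times. The helper names `split_on_gaps` and `split_on_oob` are illustrative only.

```python
import bisect
from typing import Dict, List, Sequence


def split_on_gaps(frames: Sequence[int], step: int = 1) -> List[List[int]]:
    """Visibility-style segments (sketch): start a new segment whenever two
    consecutive visible frames are more than `step` apart, i.e. the tool is
    assumed to have been out of camera view (OOCV) in between."""
    segments: List[List[int]] = []
    for f in sorted(frames):
        if segments and f - segments[-1][-1] <= step:
            segments[-1].append(f)
        else:
            segments.append([f])
    return segments


def split_on_oob(frames: Sequence[int], oob_exit_times: Sequence[int]) -> List[List[int]]:
    """Intracorporeal-style segments (sketch): start a new segment only after an
    out-of-body (OOB) exit; OOCV gaps inside the body do not split the track.
    `oob_exit_times` is a hypothetical list of frame times at which the tool
    was annotated as leaving the body."""
    exits = sorted(oob_exit_times)
    segments: Dict[int, List[int]] = {}
    for f in sorted(frames):
        # Number of OOB exits that happened strictly before frame f.
        k = bisect.bisect_left(exits, f)
        segments.setdefault(k, []).append(f)
    return list(segments.values())


# One intraoperative track: visible at these frames, leaves the body at t = 7.
visible = [0, 1, 2, 5, 6, 9, 10]
print(split_on_oob(visible, [7]))  # intracorporeal: [[0, 1, 2, 5, 6], [9, 10]]
print(split_on_gaps(visible))      # visibility: [[0, 1, 2], [5, 6], [9, 10]]
```

With these toy inputs, a single intraoperative track yields two intracorporeal segments and three visibility segments, which mirrors why the number of ground-truth track IDs grows from the intraoperative to the visibility perspective.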




Data Record

Raw data comprises anonymized endoscopic videos of laparoscopic cholecystectomy. Annotations include surgical tool information for the entire video sequences, as well as information about the surgical conditions surrounding the tools. The dataset provides detailed labels for each tool, such as spatial bounding box coordinates, class identities, track identities, operator identities, phase identities, and frame-level visual conditions (e.g., occlusion, bleeding, and presence of smoke), among others.

The annotated tool categories are grasper, bipolar, hook, scissors, clipper, irrigator and specimen bag. The annotated tool operators are main surgeon left hand (MSLH), main surgeon right hand (MSRH), assistant surgeon right hand (ASRH) and null operator (NULL).

The table below shows the complete list of label attributes of the CholecTrack20 dataset, including both the attributes introduced in this dataset and those inherited from existing datasets built on the same source recordings.
[Figure: table of CholecTrack20 label attributes]

Data Statistics

The annotations are provided at 1 frame per second (FPS), comprising 35K frames and 65K tool instance labels. Raw videos, recorded at 25 FPS, are provided for inference.
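Because labels are sampled at 1 FPS while the raw videos run at 25 FPS, aligning tracker output on the raw video with the labels requires an index conversion. The sketch below assumes, hypothetically, that label index i was sampled from video frame i × 25 (zero-based, first frame of each second); check the actual frame IDs in the label files before relying on this alignment.

```python
from typing import Optional

VIDEO_FPS = 25  # raw video frame rate stated in this README
LABEL_FPS = 1   # annotation rate stated in this README


def _stride(video_fps: int, label_fps: int) -> int:
    """Number of raw-video frames per labeled frame."""
    if video_fps % label_fps:
        raise ValueError("video_fps must be a multiple of label_fps")
    return video_fps // label_fps


def label_to_video_frame(label_index: int, video_fps: int = VIDEO_FPS,
                         label_fps: int = LABEL_FPS) -> int:
    """Map a 1 FPS label index to a raw-video frame index (assumed alignment)."""
    return label_index * _stride(video_fps, label_fps)


def video_to_label_index(video_frame: int, video_fps: int = VIDEO_FPS,
                         label_fps: int = LABEL_FPS) -> Optional[int]:
    """Return the label index for a raw-video frame, or None if that frame
    was not sampled for annotation under the assumed alignment."""
    stride = _stride(video_fps, label_fps)
    return video_frame // stride if video_frame % stride == 0 else None


print(label_to_video_frame(10))   # 250
print(video_to_label_index(250))  # 10
print(video_to_label_index(251))  # None
```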




Data Structure

The dataset is a single zip file organized into three sub-directories, one per data split, as illustrated in the left figure below. Each split contains one directory per video sequence, which in turn contains the following items: (1) for the test set, a raw anonymized .mp4 video file recorded at 25 FPS; (2) a folder of frames sampled at 1 FPS from the video, corresponding to the labels; and (3) a JavaScript Object Notation (.json) file for the labels.
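A small index of the extracted archive makes it easy to iterate over splits and videos. This is a sketch that assumes the layout described above (root/&lt;split&gt;/&lt;video&gt;/ holding an optional .mp4, one folder of sampled frames, and one .json label file); split and file names are deliberately not hard-coded because they are not specified here, and `index_dataset` is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import Dict, Iterable, Optional

VideoEntry = Dict[str, Optional[Path]]


def _first(paths: Iterable[Path]) -> Optional[Path]:
    """Return the first path in sorted order, or None if there is none."""
    ordered = sorted(paths)
    return ordered[0] if ordered else None


def index_dataset(root: Path) -> Dict[str, Dict[str, VideoEntry]]:
    """Map split name -> video name -> {"video", "frames_dir", "labels"}.
    Missing items (e.g. no raw .mp4 outside the test set) map to None."""
    index: Dict[str, Dict[str, VideoEntry]] = {}
    for split_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        videos: Dict[str, VideoEntry] = {}
        for video_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            entries = list(video_dir.iterdir())
            videos[video_dir.name] = {
                "video": _first(p for p in entries if p.suffix.lower() == ".mp4"),
                "frames_dir": _first(p for p in entries if p.is_dir()),
                "labels": _first(p for p in entries if p.suffix.lower() == ".json"),
            }
        index[split_dir.name] = videos
    return index


if __name__ == "__main__":
    # Build a tiny fake tree (hypothetical names) to show the result's shape.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        vid = root / "testing" / "VID01"
        (vid / "frames").mkdir(parents=True)
        (vid / "VID01.json").write_text("{}")
        (vid / "VID01.mp4").write_bytes(b"")
        print(index_dataset(root)["testing"]["VID01"]["labels"].name)  # VID01.json
```

The extension check is case-insensitive so that both `.json` and `.JSON` label files are picked up.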

The JSON file is structured as a dictionary of frame records, with the frame IDs as keys, as illustrated in the right figure. The number of records in a frame corresponds to the number of tools in that frame. Each record belongs to a particular tool and is tagged with all associated labels as dictionary attributes (e.g., tool_bbox: [120.0, 132.0, 23.0, 65.0], category: 3, operator: 3, intraoperative_track_id: 7, etc.).
[Figure: dataset directory structure (left) and JSON label structure (right)]
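Following the description above, a label file can be grouped into per-track trajectories. This sketch assumes each frame key maps to a list of tool records and that frame IDs are integer strings; `build_trajectories` is a hypothetical helper, the two records are toy data, and any keys beyond `tool_bbox`, `category`, `operator`, and `intraoperative_track_id` (for example, the field names of the other two track perspectives) should be checked against the actual files and passed via `track_key`.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]
Trajectory = List[Tuple[int, List[float]]]  # (frame_id, tool_bbox) pairs


def load_labels(path: str) -> Dict[str, List[Record]]:
    """Load one video's label file (assumed shape: {frame_id: [record, ...]})."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def build_trajectories(labels: Dict[str, List[Record]],
                       track_key: str = "intraoperative_track_id") -> Dict[int, Trajectory]:
    """Group tool records by track ID and sort each trajectory by frame."""
    tracks: Dict[int, Trajectory] = defaultdict(list)
    for frame_id, records in labels.items():
        for record in records:  # one record per tool in the frame
            tracks[record[track_key]].append((int(frame_id), record["tool_bbox"]))
    for trajectory in tracks.values():
        trajectory.sort(key=lambda item: item[0])
    return dict(tracks)


labels = {
    "2": [{"tool_bbox": [125.0, 130.0, 24.0, 66.0], "category": 3, "operator": 3,
           "intraoperative_track_id": 7}],
    "0": [{"tool_bbox": [120.0, 132.0, 23.0, 65.0], "category": 3, "operator": 3,
           "intraoperative_track_id": 7}],
}
print(build_trajectories(labels)[7])
# [(0, [120.0, 132.0, 23.0, 65.0]), (2, [125.0, 130.0, 24.0, 66.0])]
```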

Explore Samples

[Figure: dataset samples]


Visualization and Validation

[Figures: visualization and validation examples]


Evaluation Metrics and Libraries

DetEval - custom code for evaluating tool detection, built on the COCO API, for Average Precision (AP) metrics.

TrackEval - TrackEval adapted to include the CholecTrack20 benchmark. The metric library is built on the widely used CLEAR MOT, Identity, VACE, Track mAP, J & F, ID Euclidean, and HOTA metrics. You can either pull from the original TrackEval repository or clone our adapted code.
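The AP columns in the detection benchmark rest on box overlap between predictions and ground truth. As a rough sketch of that matching criterion (not the DetEval code, which builds on the COCO API), intersection over union (IoU) for boxes in COCO's [x, y, width, height] form can be computed as below; that `tool_bbox` follows this convention is our assumption. Full AP additionally ranks detections by score, matches them one-to-one, and integrates precision over recall, which this sketch omits.

```python
from typing import List, Sequence

# COCO-style AP@0.5:0.95 averages AP over these ten IoU thresholds.
COCO_IOU_THRESHOLDS: List[float] = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def iou_xywh(a: Sequence[float], b: Sequence[float]) -> float:
    """IoU of two boxes given as [x, y, width, height]."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def matches(pred: Sequence[float], gt: Sequence[float], threshold: float = 0.5) -> bool:
    """Whether a prediction overlaps a ground-truth box enough to count as a hit."""
    return iou_xywh(pred, gt) >= threshold


print(iou_xywh([0, 0, 10, 10], [5, 0, 10, 10]))  # 0.333...
print(matches([0, 0, 10, 10], [5, 0, 10, 10]))   # False at threshold 0.5
```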


Detection Benchmark and Leaderboard

  • Benchmark of widely used and SOTA models on tool detection.
  • Mean AP results are reported across detection thresholds, tool categories, and surgical visual challenges.
Column groups: detection AP across 3 thresholds (AP@0.5, AP@0.75, AP@0.5:0.95); detection AP per category (% AP @ Θ = 0.5, Grasper to Bag); detection AP across surgical visual challenges (Bleeding to Trocar); and speed (FPS). ↑ means higher is better.

| Detection model | AP@0.5 ↑ | AP@0.75 ↑ | AP@0.5:0.95 ↑ | Grasper | Bipolar | Hook | Scissors | Clipper | Irrigator | Bag | Bleeding | Blur | Smoke | Crowded | Occluded | Reflection | Foul Lens | Trocar | FPS ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Faster-RCNN | 56.0 | 38.1 | 34.6 | 53.5 | 65.0 | 80.1 | 60.9 | 70.1 | 26.8 | 31.8 | 57.9 | 41.0 | 54.5 | 43.5 | 55.0 | 46.9 | 41.2 | 35.7 | 7.6 |
| Cascade-RCNN | 51.7 | 39.0 | 34.7 | 52.0 | 58.9 | 79.7 | 45.7 | 44.9 | 23.7 | 17.9 | 53.9 | 39.0 | 48.1 | 39.5 | 46.4 | 29.1 | 33.7 | 33.7 | 7.0 |
| CenterNet | 53.0 | 39.5 | 35.0 | 60.2 | 61.4 | 86.4 | 56.3 | 68.0 | 25.8 | 10.2 | 58.0 | 42.1 | 50.2 | 36.7 | 51.7 | 46.0 | 35.8 | 30.8 | 33.8 |
| FCOS | 43.5 | 31.5 | 28.1 | 51.2 | 44.3 | 74.7 | 49.2 | 54.2 | 21.9 | 7.2 | 47.8 | 40.6 | 51.5 | 15.1 | 40.8 | 42.7 | 29.7 | 17.6 | 7.7 |
| SSD | 61.9 | 37.8 | 36.1 | 75.2 | 62.2 | 91.6 | 63.4 | 72.9 | 22.5 | 40.8 | 64.5 | 49.3 | 58.3 | 57.5 | 62.4 | 53.9 | 47.7 | 42.6 | 30.9 |
| PAA | 64.5 | 44.9 | 41.1 | 69.6 | 79.0 | 89.2 | 68.7 | 74.2 | 37.6 | 28.9 | 67.1 | 55.6 | 65.0 | 55.0 | 64.6 | 56.0 | 51.2 | 47.5 | 8.5 |
| Def-DETR | 58.4 | 42.0 | 38.3 | 60.6 | 66.5 | 83.8 | 61.9 | 72.0 | 39.9 | 23.8 | 62.4 | 42.7 | 58.6 | 37.1 | 57.4 | 43.9 | 41.5 | 47.4 | 10.2 |
| Swin-T | 62.3 | 44.3 | 40.2 | 63.3 | 64.8 | 83.0 | 80.2 | 77.2 | 38.0 | 26.8 | 63.5 | 53.8 | 62.8 | 35.3 | 61.1 | 66.2 | 55.2 | 45.7 | 9.8 |
| YOLOX | 64.7 | 48.9 | 44.2 | 69.6 | 72.2 | 89.4 | 75.4 | 79.1 | 37.3 | 27.1 | 68.2 | 55.6 | 66.0 | 45.9 | 64.2 | 52.5 | 58.1 | 43.1 | 23.6 |
| YOLOv7 | 80.6 | 62.0 | 56.1 | 90.5 | 86.4 | 96.0 | 82.3 | 89.3 | 49.1 | 66.2 | 80.2 | 61.2 | 80.1 | 79.5 | 82.1 | 65.6 | 71.2 | 66.7 | 20.6 |
| YOLOv8 | 79.1 | 62.4 | 55.6 | 87.9 | 84.5 | 96.2 | 80.0 | 87.2 | 48.4 | 65.0 | 77.1 | 58.3 | 74.4 | 76.2 | 80.4 | 70.3 | 57.4 | 62.9 | 29.0 |
| YOLOv9 | 80.2 | 62.6 | 56.5 | 88.5 | 87.6 | 96.0 | 79.3 | 87.1 | 50.1 | 67.7 | 78.1 | 54.0 | 78.2 | 78.6 | 81.1 | 65.3 | 63.4 | 63.1 | 23.7 |
| YOLOv10 | 80.1 | 62.1 | 55.8 | 87.6 | 86.6 | 96.0 | 81.9 | 89.0 | 53.8 | 61.3 | 77.8 | 61.9 | 78.7 | 77.5 | 81.2 | 66.7 | 59.3 | 65.4 | 28.6 |

Leaderboard available on Papers with Code.


Tracking Benchmark and Leaderboard



  • Benchmark Multi-Perspective Multi-Tool Tracking Results @ 25 FPS
  • Evaluated across multiple metrics: HOTA, CLEAR MOT, Identity, Count, and Efficiency Metrics
  • Model assessed across 3 tracking perspectives
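Columns cover HOTA metrics, CLEAR metrics, identity metrics, count metrics (#Dets, #IDs), and speed (FPS). ↑ means higher is better; ↓ means lower is better.

Intraoperative trajectory (ground-truth counts: #Dets = 29994, #IDs = 70)

| Model | HOTA ↑ | DetA ↑ | LocA ↑ | AssA ↑ | MOTA ↑ | MOTP ↑ | MT ↑ | PT ↓ | ML ↓ | IDF1 ↑ | IDSW ↓ | Frag ↓ | #Dets | #IDs | FPS ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OCSORT | 14.6 | 52.7 | 86.7 | 4.1 | 49.2 | 85.0 | 24 | 32 | 14 | 9.5 | 2921 | 2731 | 21936 | 3336 | 10.2 |
| FairMOT | 5.8 | 25.8 | 75.9 | 1.3 | 5.0 | 73.9 | 3 | 24 | 43 | 4.3 | 4227 | 1924 | 15252 | 4456 | 14.2 |
| TransTrack | 7.4 | 31.5 | 84.4 | 1.7 | 4.2 | 82.9 | 9 | 36 | 25 | 4.2 | 4757 | 1899 | 21640 | 4079 | 6.7 |
| ByteTrack | 15.8 | 70.6 | 85.7 | 3.6 | 67.0 | 84.0 | 54 | 12 | 2 | 9.5 | 4648 | 2429 | 28440 | 5383 | 16.4 |
| Bot-SORT | 17.4 | 70.7 | 85.4 | 4.4 | 69.6 | 83.7 | 58 | 11 | 1 | 10.2 | 3907 | 2376 | 29302 | 4501 | 8.7 |
| SMILETrack | 15.9 | 71.0 | 85.5 | 3.7 | 66.4 | 83.8 | 55 | 13 | 2 | 9.2 | 4968 | 2369 | 28821 | 5761 | 11.2 |

Intracorporeal trajectory (ground-truth counts: #Dets = 29994, #IDs = 247)

| Model | HOTA ↑ | DetA ↑ | LocA ↑ | AssA ↑ | MOTA ↑ | MOTP ↑ | MT ↑ | PT ↓ | ML ↓ | IDF1 ↑ | IDSW ↓ | Frag ↓ | #Dets | #IDs | FPS ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OCSORT | 23.7 | 51.4 | 86.5 | 11.0 | 47.1 | 84.8 | 115 | 87 | 45 | 18.1 | 2953 | 2796 | 21797 | 3526 | 10.2 |
| FairMOT | 7.5 | 19.7 | 76.1 | 2.9 | 5.4 | 74.0 | 19 | 60 | 168 | 6.0 | 2890 | 1496 | 11287 | 3962 | 14.2 |
| TransTrack | 13.1 | 31.5 | 84.4 | 5.5 | 4.6 | 82.9 | 80 | 79 | 88 | 8.7 | 4648 | 1791 | 21640 | 4079 | 6.7 |
| ByteTrack | 24.7 | 70.6 | 85.7 | 8.7 | 67.4 | 84.0 | 176 | 48 | 23 | 16.9 | 4515 | 2290 | 28440 | 5383 | 16.4 |
| Bot-SORT | 27.0 | 70.7 | 85.4 | 10.4 | 70.0 | 83.7 | 188 | 38 | 21 | 18.9 | 3771 | 2238 | 29300 | 4501 | 8.7 |
| SMILETrack | 24.9 | 66.7 | 85.5 | 8.9 | 66.7 | 83.8 | 186 | 39 | 22 | 16.9 | 4868 | 2232 | 28820 | 5779 | 11.2 |

Visibility trajectory (ground-truth counts: #Dets = 29994, #IDs = 916)

| Model | HOTA ↑ | DetA ↑ | LocA ↑ | AssA ↑ | MOTA ↑ | MOTP ↑ | MT ↑ | PT ↓ | ML ↓ | IDF1 ↑ | IDSW ↓ | Frag ↓ | #Dets | #IDs | FPS ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SORT | 17.4 | 39.5 | 85.2 | 7.8 | 21.4 | 83.3 | 139 | 399 | 378 | 13.4 | 6619 | 2138 | 16595 | 8844 | 19.5 |
| OCSORT | 37.0 | 52.6 | 86.5 | 26.2 | 50.2 | 84.8 | 300 | 371 | 245 | 35.9 | 2317 | 2260 | 22197 | 3587 | 10.2 |
| FairMOT | 15.3 | 25.0 | 75.8 | 9.5 | 7.1 | 73.7 | 58 | 218 | 640 | 14.4 | 3140 | 1574 | 15338 | 4875 | 14.2 |
| TransTrack | 19.2 | 31.6 | 84.4 | 11.8 | 5.8 | 82.9 | 224 | 280 | 412 | 16.1 | 4273 | 1403 | 21640 | 4079 | 6.7 |
| ByteTrack | 41.5 | 70.7 | 85.7 | 24.8 | 69.3 | 84.0 | 591 | 217 | 108 | 36.8 | 3930 | 1704 | 28440 | 5383 | 16.4 |
| Bot-SORT | 44.7 | 70.8 | 85.5 | 28.7 | 72.0 | 83.7 | 638 | 184 | 94 | 41.4 | 3183 | 1638 | 29300 | 4505 | 8.7 |
| SMILETrack | 41.3 | 71.0 | 85.6 | 24.4 | 68.9 | 83.8 | 619 | 192 | 105 | 36.5 | 4227 | 1641 | 28821 | 5752 | 11.2 |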
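To help read the tracking tables, the headline metrics follow the standard CLEAR MOT, identity (IDF1), and HOTA definitions; the sketch below restates them in Python. It is not TrackEval code, and the raw counts it takes (FN, FP, IDTP, and so on) are not reported in the tables. The HOTA helper is only an approximation: HOTA proper averages sqrt(DetA_α · AssA_α) over localization thresholds α, whereas the tables report DetA and AssA already averaged over α.

```python
import math


def mota(num_fn: int, num_fp: int, num_idsw: int, num_gt: int) -> float:
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT. Can be negative."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (num_fn + num_fp + num_idsw) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1: 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


def hota_approx(det_a: float, ass_a: float) -> float:
    """Geometric mean of threshold-averaged DetA and AssA (approximates HOTA)."""
    return math.sqrt(det_a * ass_a)


# OCSORT, intraoperative perspective: DetA = 52.7, AssA = 4.1 (reported HOTA 14.6).
print(round(hota_approx(52.7, 4.1), 1))  # 14.7
```

The small gap between 14.7 and the reported 14.6 is expected, since the square root of an average is not the average of square roots.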

Leaderboard available on Papers with Code.


Tracking Across Scene Visual Challenges

[Figures: tracking examples under scene visual challenges]


Emerging Research Methods

  • SurgiTrack - a new state-of-the-art (SOTA) method on this dataset has been published.

Download

Steps to obtain the dataset:

  1. Read the Data Use Agreement (DUA)

  2. The dataset is released under the CC-BY-NC-SA 4.0 license.

  3. Complete the dataset request form to receive the download access key. You will need it in the next step, so keep it safe! The form accepts responses from 25 March 2025.

  4. Visit the data portal at Synapse.org to download the dataset.


Usage

Use conversion.py to convert the JSON annotations to your preferred format (e.g., MOTChallenge, COCO, or TAO), and then use the corresponding dataloader.
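conversion.py is the supported route. To show what a MOTChallenge-style export involves, the following is a hypothetical stand-in, not conversion.py: it assumes the label layout described in Data Structure ({frame_id: [record, ...]}), that `tool_bbox` is [x, y, width, height] in pixels, and that frame keys are 0-based, so 1 is added to match MOTChallenge's 1-based frame numbering. `to_mot_lines` is an illustrative name.

```python
from typing import Any, Dict, List


def to_mot_lines(labels: Dict[str, List[Dict[str, Any]]],
                 track_key: str = "intraoperative_track_id",
                 frame_offset: int = 1) -> List[str]:
    """Emit MOTChallenge-style rows:
    frame, id, bb_left, bb_top, bb_width, bb_height, conf, x, y, z
    with conf fixed to 1 and the unused 3D coordinates set to -1."""
    lines: List[str] = []
    for frame_id in sorted(labels, key=int):  # numeric, not lexicographic, order
        for record in labels[frame_id]:
            x, y, w, h = record["tool_bbox"]
            frame = int(frame_id) + frame_offset
            lines.append(f"{frame},{record[track_key]},{x:.2f},{y:.2f},{w:.2f},{h:.2f},1,-1,-1,-1")
    return lines


labels = {"0": [{"tool_bbox": [120.0, 132.0, 23.0, 65.0], "intraoperative_track_id": 7}]}
print(to_mot_lines(labels)[0])  # 1,7,120.00,132.00,23.00,65.00,1,-1,-1,-1
```

Passing a different `track_key` would export one of the other two perspectives, provided the files use a corresponding field.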


Potential Overlap

The dataset originates from the CAMMA research group at the University of Strasbourg, France, and shares content with Cholec80 and CholecT50, which are among the largest public surgical video datasets used in surgical workflow analysis. As a result, it overlaps with these datasets and with other cholecystectomy datasets sourced from the same medical center. To maintain consistency and facilitate identification of overlapping videos, we preserved the video identities (e.g., VID01, VID02, VID12, VID111, etc.) in our dataset. Note that the prefix "VID" in the video filenames may be written as "Video" in other datasets. The figure below illustrates the videos and labels of CholecTrack20 that overlap with other cholecystectomy datasets. Researchers are encouraged to consider these overlaps when pre-training their models on related cholecystectomy datasets.
[Figure: CholecTrack20 videos and labels that overlap with other cholecystectomy datasets]
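Since video identities are preserved and only the "VID"/"Video" prefix may differ, overlapping videos can be found by comparing the numeric part of the names. A minimal sketch, assuming names of the form VID01, Video01, or video01.mp4; `video_number` and `overlapping_videos` are hypothetical helpers, and the name lists in the example are placeholders, not the actual overlap (the figure above is authoritative).

```python
import re
from typing import Iterable, Set

# Accepts "VID01", "Video01", "video_01", etc.; compares on the number only.
_VIDEO_ID = re.compile(r"^(?:vid|video)[_-]?(\d+)$", re.IGNORECASE)


def video_number(name: str) -> int:
    """Extract the numeric video identity from names like VID01 or video01.mp4."""
    stem = name.rsplit(".", 1)[0]  # drop a file extension if present
    match = _VIDEO_ID.match(stem)
    if match is None:
        raise ValueError(f"unrecognized video name: {name!r}")
    return int(match.group(1))


def overlapping_videos(ours: Iterable[str], theirs: Iterable[str]) -> Set[int]:
    """Video numbers present in both collections."""
    return {video_number(n) for n in ours} & {video_number(n) for n in theirs}


ours = ["VID01", "VID12", "VID111"]
theirs = ["video01.mp4", "video12.mp4", "video80.mp4"]
print(sorted(overlapping_videos(ours, theirs)))  # [1, 12]
```

Converting to `int` also absorbs differences in zero-padding between datasets.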



Acknowledgement

This work was supported by French state funds managed within the Plan Investissements d’Avenir by the ANR under references: National AI Chair AI4ORSafety [ANR-20-CHIA-0029-01], DeepSurg [ANR-16-CE33-0009], IHU Strasbourg [ANR-10-IAHU-02] and by BPI France under references: project CONDOR, project 5G-OR. Joël L. Lavanchy received funding from the Swiss National Science Foundation (P500PM_206724, P5R5PM_217663). This work was granted access to the servers/HPC resources managed by CAMMA, IHU Strasbourg, Unistra Mesocentre, and GENCI-IDRIS [Grant 2021-AD011011638R3, 2021-AD011011638R4].


Parts of the metric evaluation code are borrowed from TrackEval and cocoapi. Thanks for their excellent work!

Publication & Citations

  • Conference
@InProceedings{nwoye2023cholectrack20,
  author    = {Nwoye, Chinedu Innocent and Elgohary, Kareem and Srinivas, Anvita and Zaid, Fauzan and Lavanchy, Joël L. and Padoy, Nicolas},
  title     = {CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2025},
  month     = {June}
}
  • arXiv

@misc{nwoye2023cholectrack20,
    title={CholecTrack20: A Dataset for Multi-Class Multiple Tool Tracking in Laparoscopic Surgery},
    author={Chinedu Innocent Nwoye and Kareem Elgohary and Anvita Srinivas and Fauzan Zaid and Joël L. Lavanchy and Nicolas Padoy},
    year={2023},
    eprint={2312.07352},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

CholecTrack20 - an endoscopic video dataset for multi-class multi-tool tracking, defined across three different perspectives on the temporal duration of a tool trajectory: (a) intraoperative, (b) intracorporeal, and (c) visibility.


Contributing

We welcome contributions of new metrics and newly supported benchmarks, as well as any other new features or code improvements. To begin a conversation, send a PR or an email, or open an issue describing what you'd like to add or change.

About

Dataset for multi-perspective surgical tool tracking
