My first computer vision object detection project.
I used to work night shifts in film production, scrubbing through thousands of clips to find the exact moment the clapper snapped shut. This repo is my attempt to automate that tedious workflow (built in roughly 64 hours of focused work).
This project automates the detection of film clapperboards (slates) in video frames using deep learning object detection models, demonstrating an end-to-end computer vision workflow, from data collection and annotation through training to deployment.
Training Notebook (Training.ipynb):
- Trains an object detection model using the IceVision framework
- Uses a custom dataset of 224 labeled images in COCO format
- Supports multiple model architectures (EfficientDet, Faster R-CNN, RetinaNet, etc.)
- Includes data augmentation and validation
- Exports trained model checkpoints
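For reference, a single entry in the COCO bounding-box format used for the dataset looks roughly like this (file names, IDs, and coordinates below are hypothetical):

```python
import json

# Minimal COCO-style annotation file: images, categories, and
# bounding boxes given as [x, y, width, height] in pixel coordinates.
coco = {
    "images": [
        {"id": 1, "file_name": "frame_0001.jpg", "width": 1280, "height": 720}
    ],
    "categories": [{"id": 1, "name": "slate"}],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [420, 180, 310, 260],  # x, y, w, h
            "area": 310 * 260,
            "iscrowd": 0,
        }
    ],
}

with open("annotations.json", "w") as f:
    json.dump(coco, f, indent=2)
```

IceVision's COCO bounding-box parser can consume an annotation file in this shape directly.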
Inference Notebook (Inference.ipynb):
- Loads trained model checkpoints
- Runs predictions on new images
- Visualizes bounding box detections
- Exports predictions in COCO annotation format
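The notebook uses IceVision's built-in plotting for visualization; as a generic sketch, drawing a single `[x, y, w, h]` detection onto a frame with Pillow looks like this (the prediction values are made up):

```python
from PIL import Image, ImageDraw

def draw_detection(img, bbox, label, score):
    """Draw one [x, y, w, h] detection onto a PIL image."""
    x, y, w, h = bbox
    draw = ImageDraw.Draw(img)
    draw.rectangle([x, y, x + w, y + h], outline=(255, 0, 0), width=3)
    draw.text((x, max(0, y - 12)), f"{label} {score:.2f}", fill=(255, 0, 0))
    return img

# Hypothetical prediction on a blank 1280x720 frame.
frame = Image.new("RGB", (1280, 720), (30, 30, 30))
frame = draw_detection(frame, [420, 180, 310, 260], "slate", 0.97)
frame.save("detection.png")
```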
Collected initial training dataset of 50+ slate images from instructional videos using browser screenshot tools. The dataset was iteratively expanded through active learning (see the “Active Learning Loop” section below).
Set up LabelStudio via Docker for local annotation work. This open-source tool provided an efficient interface for drawing bounding boxes around slates in training images.
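Starting Label Studio locally follows its standard Docker invocation (the `./mydata` host path is just a convention for persisting annotations):

```shell
# Run Label Studio on port 8080; annotations persist in ./mydata on the host.
docker run -it -p 8080:8080 \
  -v $(pwd)/mydata:/label-studio/data \
  heartexlabs/label-studio:latest
```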
Implemented DVC (Data Version Control) to version-control the dataset and sync training data to S3, enabling reproducible experiments and collaboration.
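The DVC setup boils down to a few commands (the bucket name below is a placeholder):

```shell
# One-time setup: track the dataset with DVC and point it at S3.
dvc init
dvc remote add -d storage s3://my-bucket/slate-dataset
dvc add data/
git add data.dvc .dvcignore .gitignore
git commit -m "Track dataset with DVC"
dvc push   # upload the data to the S3 remote
```

After that, `dvc pull` on any machine restores the exact dataset version recorded in git.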
Provisioned AWS EC2 g4dn.xlarge Spot Instances for GPU-accelerated training. Used Terraform for infrastructure-as-code to make spinning up and tearing down compute resources quick and cost-effective.
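A minimal sketch of the Terraform resource for a spot-priced GPU instance (the AMI ID is a placeholder; in practice it would be a Deep Learning AMI for the target region):

```hcl
resource "aws_instance" "trainer" {
  ami           = "ami-..." # Deep Learning AMI for your region
  instance_type = "g4dn.xlarge"

  instance_market_options {
    market_type = "spot"
  }
}
```

`terraform apply` brings the instance up for a training session and `terraform destroy` tears it down, so you only pay for GPU time while training.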
Leveraged the IceVision framework for rapid experimentation. IceVision provides unified abstractions over multiple object detection libraries (FastAI, PyTorch TorchVision, MMDetection), making it easy to try different architectures.
Integrated Weights & Biases (WandB) for experiment tracking across different model architectures and hyperparameters. This made it straightforward to compare:
- Faster R-CNN
- YOLOv5
- EfficientDet
- VFNet
Of these, Faster R-CNN gave the best results for this use case. Each training run took approximately 8 minutes on the g4dn.xlarge instance.
Implemented an iterative improvement process:
- Train model on current labeled dataset
- Use model to auto-label new images
- Manually review and correct erroneous predictions
- Retrain model with expanded dataset
- Repeat
This active learning approach efficiently scaled the dataset from 50 to 224+ labeled images while maintaining high annotation quality.
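With the model and reviewer stubbed out, the loop above amounts to something like this (all function names are hypothetical):

```python
def active_learning_loop(labeled, unlabeled, train, predict, review,
                         rounds=3, batch=50):
    """Grow a labeled dataset by auto-labeling and correcting in rounds.

    labeled/unlabeled: lists of items; train(labeled) -> model;
    predict(model, items) -> proposed labels; review(proposals) -> corrections.
    """
    for _ in range(rounds):
        if not unlabeled:
            break
        model = train(labeled)                   # 1. train on current labels
        batch_items = unlabeled[:batch]
        proposals = predict(model, batch_items)  # 2. auto-label new images
        corrected = review(proposals)            # 3. manually fix mistakes
        labeled.extend(corrected)                # 4. retrain on expanded data
        unlabeled = unlabeled[batch:]            # 5. repeat
    return labeled
```

In this project, the training step was an IceVision fine-tuning run and the review step a manual pass over the model's proposed boxes in LabelStudio.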
Demo: Film Slate Binary Classifier on Hugging Face
Note: This demo uses a simpler binary classifier that determines whether a slate is present in an image (yes/no), rather than the full object detection model that draws bounding boxes around slates.
Images labeled in Heartex LabelStudio:

