This repository is a set of starter code for the Computer Vision Capstone Project. Computer vision has real-world applications in almost every industry, including healthcare, agriculture, industrial automation, transportation, and sports. Many computer vision tasks, such as smart traffic control, quality management, and autonomous driving, involve real-time analysis. These domains benefit greatly from performant computer vision algorithms.
Your goal for this project is to apply the knowledge you learned from CS 4624: Scientific Computing Capstone to implement performant, parallel computer vision algorithms.
This project has four dimensions. This section covers the baseline expectations for the project as well as directions to pursue to improve its final outcome.
| | Algorithm | Environment | Device | System |
|---|---|---|---|---|
| Baseline | Edge Detection | Offline: image/video | CPU: OpenMP | rlogin & glogin |
| Extensions | Object Detection | Online: live video | GPU: OpenACC | Raspberry Pi 5 |
The expectation for this project is that, at a minimum, students implement a performant and accurate edge detection algorithm on the CPU using OpenMP. Students may use rlogin and glogin to do so. Given their parallel algorithm, students should be able to show performance improvements over a serial implementation using images or video, and present their results in a meaningful way.
Edge Detection
Edge detection algorithms extract the boundaries between features in an image. There exist many different edge detection algorithms, each with its own set of strengths and weaknesses. The starter code includes a serial implementation of the Sobel operator, a gradient-based approach that convolves the image with two 3x3 convolution masks: one detects changes in the horizontal direction and the other detects changes in the vertical direction. Students are encouraged to explore other edge detection algorithms such as the Canny, Roberts Cross, Prewitt, and Laplacian operators. These algorithms will provide valuable insight into the performance-versus-accuracy tradeoffs of each approach.
Object Detection
Unlike edge detection, which focuses purely on changes in intensity between pixels, object detection systems are trained to learn texture, shape, and context. Modern object detection pipelines typically involve two major stages: feature extraction and prediction. Some common algorithms for object detection include:

- Haar Cascades: a machine-learning approach trained on a set of positive and negative images. Positive images contain the object the algorithm is being trained to detect; negative images do not.
- Histogram of Oriented Gradients + Support Vector Machine (HOG + SVM): the HOG algorithm captures the shape of objects within an image by building histograms of pixel gradients and orientations. Its output, an image of extracted features with well-defined shape, is passed to an SVM, which defines boundaries between classes of data. This combination is particularly popular for human and vehicle detection.
- Selective Search: an object localization algorithm that addresses a computationally expensive step of object detection. Selective search combines exhaustive search, which slides windows of varying sizes along an image to locate objects of all sizes, with segmentation, which separates objects of different shapes by assigning them to different colors.
- Two-Stage Detectors: these include algorithms such as R-CNN, Fast R-CNN, and Faster R-CNN. In the first stage, the model scans the image to generate regions of interest, or proposals, that are likely to contain objects. In the second stage, each proposed region is concurrently passed to a classification head, which determines the class of the object, and a regression head, which defines its bounding box. Two-stage detectors prioritize accuracy over speed.
- One-Stage Detectors: these include algorithms such as YOLO and the Single-Shot Detector (SSD). Feature extraction, classification, and boundary prediction are all done in a single pass, and techniques such as non-maximum suppression filter out redundant detections to produce the final output. One-stage detectors prioritize speed over accuracy.
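Non-maximum suppression, mentioned above, hinges on the intersection-over-union (IoU) between candidate boxes: a detection is discarded when its IoU with a higher-scoring box exceeds a threshold. A minimal sketch of the IoU computation (the type and names are hypothetical, not part of the starter code):

```c
/* Hypothetical axis-aligned box: (x1, y1) top-left, (x2, y2) bottom-right. */
typedef struct { double x1, y1, x2, y2; } box_t;

static double box_area(box_t b) { return (b.x2 - b.x1) * (b.y2 - b.y1); }

/* Intersection over union: overlap area divided by combined area.
 * Returns 1.0 for identical boxes and 0.0 for disjoint boxes. */
static double iou(box_t a, box_t b)
{
    double ix1 = a.x1 > b.x1 ? a.x1 : b.x1;
    double iy1 = a.y1 > b.y1 ? a.y1 : b.y1;
    double ix2 = a.x2 < b.x2 ? a.x2 : b.x2;
    double iy2 = a.y2 < b.y2 ? a.y2 : b.y2;
    double iw = ix2 - ix1 > 0 ? ix2 - ix1 : 0; /* clamp: no overlap -> 0 */
    double ih = iy2 - iy1 > 0 ? iy2 - iy1 : 0;
    double inter = iw * ih;
    return inter / (box_area(a) + box_area(b) - inter);
}
```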
Offline: Image/Video
Images and video are an important place to start when building a computer vision algorithm. They provide a static dataset for correctness testing and, when combined with ground truth data, are incredibly useful for accuracy testing. This repository includes a set of 30 images with ground truth data for testing. Images are useful for collecting single-frame latency benchmarks, and videos can be used for throughput profiling (frames per second).
Online: live video
Live video is where algorithm meets application. This extension is closely linked to the Raspberry Pi device extension. The SyNeRGy Lab @ VT has access to Raspberry Pi Compute Module 5 boards as well as Pcam 5C 5 MP cameras that students may use during class time. The Pcam 5C is not natively supported by the Raspberry Pi's libcamera software, so this extension will largely revolve around students' ability to find or build driver support for this camera.
CPU: OpenMP
Students should use OpenMP to parallelize their computer vision algorithms where possible. Using the knowledge from this class, students should be able to effectively use OpenMP pragmas and apply the necessary clauses to optimize their solutions. Edge detection algorithms lend themselves well to parallelism, so students should see substantial speedups in their code.
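Because each output pixel of an edge detector depends only on the input image, the pixel loop can be divided among threads with a single pragma. A minimal sketch, with a simple thresholding kernel standing in for the full Sobel pass (names are illustrative):

```c
/* Illustrative OpenMP parallelization of a per-pixel kernel: rows are
 * distributed across threads; each iteration writes a distinct output
 * pixel, so no synchronization is needed. Compile with -fopenmp (gcc)
 * to enable threading; without it the pragma is ignored and the loop
 * runs serially with identical results. */
static void threshold_image(const unsigned char *in, unsigned char *out,
                            int width, int height, unsigned char cutoff)
{
    #pragma omp parallel for schedule(static)
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            out[y * width + x] = in[y * width + x] > cutoff ? 255 : 0;
}
```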
GPU: OpenACC
This semester students learned to utilize parallelism on the GPU using OpenACC. Computer vision algorithms benefit greatly from the massive parallelism the GPU provides. The caveat to the OpenACC extension is that OpenACC is only supported on NVIDIA GPUs, so results from this extension will be separate from the results obtained using the Raspberry Pi. OpenACC still has many use cases within computer vision, as NVIDIA provides a number of edge devices, such as the Jetson Orin Nano, that can be used for computer vision.
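The same per-pixel loop structure maps to OpenACC. The sketch below offloads a trivial kernel to the GPU when built with an OpenACC compiler (e.g. `nvc -acc` on glogin); compilers that ignore the pragma run it serially with identical results. Names are illustrative:

```c
/* Illustrative OpenACC offload of a per-pixel kernel. The data
 * clauses copy the input to the device and the result back; each
 * iteration is independent, so the loop parallelizes trivially. */
static void invert_image(const unsigned char *in, unsigned char *out, int n)
{
    #pragma acc parallel loop copyin(in[0:n]) copyout(out[0:n])
    for (int i = 0; i < n; i++)
        out[i] = 255 - in[i];
}
```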
rlogin & glogin
Students may use rlogin and glogin as well as their personal devices for development. All of these systems will produce results for CPU-driven implementations using OpenMP. Both rlogin and glogin have OpenMP available, making them ideal systems for development. If students implement their solutions using OpenACC, they should move to glogin: glogin nodes have NVIDIA Tesla T4 GPUs and have access to the required libraries and compilers used by OpenACC.
Raspberry Pi 5
The Raspberry Pi 5 is a small, low-power embedded development board widely used for edge computing and real-world computer vision deployments. Over 7 million units were sold in 2024, reflecting its popularity across education, hobbyist projects, and industry prototyping. The Pi 5 is built around the Broadcom BCM2712 system-on-chip, which includes a quad-core Arm Cortex-A76 CPU (4 cores, 4 threads). It is available in two variants: the standard board with full-size ports, and the Compute Module 5, which exposes all interfaces through compact board-to-board connectors for integration into custom hardware.
Evaluating computer vision algorithms on the Raspberry Pi 5 provides valuable context about their real-world performance. Its small form factor, efficient power consumption, and increasingly capable hardware make it a realistic target platform for field-deployed CV tasks such as monitoring, sensing, and lightweight inference. By profiling and optimizing algorithms on the Pi 5, students gain insight into the constraints and design considerations required for computer vision systems running at the edge.
This section details all the files students will be provided with.
sobel-serial.c is a serial implementation of the Sobel operator. It takes a PNG file as input and outputs a PNG image of the edges detected by the Sobel operator. It also times the duration of the operation. After compilation, run it using
./sobel input.png output.png
This file implements Pratt's Figure of Merit (PFOM) equation. Pratt's FOM is a metric for evaluating the accuracy of an edge detection algorithm. It takes a PNG produced by an edge detection algorithm and a PNG of a ground truth image and returns an accuracy value. After compilation, run it using
./accuracy ideal.png detected.png
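For context, Pratt's FOM is FOM = (1 / max(N_I, N_D)) * sum over i of 1 / (1 + a * d_i^2), where N_I and N_D are the ideal and detected edge-pixel counts, d_i is the distance from detected pixel i to the nearest ideal edge pixel, and a (commonly 1/9) penalizes displaced edges. A perfect detection scores 1.0. A minimal sketch of the formula, not the provided implementation:

```c
/* Illustrative Pratt's Figure of Merit: dist[i] is the distance from
 * detected edge pixel i to the nearest ideal edge pixel. Dividing by
 * max(N_I, N_D) penalizes both missed and spurious edge pixels. */
static double pratt_fom(int n_ideal, int n_detected,
                        const double *dist, double alpha)
{
    int n_max = n_ideal > n_detected ? n_ideal : n_detected;
    double sum = 0.0;
    for (int i = 0; i < n_detected; i++)
        sum += 1.0 / (1.0 + alpha * dist[i] * dist[i]);
    return sum / n_max;
}
```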
This is a Python edge detection script that uses OpenCV's cv2 module. The script applies a Gaussian blur to the image to reduce noise, then runs the Canny edge detection algorithm to extract edges from the image. It can be used to give extra context to the accuracy results students produce with their own algorithms. To run this file, use
python3 edge.py input.png output.png
Before running edge.py, please install OpenCV for Python by running
pip install opencv-python
The provided Makefile is used to compile the serial Sobel code and the PFOM code. Students should revise it as needed.
These header files are used by sobel-serial.c to read and write to image files.
This is a GitHub repository containing 30 PNG images and accompanying ground truth images that can be used as a starting dataset to produce results with. Feel free to look for or create additional datasets as needed.