
vtsynergy/CV_Capstone


Computer Vision Capstone

Table of Contents

Overview

Objectives

Starter Code

Overview

This repository contains starter code for the Computer Vision Capstone Project. Computer vision is a domain with real-world applications in almost every industry, from healthcare, agriculture, industrial automation, and transportation to sports. Many computer vision tasks, such as smart traffic control, quality management, and autonomous driving, involve real-time analysis, and these domains benefit greatly from performant computer vision algorithms.

Your goal for this project is to apply the knowledge you learned from CS 4624: Scientific Computing Capstone to implement performant, parallel computer vision algorithms.

Objectives

This project has four dimensions. This section covers the baseline expectations for the project as well as directions students can pursue to improve their final outcome.

|            | Algorithm        | Environment          | Device      | System          |
|------------|------------------|----------------------|-------------|-----------------|
| Baseline   | Edge Detection   | Offline: Image/Video | CPU: OpenMP | rlogin & glogin |
| Extensions | Object Detection | Online: live video   | GPU: OpenACC| Raspberry Pi 5  |

The expectation for this project is that, at a minimum, students implement a performant and accurate edge detection algorithm on the CPU using OpenMP. The students may use rlogin and glogin to do so. Given their parallel algorithm, students should be able to show performance improvements over a serial algorithm using images or video, and present their results in a meaningful way.

Algorithm

Edge Detection

Edge detection algorithms extract the boundaries between features in an image. There exist many different edge detection algorithms, each with its own set of strengths and weaknesses. The starter code includes a serial implementation of the Sobel operator, a gradient-based approach that convolves the image with two 3x3 convolution masks: one detects changes in the horizontal direction and the other detects changes in the vertical direction. Students are encouraged to explore other edge detection algorithms, such as the Canny edge detector, the Roberts cross operator, the Prewitt operator, and the Laplacian operator. These algorithms will provide valuable insight into the performance-versus-accuracy tradeoffs of each approach.

Object Detection

Unlike edge detection, which focuses purely on changes in intensity between pixels, object detection systems are trained to learn texture, shape, and context. Modern object detection pipelines typically involve two major stages: feature extraction and prediction. Some common algorithms for object detection include

  • Haar Cascades: a machine-learning-based approach trained on a set of positive and negative images. Positive images contain the object the algorithm is learning to detect; negative images do not.

  • Histogram of Oriented Gradients + Support Vector Machine: the HOG algorithm captures the shape of objects in an image by creating histograms of the gradients and orientations of its pixels. The output of the HOG algorithm is an image of extracted features with well-defined shape. This is passed to an SVM, which is built to define boundaries between classes of data. This combination is particularly popular for human and vehicle detection.

  • Selective Search: an object localization algorithm; localization is a computationally expensive step of object detection. Selective search combines exhaustive search and segmentation. Exhaustive search slides windows of varying sizes along an image to locate objects of all sizes, while segmentation separates objects of different shapes by assigning them different colors.

  • Two-Stage Detectors: these include algorithms such as R-CNN, Fast R-CNN, and Faster R-CNN. In the first stage, the model scans the image to generate regions of interest, or proposals, that are likely to contain objects. In the second stage, each proposed region is passed concurrently to a classification head and a regression head. The classification head determines the class of the object while the regression head defines its bounding box. Two-stage detectors prioritize accuracy over speed.

  • One-Stage Detectors: these include algorithms such as YOLO and the Single-Shot Detector. In these algorithms, feature extraction, classification, and boundary prediction are all done in a single pass. Techniques such as non-maximum suppression are used to filter out redundant detections and produce a final output. One-stage detectors prioritize speed over accuracy.

Environment

Offline: Image/Video

Images and video are an important place to start when building a computer vision algorithm. They provide a static dataset for correctness testing and, when combined with ground truth data, are incredibly useful for accuracy testing. This repository includes a set of 30 images with ground truth data for testing. Images are useful for collecting single-frame latency benchmarks, and videos can be used for throughput profiling (frames per second).

Online: live video

Live video is where algorithm meets application. This extension is closely tied to the Raspberry Pi device extension. The SyNeRGy Lab @ VT has access to Raspberry Pi Compute Module 5 boards as well as Pcam 5C 5 MP cameras that students may use during class time. The Pcam 5C is not natively supported by the Raspberry Pi libcamera software, so this extension will largely revolve around students' ability to find or build driver support for this camera.

Device

CPU: OpenMP

Students should use OpenMP to parallelize their computer vision algorithms where possible. Using the knowledge from this class, students should be able to effectively use OpenMP pragmas and apply the necessary clauses to optimize their solutions. Edge detection algorithms lend themselves well to parallelism so students should see substantial speedups in their code.

GPU: OpenACC

This semester students learned to utilize parallelism on the GPU using OpenACC. Computer vision algorithms benefit greatly from the massive parallelism the GPU provides. The caveat to the OpenACC extension is that OpenACC is only supported on NVIDIA GPUs, which means results from this extension will be separate from the results obtained on the Raspberry Pi. OpenACC still has many use cases within computer vision, as NVIDIA provides a number of edge devices, such as the Jetson Orin Nano, that can be used for computer vision.

System

rlogin & glogin

Students may use rlogin and glogin as well as their personal devices for development. All of these systems will produce results for CPU-driven implementations using OpenMP. Both rlogin and glogin have OpenMP available, making them ideal systems for development. Students who implement their solutions using OpenACC should move to glogin; glogin nodes have NVIDIA Tesla T4 GPUs and access to the required libraries and compilers used by OpenACC.

Raspberry Pi 5

The Raspberry Pi 5 is a small, low-power embedded development board widely used for edge computing and real-world computer vision deployments. Over 7 million units were sold in 2024, reflecting its popularity across education, hobbyist projects, and industry prototyping. The Pi 5 is built around the Broadcom BCM2712 system-on-chip, which includes a quad-core Arm Cortex-A76 CPU (4 cores, 4 threads). It is available in two variants: the standard board with full-size ports, and the Compute Module 5, which exposes all interfaces through compact board-to-board connectors for integration into custom hardware.

Evaluating computer vision algorithms on the Raspberry Pi 5 provides valuable context about their real-world performance. Its small form factor, efficient power consumption, and increasingly capable hardware make it a realistic target platform for field-deployed CV tasks such as monitoring, sensing, and lightweight inference. By profiling and optimizing algorithms on the Pi 5, students gain insight into the constraints and design considerations required for computer vision systems running at the edge.

Starter Code

This section details all the files students will be provided with.

sobel-serial.c

This file is a serial implementation of the Sobel operator. The code takes a PNG file as input and outputs a PNG image of the edges detected by the Sobel operator. It also times the duration of the operation. After compilation, run this file using

```
./sobel input.png output.png
```

PFOM.c

This file implements Pratt's Figure of Merit (FOM). Pratt's FOM is a metric for evaluating the accuracy of an edge detection algorithm. It takes a PNG produced by an edge detection algorithm and a PNG of a ground truth image and returns an accuracy value. After compilation, run this file using

```
./accuracy ideal.png detected.png
```

edge.py

This is a Python edge detection script that uses OpenCV's cv2 module. The script applies a Gaussian blur to the image to reduce noise, then runs the Canny edge algorithm to extract edges from the image. It can be used to give extra context to the accuracy results students produce with their own algorithms. To run this file, use

```
python3 edge.py input.png output.png
```

Before running edge.py, install OpenCV for Python by running

```
pip install opencv-python
```

Makefile

The provided Makefile is used to compile the serial Sobel code and the PFOM code. Students should revise as needed.

stb_image.h & stb_image_write.h

These header files are used by sobel-serial.c to read and write to image files.

UDED

This is a GitHub repository containing 30 PNG images and accompanying ground truth images that can be used as a starting dataset for producing results. Feel free to look for or create additional datasets as needed.
