Enhanced Video Frame Interpolation using a Hybrid CNN-GAN Framework

Research-based Final Year Project for Bachelor's degree in Computer Science

Illustration of Frame Interpolation | Source

Abstract

Video Frame Interpolation (VFI) is a critical task in computer vision, enabling smoother motion in videos by generating intermediate frames. Traditional optical flow-based methods, such as Lucas-Kanade and Gunnar-Farneback, often fail to handle complex motion, occlusions, and fine details effectively, leading to noticeable artifacts. To overcome these limitations, I propose an enhanced VFI approach using a hybrid deep learning framework that combines Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs).

In this project, I first implemented frame interpolation using traditional optical flow techniques to analyze their effectiveness and limitations. I then developed a deep learning-based solution incorporating the Super SloMo model, which uses a CNN for more accurate motion estimation, and Real-ESRGAN, a GAN-based model for enhancing frame quality and restoring details. By integrating these approaches, I aim to significantly improve frame accuracy, temporal coherence, and overall visual fidelity.

This project is not intended to be the best VFI solution available but rather to demonstrate how a relatively lightweight deep learning-based framework can significantly outperform traditional optical flow methods while remaining computationally efficient. The performance of both approaches was compared using qualitative and quantitative metrics, highlighting the advantages of deep learning in video enhancement. The expected outcome of this project is a more robust and accessible VFI model that can be applied to slow-motion video generation, frame rate upscaling, and video restoration, showcasing the potential of deep learning in advancing video processing techniques.

Results

Input Video @ 12.5 fps

car_low_trim.mp4

Output Video @ 25 fps 4K Upscaled

car_high_gan_trim.mp4

Note: The videos shown here are trimmed from the original input and output videos available in the Drive folder.
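Going from 12.5 fps to 25 fps corresponds to synthesizing one new frame halfway between each pair of input frames. The sketch below illustrates that arithmetic for a general integer factor. The function names are hypothetical and not part of this repository, and the frame-count formula assumes no frames are padded or dropped at the end of the clip.

```python
from fractions import Fraction


def interpolation_plan(input_fps, scale):
    """Return (output_fps, positions of the new frames between two inputs).

    Positions are fractions in (0, 1), where 0 is the first input frame
    and 1 is the second input frame of a pair.
    """
    if scale < 2:
        raise ValueError("scale must be an integer >= 2")
    output_fps = input_fps * scale
    positions = [Fraction(k, scale) for k in range(1, scale)]
    return output_fps, positions


def output_frame_count(input_frames, scale):
    """Frames after interpolation, assuming every original frame is kept."""
    return (input_frames - 1) * scale + 1


fps, positions = interpolation_plan(12.5, 2)
print(fps, [float(p) for p in positions])
print(output_frame_count(100, 2))
```

With a factor of 2 there is a single new frame at position 0.5; a factor of 4 would place new frames at 0.25, 0.5 and 0.75.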

| Method | PSNR (dB) | SSIM |
| --- | --- | --- |
| Lucas-Kanade | 30.49 | 0.56 |
| Gunnar-Farneback | 29.94 | 0.48 |
| Super-SloMo | 30.92 | 0.83 |
| Super-SloMo + Real-ESRGAN | 30.95 | 0.81 |

For more details on the results, refer to the documentation present in the Google Drive folder
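For readers unfamiliar with PSNR, the following is a minimal, dependency-free sketch of how it is commonly defined for 8-bit images. It is an illustration only, not the code in metrics.py, and it treats each frame as a flat sequence of pixel intensities, which is an assumption made to keep the example short.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(test) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel is off by 10 intensity levels, so the MSE is 100.
print(round(psnr([100, 100, 100, 100], [110, 110, 110, 110]), 2))
```

Higher values mean the interpolated frame is closer to the ground-truth frame. Because PSNR is driven purely by pixel-wise error, it can stay nearly constant across methods even when structural quality (as measured by SSIM) differs noticeably.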

Links

Demo Video | Google Drive

Note: The Drive link contains a 'data' folder which has all the inputs and outputs of the 3 implementations done in this project. Refer to the 'Readme.txt' in the data folder for more details.

Pre-Requisites

I recommend creating two environments. The first is for the Lucas-Kanade, Gunnar-Farneback and Super-SloMo implementations; install its packages from requirements.txt using the command

pip install -r requirements.txt

The second environment is for Real-ESRGAN, which depends on an older version of PyTorch that in turn requires an older version of NumPy. Install its packages from requirements-gan.txt using the command

pip install -r requirements-gan.txt

After installing the packages for Real-ESRGAN, run its setup.py in development mode using the command

python setup.py develop

Please refer to the official Real-ESRGAN repository linked at the bottom if any issues arise.

System Specifications

CPU: Intel Xeon E5-1680 v4 | 8 cores, 16 threads, up to 3.4 GHz with all cores active
RAM: 32 GB DDR4, quad channel
GPU: RTX 2060 6 GB

Directory Structure

Folder: Frame-Utils

frame.py --> frame and video handling code
metrics.py --> metrics calculation code (PSNR and SSIM)
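As background on the SSIM metric, here is a simplified sketch of the SSIM formula computed once over a whole frame. This is an assumption-laden illustration and not the code in metrics.py: the standard SSIM index is computed over local sliding windows and then averaged, whereas this version uses a single global window to keep the example short.

```python
def global_ssim(x, y, max_value=255.0):
    """Single-window SSIM between two equal-length pixel sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Stabilizing constants from the original SSIM definition.
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


print(global_ssim([10, 50, 90, 130], [10, 50, 90, 130]))
```

Identical inputs give a score of 1, and the score drops as luminance, contrast or structure diverge.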

Folder: Gunnar-Farneback

gf.py --> Implementation of VFI using GF Optical Flow Estimation

Folder: Lucas-Kanade

optk_flow.py --> Implementation of VFI using LK Optical Flow Estimation

Folder: Super-SloMo

eval.py --> Run Super-SloMo on provided video

Usage:

python eval.py data/input.mp4 --checkpoint=data/SuperSloMo.ckpt --output=data/output.mp4 --scale=4

Use python eval.py --help for more details
Download the model checkpoint SuperSloMo.ckpt and place it in "data/"

Folder: Real-ESRGAN

inference_realesrgan.py --> Run Real-ESRGAN on provided frames

Usage:
```
  python inference_realesrgan.py -n RealESRGAN_x4plus -i infile -o outfile [options]...
```
A common command:

python inference_realesrgan.py -n RealESRGAN_x4plus -i infile --outscale 3.5 --face_enhance
Following is the list of options available:

-h Show this help
-i, --input Input image or folder. Default: inputs
-o, --output Output folder. Default: results
-n, --model_name Model name. Default: RealESRGAN_x4plus
-s, --outscale The final upsampling scale of the image. Default: 4
--suffix Suffix of the restored image. Default: out
-t, --tile Tile size, 0 for no tile during testing. Default: 0
--face_enhance Whether to use GFPGAN to enhance face. Default: False
--fp32 Use fp32 precision during inference. Default: fp16 (half precision).
--ext Image extension. Options: auto | jpg | png; auto means using the same extension as the inputs. Default: auto
Download the model weights RealESRGAN_x4plus.pth that were used and place it under "weights/"
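The `--tile` option listed above is useful on GPUs with limited memory, since the image is processed piece by piece instead of all at once. The sketch below shows the general idea of splitting a frame into tiles. It is a hypothetical helper for illustration, not Real-ESRGAN's implementation; real implementations typically also pad or overlap tiles to hide seams, which is omitted here.

```python
def tile_boxes(width, height, tile):
    """Return (left, top, right, bottom) boxes that cover a width x height frame.

    A tile size of 0 means no tiling: the whole frame is one box.
    """
    if tile <= 0:
        return [(0, 0, width, height)]
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            # Clamp the last row and column of tiles to the frame border.
            right = min(left + tile, width)
            bottom = min(top + tile, height)
            boxes.append((left, top, right, bottom))
    return boxes


boxes = tile_boxes(1920, 1080, 512)
print(len(boxes), boxes[-1])
```

Smaller tiles lower peak memory use at the cost of more passes through the network.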

Note: For folders 4 and 5 (Super-SloMo and Real-ESRGAN), only the main file that was run to generate the results is mentioned. These folders also contain only the files required to replicate the results of my project. For other features, visit the GitHub repository of each model.

SuperSloMo | Real-ESRGAN
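Since the two models run in separate environments, it can help to build each command explicitly before launching it. The following is a minimal sketch that assembles the two command lines from the flags documented in the usage sections. The function names, the interpreter paths and the frame folder names are hypothetical, and extracting frames from the interpolated video before upscaling is assumed to be done separately.

```python
import shlex


def slomo_command(python_exe, video_in, checkpoint, video_out, scale):
    """Command for Super-SloMo's eval.py, using the flags from the usage above."""
    return [
        python_exe, "eval.py", video_in,
        f"--checkpoint={checkpoint}",
        f"--output={video_out}",
        f"--scale={scale}",
    ]


def esrgan_command(python_exe, frames_in, frames_out,
                   model="RealESRGAN_x4plus", outscale=4, tile=0):
    """Command for Real-ESRGAN's inference_realesrgan.py on a folder of frames."""
    return [
        python_exe, "inference_realesrgan.py",
        "-n", model,
        "-i", frames_in,
        "-o", frames_out,
        "--outscale", str(outscale),
        "--tile", str(tile),
    ]


# Hypothetical interpreter paths, one per environment.
step1 = slomo_command("envs/vfi/bin/python", "data/input.mp4",
                      "data/SuperSloMo.ckpt", "data/output.mp4", 2)
step2 = esrgan_command("envs/gan/bin/python", "frames_in", "frames_out", tile=256)
print(shlex.join(step1))
print(shlex.join(step2))
```

Each list could then be passed to `subprocess.run(cmd, cwd=..., check=True)`, with `cwd` set to the corresponding model folder.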

MuhammadHabibKhan/video-frame-interpolation
