MuhammadHabibKhan/video-frame-interpolation

Enhanced Video Frame Interpolation using a Hybrid CNN-GAN Framework

Research-based Final Year Project for Bachelor's degree in Computer Science


Figure: Illustration of Frame Interpolation | Source

Abstract

Video Frame Interpolation (VFI) is a critical task in computer vision, enabling smoother motion in videos by generating intermediate frames. Traditional optical flow-based methods, such as Lucas-Kanade and Gunnar Farneback, often fail to handle complex motion, occlusions, and fine details effectively, leading to noticeable artifacts. To overcome these limitations, I propose an enhanced VFI approach using a hybrid deep learning framework that combines Convolutional Neural Networks (CNN) and Generative Adversarial Networks (GANs).

In this project, I first implemented frame interpolation using traditional optical flow techniques to analyze their effectiveness and limitations. I then developed a deep learning-based solution incorporating the Super SloMo model, which uses a CNN for more accurate motion estimation, and Real-ESRGAN, a GAN-based model for enhancing frame quality and restoring details. By integrating these approaches, I aim to significantly improve frame accuracy, temporal coherence, and overall visual fidelity.

This project is not intended to be the best VFI solution available but rather to demonstrate how a relatively lightweight deep learning-based framework can significantly outperform traditional optical flow methods while remaining computationally efficient. The performance of both approaches was compared using qualitative and quantitative metrics, highlighting the advantages of deep learning in video enhancement. The expected outcome of this project is a more robust and accessible VFI model that can be applied to slow-motion video generation, frame rate upscaling, and video restoration, showcasing the potential of deep learning in advancing video processing techniques.

Results

Input Video @ 12.5 fps

car_low_trim.mp4

Output Video @ 25 fps 4K Upscaled

car_high_gan_trim.mp4

Note: The videos shown here are trimmed from the original input and output videos present in the drive folder.
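The frame-rate doubling shown above (12.5 fps to 25 fps) amounts to inserting one new frame between every pair of consecutive input frames. The following is a minimal sketch of that arithmetic only; the function name `interpolated_timestamps` and its parameters are hypothetical and are not taken from this repository's code.

```python
from fractions import Fraction


def interpolated_timestamps(num_frames, input_fps, factor):
    """Return (output_fps, timestamps) when `factor - 1` frames are
    inserted between every pair of consecutive input frames.

    Illustrative helper only; it is not part of this repository.
    """
    if num_frames < 1 or factor < 1:
        raise ValueError("num_frames and factor must be at least 1")
    # Fractions keep values such as 12.5 fps exact.
    fps_in = Fraction(str(input_fps))
    fps_out = fps_in * factor
    # Original frames are kept; each gap receives factor - 1 new frames.
    total_frames = (num_frames - 1) * factor + 1
    timestamps = [Fraction(i) / fps_out for i in range(total_frames)]
    return fps_out, timestamps


fps_out, times = interpolated_timestamps(num_frames=3, input_fps=12.5, factor=2)
print(float(fps_out))             # 25.0
print([float(t) for t in times])  # [0.0, 0.04, 0.08, 0.12, 0.16]
```

With three input frames and a factor of 2, the two gaps each receive one new frame, so five frames cover the same duration at twice the frame rate.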

Method                       PSNR    SSIM
Lucas-Kanade                 30.49   0.56
Gunnar-Farneback             29.94   0.48
Super-SloMo                  30.92   0.83
Super-SloMo + Real-ESRGAN    30.95   0.81
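For readers unfamiliar with the PSNR column, the sketch below shows the standard definition, 10 * log10(MAX^2 / MSE), applied to two flat lists of pixel intensities. It is an illustration under the assumption of 8-bit pixel values (MAX = 255) and is not the code in metrics.py, which may compute the metric differently (for example, through a library).

```python
import math


def psnr(reference, candidate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length
    sequences of pixel intensities (standard textbook definition)."""
    if len(reference) != len(candidate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 4-pixel "frames" (made-up values, for illustration only).
ground_truth = [52, 55, 61, 59]
interpolated = [54, 55, 60, 58]
print(round(psnr(ground_truth, interpolated), 2))  # 46.37
```

PSNR depends only on per-pixel error, while SSIM compares local structure, which is one reason two methods can have similar PSNR and clearly different SSIM.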

For more details on the results, refer to the documentation present in the Google Drive folder

Links

Demo Video | Google Drive

Note: The drive link contains a 'data' folder which has all the inputs and outputs of the three implementations done in this project. Refer to the 'Readme.txt' in the data folder for more details.

Pre-Requisites

  • I recommend creating two environments. Use the first one for the Lucas-Kanade, Gunnar-Farneback and Super-SloMo implementations, and install the packages in requirements.txt using the command

    pip install -r requirements.txt

  • Use the second environment to run Real-ESRGAN, as it depends on an older version of PyTorch, which in turn requires an older version of NumPy. Install the packages in requirements-gan.txt using the command

    pip install -r requirements-gan.txt

  • After installing the packages for Real-ESRGAN, install it in development mode by running its setup.py with the command

    python setup.py develop

Please refer to the official Real-ESRGAN repo linked at the bottom if any issues arise.
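Because the two environments pin different PyTorch and NumPy versions, it can help to confirm which versions the active environment actually provides before running a model. The snippet below is a small sketch using the standard-library `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the exact versions to expect are whatever requirements.txt and requirements-gan.txt specify.

```python
from importlib import metadata


def installed_versions(packages=("torch", "numpy")):
    """Map each distribution name to its installed version string,
    or None if it is not installed in the active environment."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Running it once inside each environment makes it easy to spot a case where the wrong environment is active.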

System Specifications

  • CPU: Intel Xeon E5-1680 v4 | 8 cores / 16 threads @ max 3.4 GHz when all cores active
  • RAM: 32 GB DDR4, quad channel
  • GPU: RTX 2060 6GB

Directory Structure

  1. Folder: Frame Utils
  • frame.py --> frame and video handling code
  • metrics.py --> metrics calculation code (PSNR and SSIM)

  2. Folder: Gunnar Farneback
  • gf.py --> Implementation of VFI using GF Optical Flow Estimation

  3. Folder: Lucas Kanade
  • optk_flow.py --> Implementation of VFI using LK Optical Flow Estimation

  4. Folder: Super-SloMo
  • eval.py --> Run Super-SloMo on provided video

    Usage:

    python eval.py data/input.mp4 --checkpoint=data/SuperSloMo.ckpt --output=data/output.mp4 --scale=4

    Use python eval.py --help for more details

  • Download the model checkpoint SuperSloMo.ckpt and place it in "data/"


  5. Folder: Real-ESRGAN
  • inference_realesrgan.py --> Run Real-ESRGAN on provided frames

    Usage:
      python inference_realesrgan.py -n RealESRGAN_x4plus -i infile -o outfile [options]...
    

    A common command:

    python inference_realesrgan.py -n RealESRGAN_x4plus -i infile --outscale 3.5 --face_enhance

  • Following is the list of options available:

    -h                   Show this help
    -i, --input          Input image or folder. Default: inputs
    -o, --output         Output folder. Default: results
    -n, --model_name     Model name. Default: RealESRGAN_x4plus
    -s, --outscale       The final upsampling scale of the image. Default: 4
    --suffix             Suffix of the restored image. Default: out
    -t, --tile           Tile size, 0 for no tile during testing. Default: 0
    --face_enhance       Whether to use GFPGAN to enhance face. Default: False
    --fp32               Use fp32 precision during inference. Default: fp16 (half precision).
    --ext                Image extension. Options: auto | jpg | png, auto means using the same extension as inputs. Default: auto

  • Download the model weights file RealESRGAN_x4plus.pth used in this project and place it under "weights/"


Note: For folders 4 and 5, only the main file that was run to generate the results is listed. These folders contain only the files required to replicate the results of my project. For other features, visit the GitHub repository for each model.

SuperSloMo | Real-ESRGAN
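The two usage lines above can be assembled programmatically, which is convenient because each model runs in its own environment. The sketch below only builds the argument lists from the flags documented in this README; the helper names, the interpreter paths, and the "frames" and "results" folders are hypothetical, and how the interpolated video is split into frames for Real-ESRGAN is assumed to be handled separately (for example, by the code in Frame Utils).

```python
import shlex


def super_slomo_command(python_exe, input_video, output_video,
                        checkpoint="data/SuperSloMo.ckpt", scale=4):
    """Argument list for eval.py, mirroring the usage line in this README."""
    return [
        python_exe, "eval.py", input_video,
        f"--checkpoint={checkpoint}",
        f"--output={output_video}",
        f"--scale={scale}",
    ]


def real_esrgan_command(python_exe, frames_dir, output_dir,
                        model="RealESRGAN_x4plus", outscale=4):
    """Argument list for inference_realesrgan.py using the documented flags."""
    return [
        python_exe, "inference_realesrgan.py",
        "-n", model,
        "-i", frames_dir,
        "-o", output_dir,
        "--outscale", str(outscale),
    ]


# Hypothetical interpreter paths, one per environment.
slomo = super_slomo_command("envs/vfi/bin/python", "data/input.mp4", "data/output.mp4")
esrgan = real_esrgan_command("envs/gan/bin/python", "frames", "results")
print(shlex.join(slomo))
print(shlex.join(esrgan))
```

Each list can be passed to `subprocess.run(cmd, cwd=<model folder>, check=True)` so that the script is launched from its own folder with the interpreter of the matching environment.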
