Video Preprocessing Pipeline

A Python package for efficient preprocessing of large video data (>300 GB), optimized for memory efficiency using dask array as well as time efficiency using JAX and CUDA.

Features

This package processes high-resolution microscopy videos with:

Support for ND2/TIF formats and multi-channel 2D/3D images
Fixed pattern noise computation and subtraction
Configurable pixel binning for enhanced signal and reduced file size
Multiple output formats (NRRD, TIF) with maximum intensity projection generation for 3D images

Installation

System Requirements

Python 3.10+
NVIDIA GPU with CUDA support
GPU Memory: >= 8GB recommended
System Memory: >= 32GB recommended

Clone this repository

# Clone the repository
git clone git@github.com:flavell-lab/private_preprocess_videos.git
cd private_preprocess_videos

Install dependencies using pip

# Create and activate virtual environment
python -m venv env
source env/bin/activate

# Install dependencies from requirements.txt
pip install -r requirements.txt

# For GPU support
pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

User-specified parameters

--input_path: Path to input file (.nd2 or .tif)
--output_dir: Directory for processed outputs
--noise_path: Path to precomputed noise reference file (.tif)
--blank_dir: Directory containing noise reference files to be averaged
--chunk_size: Number of frames to process at once
--n_z: Number of Z-slices per volume
--x_range: X dimension range as "start,end"
--y_range: Y dimension range as "start,end"
--z_range: Z dimension range as "start,end"
--channels: Channels to process as "1,2"
--bitdepth: Bit depth of images
--binsize: Binning factor
--save_as: Output format ('nrrd' or 'tif')
--gpu: GPU device number (optional, only if user wants to dedicate a specific GPU to the current run)

Processing individual ND2 Files

python main.py \
    --input_path /path/to/raw/nd2 \
    --output_dir /path/to/processed/output \
    --blank_dir /path/to/blank/frames/directory \
    --chunk_size 64 \
    --n_z 80 \
    --x_range 0,966 \
    --y_range 0,630 \
    --z_range 3,80 \
    --channels 1,2 \
    --bitdepth 12 \
    --binsize 3 \
    --global_t_start 0 \
    --global_t_end 1600 \
    --gpu 2

You can also stay with the default parameters for typical 16-minute whole-brain calcium imaging recordings acquired with Flavell Lab's custom hardware and NIS Elements software by omitting all optional arguments, in which case the following operations will be performed:

After 3x3 binning, a uniform background intensity of 800 is assumed for all frames, which will be subtracted from each superpixel. Each frame is then clamped to fit in the range of (0, 4096] as 12-bit and exported as NRRD files. These operations minimize nonlinear transformations in the preprocessing step while preserving variance in pixel values. Processed images are ready to be passed into pretrained neural networks for segmentation and registration.

python main.py \
    --input_path /store1/shared/panneuralGFP_SWF1212/data_raw/2025-02-06/2025-02-06-01.nd2 \
    --output_dir /store1/shared/panneuralGFP_SWF1212/data_processed/2025-02-06-01_output

Processing multiple ND2 Files at once

Attach a metadata.txt in your input directory, and submit multiple nd2 files by running the script submit_multiple.sh. Here is an example of the metadata file, which defines all the immobilized recordings affiliated with each freely moving recording:

"2025-03-18-21":
{
all_red = "2025-03-18-22"
mNeptune = "2025-03-18-23"
OFP = "2025-03-18-24"
BFP = "2025-03-18-25"
}

"2025-03-18-26":
{
all_red = "2025-03-18-28"
mNeptune = "2025-03-18-30"
OFP = "2025-03-18-31"
BFP = "2025-03-18-32"
}

Below is the an example command line to submit all nd2 files acquired on a given day as specified in metadata.txt in one go. Specify which CUDA device you'd like to use with flag -g:

./submit_multiple.sh \
    -i "/store1/shared/panneuralGFP_new/data_raw/2025-03-18" \
    -o "/store1/shared/panneuralGFP_new/data_processed" \
    -g 2

This will first print all the commands that call main.py, then execute all of them.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
LICENSE		LICENSE
README.md		README.md
fileio.py		fileio.py
main.py		main.py
noise.py		noise.py
process.py		process.py
requirements.txt		requirements.txt
submit_multiple.sh		submit_multiple.sh
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Video Preprocessing Pipeline

Features

Installation

System Requirements

Clone this repository

Install dependencies using pip

User-specified parameters

Processing individual ND2 Files

Processing multiple ND2 Files at once

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Video Preprocessing Pipeline

Features

Installation

System Requirements

Clone this repository

Install dependencies using pip

User-specified parameters

Processing individual ND2 Files

Processing multiple ND2 Files at once

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages