This folder contains scripts and tools for using the LFM2-VL-1.6B vision language model from Liquid AI. The model is designed for processing text and images with variable resolutions, optimized for low-latency and edge AI applications.
- 2× faster inference speed on GPUs compared to existing VLMs
- Flexible architecture with user-tunable speed-quality tradeoffs
- Native resolution processing up to 512×512 pixels
- Lightweight: Only 1.6B parameters
- Multimodal: Processes both text and images
- `save_lfm2_vl_model.py` - Downloads and saves the model locally
- `instagram_caption_generator.py` - Generates Instagram-like captions for images
- `test_setup.py` - Tests the setup and dependencies
- `requirements.txt` - Required Python packages
- `run_preprocessing.sh` - Launcher for Instagram dataset preprocessing
- `preprocessing/` - Folder containing all preprocessing scripts
- `README.md` - This file
```bash
pip install -r requirements.txt
python save_lfm2_vl_model.py
```

This will download the ~1.6GB model to `./lfm2_vl_1_6b_model/`.
Test your setup:

```bash
python test_setup.py
```

Preprocess the Instagram dataset:

```bash
# Quick preprocessing (recommended)
./run_preprocessing.sh InstaDataset.zip

# This will create a training-ready dataset in ./processed_dataset/instagram_dataset/
```

Train on the processed dataset:

```bash
python3 train_lfm2_instagram_trainer.py \
    --data-dir ./processed_dataset/instagram_dataset \
    --output-dir ./trained_model \
    --num-epochs 5 \
    --batch-size 1 \
    --learning-rate 5e-5

# Training will create checkpoints and a final model in ./trained_model/
```

Generate captions:

```bash
# Basic usage with the provided image
python instagram_caption_generator.py --image ../img1.jpg

# Generate multiple captions with different styles
python instagram_caption_generator.py --image ../img1.jpg --style creative --num-captions 5

# Use a different output file
python instagram_caption_generator.py --image ../img1.jpg --output my_captions.txt
```

Available caption styles:

- `instagram` - Trendy, relatable captions with hashtags
- `professional` - Business-appropriate descriptions
- `casual` - Friendly, conversational tone
- `creative` - Artistic, mood-capturing captions
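Internally, each style boils down to a different text prompt sent alongside the image. The mapping below is a hypothetical sketch of that idea — the dictionary name and prompt wording are illustrative, not taken from `instagram_caption_generator.py`:

```python
# Hypothetical style-to-prompt mapping; names and wording are illustrative,
# not copied from the actual script.
STYLE_PROMPTS = {
    "instagram": "Write a trendy, relatable Instagram caption with hashtags for this image.",
    "professional": "Write a concise, business-appropriate description of this image.",
    "casual": "Describe this image in a friendly, conversational tone.",
    "creative": "Write an artistic caption that captures the mood of this image.",
}

def build_prompt(style: str = "instagram") -> str:
    """Return the text prompt for a caption style, falling back to 'instagram'."""
    return STYLE_PROMPTS.get(style, STYLE_PROMPTS["instagram"])
```

An unknown style falls back to the default `instagram` prompt rather than raising an error.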
```bash
python instagram_caption_generator.py --help
```

Options:

- `--image, -i` - Path to image file (required)
- `--style, -s` - Caption style (default: `instagram`)
- `--num-captions, -n` - Number of captions to generate (default: 3)
- `--output, -o` - Output file for captions (default: `generated_captions.txt`)
- `--model-path, -m` - Path to local model (default: `./lfm2_vl_1_6b_model`)
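The option list above corresponds to an `argparse` parser roughly like the following sketch; the defaults mirror the documented values, though the real script's parser may differ in details:

```python
import argparse

def make_parser() -> argparse.ArgumentParser:
    """Sketch of the documented CLI; defaults mirror the option list above."""
    parser = argparse.ArgumentParser(description="Generate Instagram-style captions")
    parser.add_argument("--image", "-i", required=True, help="Path to image file")
    parser.add_argument("--style", "-s", default="instagram",
                        choices=["instagram", "professional", "casual", "creative"])
    parser.add_argument("--num-captions", "-n", type=int, default=3)
    parser.add_argument("--output", "-o", default="generated_captions.txt")
    parser.add_argument("--model-path", "-m", default="./lfm2_vl_1_6b_model")
    return parser
```

Note that `argparse` converts hyphenated flags to underscored attributes, so `--num-captions` is read as `args.num_captions`.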
- Language Model: LFM2-1.2B backbone
- Vision Encoder: SigLIP2 NaFlex shape-optimized (400M parameters)
- Hybrid Backbone: Combines convolution and attention layers
- Context: 32,768 text tokens
- Image Tokens: Dynamic, user-tunable
- Precision: bfloat16
The model achieves competitive performance on various benchmarks:
- RealWorldQA: 65.23
- MM-IFEval: 37.66
- InfoVQA: 58.68
- OCRBench: 742
- MMStar: 49.53
- GPU Memory: ~3-4GB for inference
- Model Size: ~1.6GB on disk
- RAM: ~2-3GB additional
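The GPU figure can be sanity-checked with a back-of-envelope calculation: 1.6B parameters at bfloat16 (2 bytes each) come to roughly 3.2 GB of weights, and generation overhead (activations, KV cache) accounts for the rest of the ~3-4 GB quoted above:

```python
# Back-of-envelope check of the GPU memory figure: weights alone at bfloat16
# take params * 2 bytes; activations and KV cache come on top.
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GB (using 1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

lfm2_vl_weights_gb = weight_memory_gb(1.6e9)  # bfloat16 -> 2 bytes/param, ~3.2 GB
```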
For the provided img1.jpg image, the model might generate captions like:
```
📸 Caption 1:
Beautiful sunset vibes! 🌅 Nature never fails to amaze me.
#sunset #nature #photography #beautiful #peaceful

📸 Caption 2:
When the sky paints itself in golden hour magic ✨
#goldenhour #sky #photography #nature #beauty

📸 Caption 3:
Sunset serenity - the perfect way to end the day 🌅
#serenity #sunset #peace #nature #photography
```
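The `📸 Caption N:` layout shown above could be produced by a small formatter like this hypothetical helper (not the actual script's code):

```python
def format_captions(captions: list[str]) -> str:
    """Render captions in the '📸 Caption N:' layout shown in the example output."""
    blocks = [f"📸 Caption {i}:\n{text}" for i, text in enumerate(captions, start=1)]
    return "\n\n".join(blocks)

# Two captions formatted the way generated_captions.txt is laid out.
example = format_captions([
    "Beautiful sunset vibes! 🌅\n#sunset #nature",
    "Golden hour magic ✨\n#goldenhour",
])
```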
Troubleshooting:

- **CUDA Out of Memory**
  - Reduce `max_tokens` in the generation parameters
  - Use CPU if GPU memory is insufficient
- **Model Not Found**
  - Ensure you've run `save_lfm2_vl_model.py` first
  - Check the model path in the script
- **Dependencies Missing**
  - Install requirements: `pip install -r requirements.txt`
  - Ensure you have Python 3.8+ and PyTorch 2.0+
Performance tips:

- Use `device_map="auto"` for automatic device placement
- Set `torch_dtype="bfloat16"` for memory efficiency
- Adjust `max_image_tokens` for the speed/quality tradeoff
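The first two tips translate into keyword arguments for the Hugging Face `from_pretrained` call. A minimal sketch, assuming the `transformers` API (the string form of `torch_dtype` requires a recent transformers version; the actual loading call is commented out so the sketch runs without the model):

```python
# Load-time settings from the tips above.
load_kwargs = {
    "device_map": "auto",       # automatic device placement
    "torch_dtype": "bfloat16",  # halves weight memory vs. float32
}

# max_image_tokens is the speed/quality knob mentioned above; exactly where it
# is passed (processor config vs. generation call) depends on the LFM2-VL API,
# and 256 is an illustrative value, not a documented default.
max_image_tokens = 256

# from transformers import AutoProcessor, AutoModelForImageTextToText
# model = AutoModelForImageTextToText.from_pretrained("./lfm2_vl_1_6b_model", **load_kwargs)
```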
This model uses the LFM Open License v1.0. Please review the license terms on the Hugging Face model page.
For issues or questions:
- Check the troubleshooting section above
- Review the model documentation on Hugging Face
- Test with `python test_setup.py` to verify your setup