This folder contains scripts and tools for using the LFM2-VL-1.6B vision language model from Liquid AI. The model is designed for processing text and images with variable resolutions, optimized for low-latency and edge AI applications.
- 2× faster inference speed on GPUs compared to existing VLMs
- Flexible architecture with user-tunable speed-quality tradeoffs
- Native resolution processing up to 512×512 pixels
- Lightweight: Only 1.6B parameters
- Multimodal: Processes both text and images
- `save_lfm2_vl_model.py` - Downloads and saves the model locally
- `instagram_caption_generator.py` - Generates Instagram-like captions for images
- `test_setup.py` - Tests the setup and dependencies
- `requirements.txt` - Required Python packages
- `run_preprocessing.sh` - Launcher for Instagram dataset preprocessing
- `preprocessing/` - Folder containing all preprocessing scripts
- `README.md` - This file
```bash
pip install -r requirements.txt
python save_lfm2_vl_model.py
```

This will download the ~1.6GB model to `./lfm2_vl_1_6b_model/`.
Test your setup:

```bash
python test_setup.py
```

Preprocess the Instagram dataset:

```bash
# Quick preprocessing (recommended)
./run_preprocessing.sh InstaDataset.zip

# This will create a training-ready dataset in ./processed_dataset/instagram_dataset/
```

Train on the processed dataset:

```bash
python3 train_lfm2_instagram_trainer.py \
    --data-dir ./processed_dataset/instagram_dataset \
    --output-dir ./trained_model \
    --num-epochs 5 \
    --batch-size 1 \
    --learning-rate 5e-5

# Training will create checkpoints and a final model in ./trained_model/
```

Generate captions:

```bash
# Basic usage with the provided image
python instagram_caption_generator.py --image ../img1.jpg

# Generate multiple captions with different styles
python instagram_caption_generator.py --image ../img1.jpg --style creative --num-captions 5

# Use a different output file
python instagram_caption_generator.py --image ../img1.jpg --output my_captions.txt
```

Available caption styles:

- `instagram` - Trendy, relatable captions with hashtags
- `professional` - Business-appropriate descriptions
- `casual` - Friendly, conversational tone
- `creative` - Artistic, mood-capturing captions
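Internally, each style boils down to a different text prompt sent alongside the image. The mapping below is a hypothetical sketch of that idea — the dictionary name and prompt wording are illustrative, not taken from `instagram_caption_generator.py`:

```python
# Hypothetical style-to-prompt mapping; names and wording are illustrative,
# not copied from the actual script.
STYLE_PROMPTS = {
    "instagram": "Write a trendy, relatable Instagram caption with hashtags for this image.",
    "professional": "Write a concise, business-appropriate description of this image.",
    "casual": "Describe this image in a friendly, conversational tone.",
    "creative": "Write an artistic caption that captures the mood of this image.",
}

def build_prompt(style: str = "instagram") -> str:
    """Return the text prompt for a caption style, falling back to 'instagram'."""
    return STYLE_PROMPTS.get(style, STYLE_PROMPTS["instagram"])
```

An unknown style falls back to the default `instagram` prompt rather than raising an error.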
```bash
python instagram_caption_generator.py --help
```

Options:

- `--image, -i` - Path to image file (required)
- `--style, -s` - Caption style (default: `instagram`)
- `--num-captions, -n` - Number of captions to generate (default: 3)
- `--output, -o` - Output file for captions (default: `generated_captions.txt`)
- `--model-path, -m` - Path to local model (default: `./lfm2_vl_1_6b_model`)
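The option list above corresponds to an `argparse` parser roughly like the following sketch; the defaults mirror the documented values, though the real script's parser may differ in details:

```python
import argparse

def make_parser() -> argparse.ArgumentParser:
    """Sketch of the documented CLI; defaults mirror the option list above."""
    parser = argparse.ArgumentParser(description="Generate Instagram-style captions")
    parser.add_argument("--image", "-i", required=True, help="Path to image file")
    parser.add_argument("--style", "-s", default="instagram",
                        choices=["instagram", "professional", "casual", "creative"])
    parser.add_argument("--num-captions", "-n", type=int, default=3)
    parser.add_argument("--output", "-o", default="generated_captions.txt")
    parser.add_argument("--model-path", "-m", default="./lfm2_vl_1_6b_model")
    return parser
```

Note that `argparse` converts hyphenated flags to underscored attributes, so `--num-captions` is read as `args.num_captions`.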
- Language Model: LFM2-1.2B backbone
- Vision Encoder: SigLIP2 NaFlex shape-optimized (400M parameters)
- Hybrid Backbone: Combines convolution and attention layers
- Context: 32,768 text tokens
- Image Tokens: Dynamic, user-tunable
- Precision: bfloat16
The model achieves competitive performance on various benchmarks:
- RealWorldQA: 65.23
- MM-IFEval: 37.66
- InfoVQA: 58.68
- OCRBench: 742
- MMStar: 49.53
- GPU Memory: ~3-4GB for inference
- Model Size: ~1.6GB on disk
- RAM: ~2-3GB additional
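The GPU figure can be sanity-checked with a back-of-envelope calculation: 1.6B parameters at bfloat16 (2 bytes each) come to roughly 3.2 GB of weights, and generation overhead (activations, KV cache) accounts for the rest of the ~3-4 GB quoted above:

```python
# Back-of-envelope check of the GPU memory figure: weights alone at bfloat16
# take params * 2 bytes; activations and KV cache come on top.
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GB (using 1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

lfm2_vl_weights_gb = weight_memory_gb(1.6e9)  # bfloat16 -> 2 bytes/param, ~3.2 GB
```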
For the provided img1.jpg image, the model might generate captions like:
```
📸 Caption 1:
Beautiful sunset vibes! 🌅 Nature never fails to amaze me.
#sunset #nature #photography #beautiful #peaceful

📸 Caption 2:
When the sky paints itself in golden hour magic ✨
#goldenhour #sky #photography #nature #beauty

📸 Caption 3:
Sunset serenity - the perfect way to end the day 🌅
#serenity #sunset #peace #nature #photography
```
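The `📸 Caption N:` layout shown above could be produced by a small formatter like this hypothetical helper (not the actual script's code):

```python
def format_captions(captions: list[str]) -> str:
    """Render captions in the '📸 Caption N:' layout shown in the example output."""
    blocks = [f"📸 Caption {i}:\n{text}" for i, text in enumerate(captions, start=1)]
    return "\n\n".join(blocks)

# Two captions formatted the way generated_captions.txt is laid out.
example = format_captions([
    "Beautiful sunset vibes! 🌅\n#sunset #nature",
    "Golden hour magic ✨\n#goldenhour",
])
```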
Troubleshooting:

- **CUDA Out of Memory**
  - Reduce `max_tokens` in the generation parameters
  - Use CPU if GPU memory is insufficient
- **Model Not Found**
  - Ensure you've run `save_lfm2_vl_model.py` first
  - Check the model path in the script
- **Dependencies Missing**
  - Install requirements: `pip install -r requirements.txt`
  - Ensure you have Python 3.8+ and PyTorch 2.0+
Performance tips:

- Use `device_map="auto"` for automatic device placement
- Set `torch_dtype="bfloat16"` for memory efficiency
- Adjust `max_image_tokens` for the speed/quality tradeoff
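The first two tips translate into keyword arguments for the Hugging Face `from_pretrained` call. A minimal sketch, assuming the `transformers` API (the string form of `torch_dtype` requires a recent transformers version; the actual loading call is commented out so the sketch runs without the model):

```python
# Load-time settings from the tips above.
load_kwargs = {
    "device_map": "auto",       # automatic device placement
    "torch_dtype": "bfloat16",  # halves weight memory vs. float32
}

# max_image_tokens is the speed/quality knob mentioned above; exactly where it
# is passed (processor config vs. generation call) depends on the LFM2-VL API,
# and 256 is an illustrative value, not a documented default.
max_image_tokens = 256

# from transformers import AutoProcessor, AutoModelForImageTextToText
# model = AutoModelForImageTextToText.from_pretrained("./lfm2_vl_1_6b_model", **load_kwargs)
```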
This model uses the LFM Open License v1.0. Please review the license terms on the Hugging Face model page.
For issues or questions:
- Check the troubleshooting section above
- Review the model documentation on Hugging Face
- Test with `python test_setup.py` to verify your setup