🚀 Velocity: Optimized LLM Inference Serving Engine

Velocity is an adaptive LLM inference optimization and serving engine that tunes Hugging Face models at runtime using JAX, Flash Attention 2, and Q-learning. Given a user-provided Hugging Face model card, it automatically builds an optimized inference pipeline, with progress displayed in a Streamlit UI.
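
As a rough illustration of that pattern (a sketch only, not Velocity's actual implementation), the snippet below loads a Hugging Face model card in its Flax/JAX variant and JIT-compiles the forward pass with `jax.jit`; the model id `gpt2` is just an example.

```python
# Sketch: load a Hugging Face model card as a Flax/JAX model and
# JIT-compile its forward pass. Not Velocity's internal code.
import jax
import jax.numpy as jnp
from transformers import AutoTokenizer, FlaxAutoModelForCausalLM

model_id = "gpt2"  # example model card with Flax weights
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = FlaxAutoModelForCausalLM.from_pretrained(model_id)

@jax.jit
def next_token_logits(params, input_ids):
    # Compiled once per input shape, then reused for fast inference.
    return model(input_ids=input_ids, params=params).logits[:, -1, :]

inputs = tokenizer("Velocity makes LLM inference", return_tensors="np")
logits = next_token_logits(model.params, jnp.asarray(inputs["input_ids"]))
print(tokenizer.decode(int(jnp.argmax(logits, axis=-1)[0])))
```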

📌 Project Goals

  • ✅ Enable high-performance LLM inference on edge devices & Apple Silicon.
  • ✅ Reduce latency and memory footprint using advanced optimizations.
  • ✅ Automate inference pipeline creation from Hugging Face model cards.
  • ✅ Leverage RL-based Q-learning for dynamic batch size & precision tuning (see the sketch after this list).
  • ✅ Provide a Streamlit UI for real-time progress tracking.
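
To make the Q-learning goal concrete, here is a minimal single-state (bandit-style) sketch, not the project's actual agent: each (batch size, precision) pair is an action, the reward is negative latency, and `measure_latency` is a hypothetical stand-in for timing a real inference run.

```python
# Simplified Q-learning sketch for (batch_size, precision) selection.
import random

batch_sizes = [1, 4, 8, 16]
precisions = ["fp32", "bf16", "int8"]
actions = [(b, p) for b in batch_sizes for p in precisions]

q_table = {a: 0.0 for a in actions}   # one Q-value per action
alpha, epsilon = 0.1, 0.2             # learning rate, exploration rate

def measure_latency(batch_size, precision):
    # Hypothetical stand-in: in practice this would time a real
    # inference run at the chosen batch size and precision.
    cost = {"fp32": 1.0, "bf16": 0.6, "int8": 0.4}[precision]
    return cost / batch_size + random.uniform(0.0, 0.05)

for _ in range(500):
    # epsilon-greedy action selection
    if random.random() < epsilon:
        action = random.choice(actions)
    else:
        action = max(q_table, key=q_table.get)
    reward = -measure_latency(*action)   # lower latency => higher reward
    # incremental Q-update (no next-state term in the single-state case)
    q_table[action] += alpha * (reward - q_table[action])

best_batch, best_precision = max(q_table, key=q_table.get)
print(f"learned configuration: batch_size={best_batch}, precision={best_precision}")
```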

⚡ Key Features

  • 📥 Dynamic Model Optimization: Accepts any Hugging Face model card as input.
  • ⚡ JAX-based Inference Engine: Uses JIT compilation for accelerated execution.
  • 🚀 Flash Attention 2 Acceleration: Reduces memory load & improves speed.
  • 🎯 Q-learning for Adaptive Optimization: Dynamically selects the best batch size & precision.
  • 🔗 FastAPI Backend: Optimized model serving via API (see the sketch after this list).
  • 📊 Streamlit UI: Displays pipeline progress and inference results.
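
As a hedged sketch of what the FastAPI serving layer could look like (the route name, request fields, and `run_inference` helper are assumptions, not Velocity's actual API):

```python
# Hedged sketch of a FastAPI serving layer; names are illustrative only.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Velocity inference API (sketch)")

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 32

def run_inference(prompt: str, max_new_tokens: int) -> str:
    # Hypothetical hook where the JIT-compiled JAX pipeline would be called.
    return prompt + " ..."

@app.post("/generate")
def generate(req: GenerateRequest) -> dict:
    # Serve the optimized pipeline behind a simple JSON endpoint.
    return {"completion": run_inference(req.prompt, req.max_new_tokens)}
```

If saved as `main.py`, this can be run locally with `uvicorn main:app --reload`.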

🛠️ Roadmap

  • Dynamic Model Card Inference
  • Flash Attention 2 Integration
  • Q-Learning for Adaptive Optimization
  • GPU/TPU Support for Faster Execution
  • Real-time Monitoring & Metrics in Streamlit
  • Docker & Cloud Deployment

📧 Contact

For questions or collaborations, reach out to [email protected]

🚧 Project Status: Very Early Development 🚧
This repository is in its early stages of development. Features are subject to change, and some functionalities may not be fully implemented yet.
