Velocity is an adaptive LLM inference and serving engine that dynamically optimizes Hugging Face models using JAX, Flash Attention 2, and Q-learning. Given a user-provided Hugging Face model card, it automatically builds an optimized inference pipeline, with progress displayed in a Streamlit UI.
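The exact interface is not documented yet, but as a rough illustration of the model-card-driven setup, the sketch below reads a user-supplied Hugging Face model ID with `AutoConfig` from `transformers` and derives simple sizing hints. The model ID, size threshold, and dtype heuristic are illustrative assumptions, not Velocity's actual logic.

```python
# Hypothetical sketch: inspect a user-supplied Hugging Face model ID to drive
# pipeline decisions. The threshold and dtype heuristic are illustrative only.
from transformers import AutoConfig

def inspect_model(model_id: str) -> dict:
    """Read the model's config from the Hub and derive rough sizing hints."""
    config = AutoConfig.from_pretrained(model_id)
    hidden = getattr(config, "hidden_size", None)
    layers = getattr(config, "num_hidden_layers", None)
    return {
        "model_id": model_id,
        "hidden_size": hidden,
        "num_layers": layers,
        # Crude heuristic: larger models start in a lower-precision mode.
        "suggested_dtype": "bfloat16" if (hidden or 0) >= 4096 else "float32",
    }

if __name__ == "__main__":
    print(inspect_model("gpt2"))  # any public model ID works here
```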
- Enable high-performance LLM inference on edge devices & Apple Silicon.
- Reduce latency and memory footprint using advanced optimizations.
- Automate inference pipeline creation from Hugging Face model cards.
- Leverage RL-based Q-learning for dynamic batch size & precision tuning (see the sketch after this list).
- Provide a Streamlit UI for real-time progress tracking.
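The Q-learning-based tuning is still under development; the sketch below only illustrates the general idea with a single-state Q-table over (batch size, precision) actions and an epsilon-greedy policy. The action set, hyperparameters, and the `measure_latency` callback are assumptions for illustration, not the project's actual implementation.

```python
# Minimal Q-learning sketch: pick a (batch_size, precision) pair, measure
# latency, and reward faster settings. All values here are placeholders.
import random

ACTIONS = [(b, p) for b in (1, 4, 8, 16) for p in ("float32", "bfloat16")]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

q_table = {a: 0.0 for a in ACTIONS}  # single-state Q-table for simplicity

def select_action() -> tuple:
    """Epsilon-greedy choice over (batch_size, precision) pairs."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(q_table, key=q_table.get)

def update(action: tuple, reward: float) -> None:
    """Standard Q-learning update, collapsed to a single state."""
    best_next = max(q_table.values())
    q_table[action] += ALPHA * (reward + GAMMA * best_next - q_table[action])

def tuning_step(measure_latency) -> tuple:
    """Run one tuning iteration: pick settings, measure, learn."""
    action = select_action()
    latency = measure_latency(*action)   # assumed callback into the engine
    update(action, reward=-latency)      # faster runs earn higher reward
    return action
```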
- Dynamic Model Optimization: Accepts any Hugging Face model card as input.
- JAX-based Inference Engine: Uses JIT compilation for accelerated execution (see the JAX sketch below).
- Flash Attention 2 Acceleration: Reduces memory load & improves speed.
- Q-learning for Adaptive Optimization: Dynamically selects the best batch size & precision.
- FastAPI Backend: Optimized model serving via API (see the FastAPI sketch below).
- Streamlit UI: Displays pipeline progress and shows inference results.
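To make the JIT-compilation point concrete, here is a toy JAX example of a jitted forward step. The tiny embedding-plus-projection "model" is a stand-in and does not reflect Velocity's real pipeline.

```python
# Illustrative only: how a JIT-compiled decode step might look in JAX.
import jax
import jax.numpy as jnp

@jax.jit  # compiled once per input shape, then reused for fast execution
def forward(params, token_ids):
    """Toy embedding + projection standing in for a transformer forward pass."""
    hidden = params["embedding"][token_ids]   # (batch, seq, hidden)
    logits = hidden @ params["projection"]    # (batch, seq, vocab)
    return jnp.argmax(logits, axis=-1)

if __name__ == "__main__":
    key = jax.random.PRNGKey(0)
    params = {
        "embedding": jax.random.normal(key, (1000, 64)),
        "projection": jax.random.normal(key, (64, 1000)),
    }
    tokens = jnp.array([[1, 2, 3, 4]])
    print(forward(params, tokens).shape)  # first call compiles, later calls are fast
```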
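The serving layer could look roughly like the minimal FastAPI sketch below. The `/generate` route, request fields, and `run_inference` placeholder are assumptions, not the project's actual API.

```python
# Hedged sketch of the FastAPI serving layer with a single /generate endpoint.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Velocity inference API (sketch)")

class GenerateRequest(BaseModel):
    model_id: str
    prompt: str
    max_new_tokens: int = 64

def run_inference(req: GenerateRequest) -> str:
    """Placeholder for the JAX-optimized generation call."""
    return f"[generated text for: {req.prompt[:30]}...]"

@app.post("/generate")
def generate(req: GenerateRequest) -> dict:
    """Serve a single generation request."""
    return {"model_id": req.model_id, "output": run_inference(req)}

# Run locally (assuming this file is saved as server_sketch.py):
#   uvicorn server_sketch:app --reload
```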
Roadmap
- Dynamic Model Card Inference
- Flash Attention 2 Integration
- Q-Learning for Adaptive Optimization
- GPU/TPU Support for Faster Execution
- Real-time Monitoring & Metrics in Streamlit
- Docker & Cloud Deployment
Contact
For questions or collaborations, reach out to [email protected]
Project Status: Very Early Development
This repository is in the early stages of development. Features are subject to change, and some functionality may not be fully implemented yet.