Levy is a semantic caching engine for LLM APIs, designed as a research prototype for a Computer Science Capstone project. It sits between your application and an LLM provider (like OpenAI) to optimize costs and latency by reusing responses for identical or semantically similar prompts.
- Exact Match Caching: Extremely fast retrieval for identical prompts.
- Semantic Caching: Uses vector embeddings (via `sentence-transformers`) to find and reuse answers for queries with similar meaning (e.g., "What is the capital of France?" vs. "Tell me France's capital"); a minimal sketch of the idea follows this list.
- Metrics: Automatically tracks cache hit rates, latency, and estimated token savings.
- Pluggable Architecture: Easy to swap LLM providers or Vector Stores (see the interface sketch after the project tree below).
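To illustrate the semantic-caching idea, here is a standalone sketch using `sentence-transformers` directly. It is not Levy's internal code; the model name and the 0.85 threshold are illustrative choices (the threshold simply mirrors the `similarity_threshold` example in the Configuration section).

```python
# Illustrative sketch only -- not Levy's internal implementation.
# The model name and the 0.85 threshold are example choices.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

cached_prompt = "What is the capital of France?"
new_prompt = "Tell me France's capital"

# Embed both prompts and compare them with cosine similarity.
emb_cached, emb_new = model.encode([cached_prompt, new_prompt])
similarity = util.cos_sim(emb_cached, emb_new).item()

# A semantic cache reuses the stored answer when the score clears
# a configured threshold.
if similarity >= 0.85:
    print(f"Semantic hit (cosine similarity {similarity:.2f})")
```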
```
levy/
├── levy/             # Core package
│   ├── cache/        # Cache logic (Exact, Semantic, Store)
│   ├── llm_client.py # LLM interaction (Mock, OpenAI)
│   ├── embeddings.py # Vector embedding logic
│   ├── engine.py     # Main orchestration engine
│   └── models.py     # Data classes
├── examples/         # Demo scripts
└── tests/            # Unit tests
```
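The split between `llm_client.py` (Mock, OpenAI) and the rest of the engine is what makes providers swappable. A rough sketch of that idea, not the repository's actual interface:

```python
# Illustrative sketch of the pluggable-provider idea behind levy/llm_client.py;
# the actual interface in the repository may differ.
from typing import Protocol

class LLMClient(Protocol):
    """Anything that can turn a prompt into a completion."""
    def complete(self, prompt: str) -> str: ...

class MockLLM:
    """Stand-in provider, mirroring the repo's Mock client in spirit."""
    def complete(self, prompt: str) -> str:
        return f"mock response to: {prompt}"

def answer(client: LLMClient, prompt: str) -> str:
    # The caller depends only on the protocol, so providers are swappable.
    return client.complete(prompt)

print(answer(MockLLM(), "Hello world"))
```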
- Ensure you have Conda installed.
- Create the environment: `conda env create -f environment.yml`
- Activate the environment: `conda activate levy`
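To confirm the environment resolved the semantic-cache dependency (assuming `environment.yml` includes `sentence-transformers`, which the Features list treats as optional):

```python
# Quick environment check for the optional semantic-cache dependency.
import sentence_transformers

print("sentence-transformers", sentence_transformers.__version__)
```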
```python
from levy import LevyEngine, LevyConfig
# Initialize with defaults (Mock LLM, Exact Cache only)
engine = LevyEngine()
# First call - hits the "LLM"
result1 = engine.generate("Hello world")
print(result1.source) # 'llm'
# Second call - hits the cache
result2 = engine.generate("Hello world")
print(result2.source) # 'exact_cache'
```

A replay script is provided to demonstrate the cache behavior:

```bash
python examples/simple_replay.py
```

It runs a sequence of prompts through three configurations:
- No Cache
- Exact Cache Only
- Semantic Cache (uses `sentence-transformers` if available)
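In `LevyConfig` terms, the three runs differ roughly as sketched below. Note that `enable_exact_cache` is a hypothetical field name; only `enable_semantic_cache` and `similarity_threshold` appear in the Configuration section.

```python
# Rough LevyConfig equivalents of the replay's three runs.
# `enable_exact_cache` is a hypothetical field name, not documented here.
from levy import LevyConfig

no_cache = LevyConfig(enable_exact_cache=False, enable_semantic_cache=False)
exact_only = LevyConfig(enable_semantic_cache=False)
semantic = LevyConfig(enable_semantic_cache=True, similarity_threshold=0.85)
```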
- Install and run Ollama.
- Pull the required models: `ollama pull llama3.2` and `ollama pull mxbai-embed-large`
- Run the demo: `python examples/ollama_demo.py`
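For orientation, a plausible configuration for this setup might look like the sketch below; `llm_provider` is a documented field, but the `"ollama"` value and both model fields are assumptions, and `examples/ollama_demo.py` is the authoritative wiring.

```python
# Hypothetical LevyConfig for the Ollama demo. llm_provider is a documented
# field, but the "ollama" value and both model fields are assumptions.
from levy import LevyEngine, LevyConfig

config = LevyConfig(
    llm_provider="ollama",                # assumed provider string
    llm_model="llama3.2",                 # hypothetical field
    embedding_model="mxbai-embed-large",  # hypothetical field
    enable_semantic_cache=True,
)
engine = LevyEngine(config)  # passing a config to the constructor is assumed
```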
To use Redis for persistence:
- Start Redis: `docker-compose up -d`
- Configure `LevyConfig` to use `cache_store_type="redis"`.
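A minimal sketch of wiring this up; passing the config to `LevyEngine` follows the Quick Start pattern but the exact constructor signature is assumed, and Redis connection settings are not documented here.

```python
# Redis-backed cache store via the documented cache_store_type field.
# The LevyEngine(config) call is assumed, not documented in this README.
from levy import LevyEngine, LevyConfig

config = LevyConfig(cache_store_type="redis")
engine = LevyEngine(config)

result = engine.generate("Hello world")
print(result.source)  # 'llm' on first call; cached repeats now survive restarts
```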
You can configure Levy via `LevyConfig`:

```python
config = LevyConfig(
llm_provider="openai",
openai_api_key="sk-...",
enable_semantic_cache=True,
similarity_threshold=0.85
)
```

License: Apache-2.0