Levy

Levy is a semantic caching engine for LLM APIs, designed as a research prototype for a Computer Science Capstone project. It sits between your application and an LLM provider (like OpenAI) to optimize costs and latency by reusing responses for identical or semantically similar prompts.

Features

  • Exact Match Caching: Extremely fast retrieval for identical prompts.
  • Semantic Caching: Uses vector embeddings (via sentence-transformers) to find and reuse answers for semantically similar queries (e.g., "What is the capital of France?" vs "Tell me France's capital"); see the sketch below this list.
  • Metrics: Automatically tracks cache hit rates, latency and estimated token savings.
  • Pluggable Architecture: Easy to swap LLM providers or Vector Stores.
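
A semantic cache hit comes down to comparing prompt embeddings against a similarity threshold. A minimal sketch of that idea with sentence-transformers (the model name is illustrative and the threshold is taken from the Configuration example below; Levy's actual defaults may differ):

from sentence_transformers import SentenceTransformer, util

# Illustrative embedding model, not necessarily Levy's default.
model = SentenceTransformer("all-MiniLM-L6-v2")

cached_prompt = "What is the capital of France?"
new_prompt = "Tell me France's capital"

# Encode both prompts and compare them with cosine similarity.
emb_cached, emb_new = model.encode([cached_prompt, new_prompt])
score = util.cos_sim(emb_cached, emb_new).item()

if score >= 0.85:  # same value as similarity_threshold in the Configuration example
    print(f"Semantic hit (similarity={score:.2f}) - reuse the cached answer")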

Project Structure

levy/
├── levy/               # Core package
│   ├── cache/          # Cache logic (Exact, Semantic, Store)
│   ├── llm_client.py   # LLM interaction (Mock, OpenAI)
│   ├── embeddings.py   # Vector embedding logic
│   ├── engine.py       # Main orchestration engine
│   └── models.py       # Data classes
├── examples/           # Demo scripts
└── tests/              # Unit tests

Installation

Using Conda (Recommended)

  1. Ensure you have Conda installed.
  2. Create the environment:
    conda env create -f environment.yml
  3. Activate the environment:
    conda activate levy

Usage

Quick Start (Python)

from levy import LevyEngine, LevyConfig

# Initialize with defaults (Mock LLM, Exact Cache only)
engine = LevyEngine()

# First call - hits the "LLM"
result1 = engine.generate("Hello world")
print(result1.source) # 'llm'

# Second call - hits the cache
result2 = engine.generate("Hello world")
print(result2.source) # 'exact_cache'

Running the Experiment Script

A replay script is provided to demonstrate the cache behavior:

python examples/simple_replay.py

It runs a sequence of prompts through three configurations (sketched in code after this list):

  1. No Cache
  2. Exact Cache Only
  3. Semantic Cache (uses sentence-transformers if available)
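
In LevyConfig terms, the three configurations look roughly like this (enable_exact_cache is a guessed field name; only enable_semantic_cache is documented in the Configuration section, so check examples/simple_replay.py for the flags the script actually uses):

from levy import LevyConfig

# enable_exact_cache is an assumed name for the exact-match toggle;
# only enable_semantic_cache appears in the Configuration section below.
no_cache = LevyConfig(enable_exact_cache=False, enable_semantic_cache=False)
exact_only = LevyConfig(enable_exact_cache=True, enable_semantic_cache=False)
semantic = LevyConfig(enable_exact_cache=True, enable_semantic_cache=True)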

Running with Ollama (Local Models)

  1. Install and run Ollama.
  2. Pull required models:
    ollama pull llama3.2
    ollama pull mxbai-embed-large
  3. Run the demo (a configuration sketch follows these steps):
    python examples/ollama_demo.py
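
To wire the engine to Ollama from your own code, the configuration would look something like this (the field names besides enable_semantic_cache are assumptions; examples/ollama_demo.py shows the actual wiring):

from levy import LevyEngine, LevyConfig

# Assumed field names for provider and model selection.
config = LevyConfig(
    llm_provider="ollama",
    llm_model="llama3.2",                 # chat model pulled in step 2
    embedding_model="mxbai-embed-large",  # embedding model pulled in step 2
    enable_semantic_cache=True,
)

# Passing a config to the constructor is assumed from the LevyConfig docs below.
engine = LevyEngine(config)
print(engine.generate("Hello from Ollama").source)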

Using Redis Stack (Docker)

To use Redis for persistence:

  1. Start Redis:
    docker-compose up -d
  2. Configure LevyConfig to use cache_store_type="redis" (see the example below).
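
For example (only the cache_store_type field is documented here; any Redis connection settings would come from your docker-compose defaults or additional LevyConfig fields):

from levy import LevyConfig

config = LevyConfig(cache_store_type="redis")  # from step 2 above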

Configuration

You can configure Levy via LevyConfig:

config = LevyConfig(
    llm_provider="openai",
    openai_api_key="sk-...",
    enable_semantic_cache=True,
    similarity_threshold=0.85
)
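
Assuming the engine accepts a config object (the Quick Start only shows the default constructor), usage looks like:

engine = LevyEngine(config)
result = engine.generate("What is the capital of France?")
print(result.source)  # 'llm' on the first call, a cache source afterwards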

License

Apache-2.0
