jazz-graph-agent

An end-to-end LangGraph agent that fetches jazz musician data, extracts collaboration relationships using structured LLM output, and generates an interactive social network visualization with SNA metrics.

Focus: Bebop era (1940-1960), but the architecture supports any jazz domain.

What This Project Does

jazz-graph-agent demonstrates a complete AI agent pipeline:

Fetches jazz musician data from MusicBrainz API (with mock fallback)
Extracts collaboration networks using LLM with structured output (Pydantic schemas)
Builds a NetworkX graph from the extracted relationships
Computes Social Network Analysis metrics (degree centrality, betweenness, clustering)
Visualizes the network as an interactive HTML graph (PyVis)

Architecture

Phase 1: LangGraph Agent (LLM-Driven)

A LangGraph state machine orchestrates two sequential nodes:

Node 1: `fetch_data_node`

Calls fetch_jazz_data(musician_name) tool
Attempts to fetch from MusicBrainz API
Falls back to mock data for development/demo purposes
Returns raw text with collaboration information
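
The fetch-then-fall-back behaviour described above could look roughly like the sketch below. It is not the project's actual agent/tools.py: the `MOCK_DATA` text, the `search_musicbrainz` helper, the `fetcher` parameter, and the User-Agent string are all assumptions made for illustration, and the live call only performs a MusicBrainz artist search rather than a full relationship lookup.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable

# Hypothetical mock text; the project's real fallback data is not shown in this README.
MOCK_DATA = {
    "Charlie Parker": (
        "Charlie Parker recorded with Dizzy Gillespie (1945) "
        "and Miles Davis (1947)."
    ),
}


def search_musicbrainz(musician_name: str, timeout: float = 10.0) -> str:
    """Search the MusicBrainz artist index and return a one-line text summary."""
    query = urllib.parse.urlencode(
        {"query": f'artist:"{musician_name}"', "fmt": "json", "limit": 1}
    )
    request = urllib.request.Request(
        f"https://musicbrainz.org/ws/2/artist/?{query}",
        # MusicBrainz asks clients to identify themselves; this value is a placeholder.
        headers={"User-Agent": "jazz-graph-agent-example/0.1 (example@example.com)"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    artists = payload.get("artists", [])
    if not artists:
        raise LookupError(f"No MusicBrainz match for {musician_name!r}")
    top = artists[0]
    return f"{top.get('name', musician_name)} (MBID {top.get('id', 'unknown')})"


def fetch_jazz_data(
    musician_name: str,
    fetcher: Callable[[str], str] = search_musicbrainz,
) -> str:
    """Try the live source first, then fall back to mock data on any failure."""
    try:
        return fetcher(musician_name)
    except Exception as exc:  # network error, malformed JSON, or no match
        fallback = MOCK_DATA.get(musician_name)
        if fallback is None:
            raise RuntimeError(
                f"No live or mock data for {musician_name!r}"
            ) from exc
        return fallback
```

Passing the fetcher as a parameter keeps the fallback path testable without network access, for example `fetch_jazz_data("Charlie Parker", fetcher=my_offline_stub)`.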

Node 2: `parse_data_node`

Uses LLM with structured output (Pydantic models)

Parses raw text into a validated JazzNetworkGraph schema:

{
  "nodes": [{"id": "musician_name", "instrument": "...", "role": "..."}],
  "edges": [{"source": "...", "target": "...", "collaboration_type": "...", "weight": 1}]
}

Filters collaborations by era (start/end years)
Returns validated, structured graph data
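
As an illustration of the schema shown above, here is a minimal Pydantic sketch. The class names `MusicianNode` and `CollaborationEdge`, the default values, the `weight >= 1` constraint, and the `drop_dangling_edges` helper are assumptions; only `JazzNetworkGraph` and the field names come from this README. Era filtering is left out because the schema above carries no year field.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class MusicianNode(BaseModel):
    id: str
    instrument: str = "unknown"  # assumed default
    role: str = "unknown"        # assumed default


class CollaborationEdge(BaseModel):
    source: str
    target: str
    collaboration_type: str = "unknown"   # assumed default
    weight: int = Field(default=1, ge=1)  # assumed constraint: at least 1


class JazzNetworkGraph(BaseModel):
    nodes: List[MusicianNode]
    edges: List[CollaborationEdge]


def drop_dangling_edges(graph: JazzNetworkGraph) -> JazzNetworkGraph:
    """Keep only edges whose endpoints both appear in the node list."""
    known = {node.id for node in graph.nodes}
    kept = [e for e in graph.edges if e.source in known and e.target in known]
    return JazzNetworkGraph(nodes=graph.nodes, edges=kept)


raw = {
    "nodes": [
        {"id": "Charlie Parker", "instrument": "alto saxophone", "role": "leader"},
        {"id": "Dizzy Gillespie", "instrument": "trumpet", "role": "sideman"},
    ],
    "edges": [
        {"source": "Charlie Parker", "target": "Dizzy Gillespie",
         "collaboration_type": "recording"},
        # This edge points at a musician missing from "nodes" and gets dropped.
        {"source": "Charlie Parker", "target": "Unlisted Musician"},
    ],
}

graph = drop_dangling_edges(JazzNetworkGraph(**raw))

try:
    CollaborationEdge(source="a", target="b", weight=0)
    weight_rejected = False
except ValidationError:
    weight_rejected = True  # a weight below 1 fails validation
```

With a schema like this, malformed LLM output fails at the boundary between the agent and the pipeline instead of surfacing later during graph construction.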

Key Technology: LangGraph provides clear state management and sequential workflow control, making the agent logic explicit and debuggable.

Phase 2: Pipeline (Deterministic Python)

Once structured data is extracted, pure Python processes it:

pipeline/graph_builder.py

Converts JSON to NetworkX graph
Validates nodes and edges
Adds musician attributes (instrument, role)
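
A minimal sketch of this conversion step, assuming an undirected graph and the JSON shape shown in Phase 1. The `build_graph` name and the choices to skip self-loops, skip edges with unknown endpoints, and sum the weights of repeated edges are illustrative assumptions, not confirmed behaviour of pipeline/graph_builder.py.

```python
import networkx as nx


def build_graph(data: dict) -> nx.Graph:
    """Convert {"nodes": [...], "edges": [...]} into an undirected NetworkX graph."""
    graph = nx.Graph()
    for node in data.get("nodes", []):
        graph.add_node(
            node["id"],
            instrument=node.get("instrument", "unknown"),
            role=node.get("role", "unknown"),
        )
    for edge in data.get("edges", []):
        source, target = edge["source"], edge["target"]
        # Validation: ignore self-loops and edges that reference unknown musicians.
        if source == target or source not in graph or target not in graph:
            continue
        weight = edge.get("weight", 1)
        if graph.has_edge(source, target):
            # Repeated collaboration: accumulate weight instead of overwriting.
            graph[source][target]["weight"] += weight
        else:
            graph.add_edge(
                source,
                target,
                collaboration_type=edge.get("collaboration_type", "unknown"),
                weight=weight,
            )
    return graph


sample = {
    "nodes": [
        {"id": "Charlie Parker", "instrument": "alto saxophone", "role": "leader"},
        {"id": "Miles Davis", "instrument": "trumpet", "role": "sideman"},
    ],
    "edges": [
        {"source": "Charlie Parker", "target": "Miles Davis",
         "collaboration_type": "recording", "weight": 1},
        {"source": "Miles Davis", "target": "Charlie Parker",
         "collaboration_type": "live", "weight": 2},
        {"source": "Charlie Parker", "target": "Unknown Player"},
    ],
}

jazz_graph = build_graph(sample)
```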

pipeline/metrics.py

Computes degree centrality (normalized connection count)
Computes betweenness centrality (bridging power)
Computes clustering coefficient (local connectivity)
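
The three metrics map directly onto NetworkX functions. The sketch below uses a hypothetical four-musician toy graph and a `compute_metrics` helper whose name and return shape are assumptions, not the actual pipeline/metrics.py.

```python
import networkx as nx


def compute_metrics(graph: nx.Graph) -> dict:
    """Return per-node degree centrality, betweenness centrality, and clustering."""
    degree = nx.degree_centrality(graph)            # degree / (n - 1)
    betweenness = nx.betweenness_centrality(graph)  # normalized by default
    clustering = nx.clustering(graph)               # local clustering coefficient
    return {
        node: {
            "degree": degree[node],
            "betweenness": betweenness[node],
            "clustering": clustering[node],
        }
        for node in graph.nodes
    }


# Toy graph: Parker is linked to everyone; Gillespie and Roach are also linked.
toy = nx.Graph()
toy.add_edges_from(
    [
        ("Parker", "Gillespie"),
        ("Parker", "Davis"),
        ("Parker", "Roach"),
        ("Gillespie", "Roach"),
    ]
)
metrics = compute_metrics(toy)
```

In this toy graph Parker has degree centrality 1.0 (connected to all three others) and betweenness 2/3, because two of the three pairs among the other musicians can only reach each other through him. His clustering coefficient is 1/3, since only one of the three possible links among his neighbours exists.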

pipeline/visualize.py

Generates interactive PyVis HTML
Node size = degree centrality
Hover tooltips show metrics
Dark theme for readability

Project Structure

jazz-graph-agent/
├── main.py                    # Entry point, orchestrates full pipeline
├── config.py                  # Configuration (musician, era, paths, LLM settings)
├── .env                       # Environment variables (OPENAI_API_KEY)
│
├── agent/
│   ├── agent.py              # LangGraph state machine and workflow
│   ├── tools.py              # fetch_jazz_data tool (MusicBrainz API + fallback)
│   ├── prompts.py            # System prompt and Pydantic schemas
│   └── model.py              # Multi-provider LLM factory (OpenAI + HuggingFace)
│
├── pipeline/
│   ├── graph_builder.py      # JSON → NetworkX graph
│   ├── metrics.py            # SNA metric computation
│   └── visualize.py          # PyVis HTML generation
│
└── data/output/
    └── jazz_graph.html       # Final interactive visualization

Setup & Installation

Prerequisites

Python 3.10+
API key for your chosen LLM provider:
- OpenAI API key (for GPT models), OR
- HuggingFace API key (for Llama, Mixtral, and other open models)

1. Clone the repository

git clone <your-repo-url>
cd jazz-graph-agent

2. Create virtual environment

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

4. Configure environment variables

Create a .env file in the project root. See docs/ENV_TEMPLATE.md for full details.

For OpenAI (default):

LLM_PROVIDER=openai
LLM_MODEL=gpt-4o-mini
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxx

For HuggingFace:

LLM_PROVIDER=huggingface
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
HUGGINGFACE_API_KEY=hf_xxxxxxxxxxxxx

Get your API keys:

OpenAI: https://platform.openai.com/api-keys
HuggingFace: https://huggingface.co/settings/tokens
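
To illustrate how the variables above fit together, here is a small sketch that checks the provider setting against the matching API key before any LLM call is made. The `resolve_llm_settings` helper and the HuggingFace default model are assumptions; only the variable names and the OpenAI default come from this README.

```python
import os
from typing import Mapping, Optional, Tuple

# Which API key variable each provider needs (names taken from the .env examples).
API_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "huggingface": "HUGGINGFACE_API_KEY",
}

# Default model per provider; the HuggingFace entry is an assumed default.
DEFAULT_MODELS = {
    "openai": "gpt-4o-mini",
    "huggingface": "meta-llama/Llama-3.1-70B-Instruct",
}


def resolve_llm_settings(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str, str]:
    """Return (provider, model, api_key) or raise ValueError with a clear message."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "openai").strip().lower()
    if provider not in API_KEY_VARS:
        raise ValueError(
            f"Unsupported LLM_PROVIDER {provider!r}; expected one of {sorted(API_KEY_VARS)}"
        )
    key_var = API_KEY_VARS[provider]
    api_key = env.get(key_var, "").strip()
    if not api_key:
        raise ValueError(f"{key_var} must be set when LLM_PROVIDER={provider}")
    model = env.get("LLM_MODEL") or DEFAULT_MODELS[provider]
    return provider, model, api_key
```

Failing early with a message that names the missing variable tends to be easier to debug than an authentication error raised deep inside the agent.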

5. Run the pipeline

python main.py

Output:

Console logs showing progress
data/output/jazz_graph.html — open in browser to explore the network

Configuration

Edit config.py or use environment variables to customize:

from dataclasses import dataclass


@dataclass(frozen=True)
class JazzGraphConfig:
    # Choose your musician and era
    seed_musician: str = "Charlie Parker"
    era_start_year: int = 1940
    era_end_year: int = 1960

    # LLM settings
    llm_provider: str = "openai"  # or "huggingface"
    llm_model: str = "gpt-4o-mini"
    max_tokens: int = 3000
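
One way the environment-variable route could work is sketched below. `LLM_PROVIDER` and `LLM_MODEL` appear in the .env examples; `SEED_MUSICIAN`, `ERA_START_YEAR`, and `ERA_END_YEAR` are hypothetical names, and `config_from_env` is not necessarily how config.py does it.

```python
import os
from dataclasses import dataclass, replace
from typing import Mapping, Optional


@dataclass(frozen=True)
class JazzGraphConfig:
    seed_musician: str = "Charlie Parker"
    era_start_year: int = 1940
    era_end_year: int = 1960
    llm_provider: str = "openai"
    llm_model: str = "gpt-4o-mini"
    max_tokens: int = 3000


def config_from_env(env: Optional[Mapping[str, str]] = None) -> JazzGraphConfig:
    """Overlay environment variables on the dataclass defaults."""
    env = os.environ if env is None else env
    base = JazzGraphConfig()
    # The dataclass is frozen, so build a modified copy instead of mutating it.
    config = replace(
        base,
        seed_musician=env.get("SEED_MUSICIAN", base.seed_musician),      # hypothetical name
        era_start_year=int(env.get("ERA_START_YEAR", base.era_start_year)),  # hypothetical name
        era_end_year=int(env.get("ERA_END_YEAR", base.era_end_year)),        # hypothetical name
        llm_provider=env.get("LLM_PROVIDER", base.llm_provider),
        llm_model=env.get("LLM_MODEL", base.llm_model),
    )
    if config.era_start_year > config.era_end_year:
        raise ValueError("era_start_year must not be later than era_end_year")
    return config


hard_bop = config_from_env(
    {"SEED_MUSICIAN": "Art Blakey", "ERA_START_YEAR": "1955", "ERA_END_YEAR": "1965"}
)
```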

Supported LLM Providers

| Provider | Models | Pros | Cons |
|---|---|---|---|
| OpenAI | gpt-4o-mini, gpt-4o, gpt-4-turbo | Excellent reliability, fast, great structured output | Requires paid API key |
| HuggingFace | Llama-3.1-70B, Mixtral-8x7B, etc. | Open models, flexible, good quality | May be slower, requires experimentation |

Recommended Models

OpenAI:

gpt-4o-mini - Fast, cost-effective (default)
gpt-4o - Best quality for complex networks
gpt-4-turbo - Good balance

HuggingFace:

meta-llama/Llama-3.1-70B-Instruct - Best quality
mistralai/Mixtral-8x7B-Instruct-v0.1 - Good balance
meta-llama/Llama-3.2-11B-Vision-Instruct - Lighter weight

How It Works (Step-by-Step)

1. Agent Initialization
   - LangGraph builds a state machine with fetch_data → parse_data flow
   - Initial state includes musician name and era bounds
2. Data Fetching
   - Queries MusicBrainz API for artist relationships
   - Formats results as plain text
   - Falls back to mock data if API fails (for development)
3. LLM Parsing
   - Sends raw text + system prompt to LLM
   - Uses structured output with Pydantic validation
   - LLM extracts nodes (musicians) and edges (collaborations)
   - Filters by era, validates schema
4. Graph Construction
   - Converts validated JSON to NetworkX graph
   - Adds node attributes (instrument, role)
   - Adds edge attributes (collaboration type, weight)
5. Metrics Computation
   - Calculates centrality measures
   - Identifies key connectors in the network
   - Analyzes clustering patterns
6. Visualization
   - Generates interactive HTML with PyVis
   - Node size reflects importance (degree centrality)
   - Hover to see detailed metrics
   - Physics simulation for natural layout

Data Source

Primary: MusicBrainz API — Open music encyclopedia with rich relationship data

Fallback: Mock data for development/demonstration

The project is designed to easily swap data sources by modifying agent/tools.py.

Key Learning Outcomes

This project demonstrates:

✅ LangGraph state machines for agent workflow control
✅ Structured LLM output with Pydantic validation (using Instructor)
✅ Multi-provider LLM support (OpenAI + HuggingFace)
✅ Tool integration (API calls within agent context)
✅ LLM + deterministic code separation (hybrid architecture)
✅ Error handling throughout the pipeline
✅ NetworkX for graph manipulation
✅ PyVis for interactive visualization

Example Output

For Charlie Parker (1940-1960):

~8-12 nodes (key bebop musicians)
~15-20 edges (collaboration relationships)
Centrality highlights: Parker, Dizzy Gillespie, Miles Davis, Max Roach

Open data/output/jazz_graph.html in your browser to explore!

Limitations & Future Work

Current Limitations:

MusicBrainz API may have incomplete data for historical jazz musicians
Mock fallback data is used for reliability during development
Network size limited by API rate limits and LLM context window

Future Enhancements:

Add web scraping for JazzDisco.org (more complete historical data)
Implement caching to avoid redundant API calls
Add CLI arguments for custom musicians/eras
Export metrics to JSON for further analysis
Support multiple seed musicians
Add time-series analysis (collaboration evolution)
Add support for local LLM models (via Ollama)
Expand to more LLM providers (Anthropic Claude, Google Gemini)

Contributing

Contributions are welcome! Whether you're fixing bugs, adding features, improving documentation, or sharing jazz knowledge — we'd love your help.

Please read our CONTRIBUTING.md for:

Development setup
Code style guidelines
Pull request process
Issue labels and workflow

License

MIT License — feel free to use for learning and experimentation.
