An end-to-end LangGraph agent that fetches jazz musician data, extracts collaboration relationships using structured LLM output, and generates an interactive social network visualization with SNA metrics.
Focus: Bebop era (1940-1960), but the architecture supports any jazz domain.
jazz-graph-agent demonstrates a complete AI agent pipeline:
- Fetches jazz musician data from MusicBrainz API (with mock fallback)
- Extracts collaboration networks using LLM with structured output (Pydantic schemas)
- Builds a NetworkX graph from the extracted relationships
- Computes Social Network Analysis metrics (degree centrality, betweenness, clustering)
- Visualizes the network as an interactive HTML graph (PyVis)
A LangGraph state machine orchestrates two sequential nodes:
fetch_data node:
- Calls the fetch_jazz_data(musician_name) tool
- Attempts to fetch from MusicBrainz API
- Falls back to mock data for development/demo purposes
- Returns raw text with collaboration information
parse_data node:
- Uses LLM with structured output (Pydantic models)
- Parses raw text into a validated JazzNetworkGraph schema:
  {
    "nodes": [{"id": "musician_name", "instrument": "...", "role": "..."}],
    "edges": [{"source": "...", "target": "...", "collaboration_type": "...", "weight": 1}]
  }
- Filters collaborations by era (start/end years)
- Returns validated, structured graph data
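To make the validation step concrete, here is a minimal sketch of parsing the JSON shape shown above into typed objects. It uses standard-library dataclasses as a stand-in for the project's Pydantic models, so the class names `Musician` and `Collaboration`, the `parse_graph` helper, and the weight rule are assumptions for illustration, not the code in agent/prompts.py. The sample values are illustrative only.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Musician:  # hypothetical name for one "nodes" entry
    id: str
    instrument: str
    role: str


@dataclass(frozen=True)
class Collaboration:  # hypothetical name for one "edges" entry
    source: str
    target: str
    collaboration_type: str
    weight: int = 1


@dataclass(frozen=True)
class JazzNetworkGraph:
    nodes: tuple
    edges: tuple


def parse_graph(raw_json: str) -> JazzNetworkGraph:
    """Parse LLM output into the schema, raising ValueError on bad data."""
    data = json.loads(raw_json)  # malformed JSON already raises a ValueError subclass
    try:
        nodes = tuple(Musician(**n) for n in data["nodes"])
        edges = tuple(Collaboration(**e) for e in data["edges"])
    except (KeyError, TypeError) as exc:
        # Missing or unexpected fields are reported as one error type.
        raise ValueError(f"schema mismatch: {exc}") from exc
    for edge in edges:
        # Assumed rule: weights are positive integers.
        if not isinstance(edge.weight, int) or edge.weight < 1:
            raise ValueError(f"invalid weight on {edge.source} -> {edge.target}")
    return JazzNetworkGraph(nodes=nodes, edges=edges)


sample = json.dumps({
    "nodes": [
        {"id": "Charlie Parker", "instrument": "alto saxophone", "role": "leader"},
        {"id": "Dizzy Gillespie", "instrument": "trumpet", "role": "sideman"},
    ],
    "edges": [
        {"source": "Charlie Parker", "target": "Dizzy Gillespie",
         "collaboration_type": "recording", "weight": 2},
    ],
})
graph = parse_graph(sample)
print(len(graph.nodes), len(graph.edges))
```

With Pydantic, the same checks would normally come from field types and validators rather than hand-written code.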
Key Technology: LangGraph provides clear state management and sequential workflow control, making the agent logic explicit and debuggable.
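The sequential flow can be illustrated without LangGraph itself. The sketch below is a plain-Python stand-in, not the code in agent/agent.py: each node receives the current state and returns a partial update, which is merged before the next node runs. The state keys, the placeholder node bodies, and the `run_sequential` helper are all assumptions.

```python
from typing import Callable, TypedDict


class AgentState(TypedDict, total=False):
    musician_name: str
    era_start_year: int
    era_end_year: int
    raw_text: str      # filled in by fetch_data
    graph_json: dict   # filled in by parse_data


def fetch_data(state: AgentState) -> AgentState:
    # Placeholder for the fetch_jazz_data tool call.
    return {"raw_text": f"{state['musician_name']}: raw collaboration text"}


def parse_data(state: AgentState) -> AgentState:
    # Placeholder for the structured-output LLM call.
    return {"graph_json": {"nodes": [{"id": state["musician_name"]}], "edges": []}}


def run_sequential(nodes: list[Callable[[AgentState], AgentState]],
                   state: AgentState) -> AgentState:
    """Run nodes in order, merging each node's partial update into the state."""
    for node in nodes:
        state = {**state, **node(state)}
    return state


final = run_sequential(
    [fetch_data, parse_data],
    {"musician_name": "Charlie Parker", "era_start_year": 1940, "era_end_year": 1960},
)
print(sorted(final))
```

Because every intermediate value lives in one state dictionary, a failure can be traced to the node that produced the bad field.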
Once structured data is extracted, pure Python processes it:
pipeline/graph_builder.py
- Converts JSON to NetworkX graph
- Validates nodes and edges
- Adds musician attributes (instrument, role)
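As a rough picture of what the builder does, the sketch below turns the validated JSON into a node-attribute table and an undirected adjacency map using plain dictionaries. The project itself uses NetworkX; the `build_graph` function and its rule for skipping invalid edges are assumptions made for illustration.

```python
def build_graph(data: dict):
    """Return (node_attrs, adjacency) from a {"nodes": [...], "edges": [...]} dict.

    Edges that reference an unknown musician or form a self-loop are skipped.
    """
    node_attrs = {
        n["id"]: {"instrument": n.get("instrument", ""), "role": n.get("role", "")}
        for n in data["nodes"]
    }
    # Every known musician gets an entry, even with no collaborations.
    adjacency = {node_id: {} for node_id in node_attrs}
    for e in data["edges"]:
        src, dst = e["source"], e["target"]
        if src == dst or src not in node_attrs or dst not in node_attrs:
            continue  # invalid edge: drop it rather than create a dangling node
        attrs = {
            "collaboration_type": e.get("collaboration_type", ""),
            "weight": e.get("weight", 1),
        }
        adjacency[src][dst] = attrs  # undirected: store both directions
        adjacency[dst][src] = attrs
    return node_attrs, adjacency


data = {
    "nodes": [
        {"id": "Charlie Parker", "instrument": "alto saxophone", "role": "leader"},
        {"id": "Dizzy Gillespie", "instrument": "trumpet", "role": "sideman"},
    ],
    "edges": [
        {"source": "Charlie Parker", "target": "Dizzy Gillespie",
         "collaboration_type": "recording", "weight": 2},
        {"source": "Charlie Parker", "target": "Unknown Player",
         "collaboration_type": "recording", "weight": 1},
    ],
}
node_attrs, adjacency = build_graph(data)
print(adjacency["Charlie Parker"])
```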
pipeline/metrics.py
- Computes degree centrality (connection count)
- Computes betweenness centrality (bridging power)
- Computes clustering coefficient (local connectivity)
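To show what two of these metrics measure, here is a small hand-rolled sketch on a toy undirected graph. It follows the common definitions (degree divided by n - 1; connected neighbour pairs divided by all neighbour pairs) and is not the project's metrics.py, which relies on NetworkX. Betweenness centrality is left out because it needs a shortest-path pass. The node labels are placeholders, not real data.

```python
from itertools import combinations


def degree_centrality(adj):
    """Fraction of the other nodes each node is connected to."""
    n = len(adj)
    if n <= 1:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def clustering_coefficient(adj):
    """Share of each node's neighbour pairs that are themselves connected."""
    result = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            result[node] = 0.0  # fewer than two neighbours: no pairs to check
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        result[node] = 2 * links / (k * (k - 1))
    return result


# Toy graph: A is linked to everyone; B and D are also linked to each other.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"A", "B"},
}
print(degree_centrality(adj))
print(clustering_coefficient(adj))
```

In this toy graph, A has the highest degree centrality but a low clustering coefficient, since most of its neighbours are not linked to each other.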
pipeline/visualize.py
- Generates interactive PyVis HTML
- Node size = degree centrality
- Hover tooltips show metrics
- Dark theme for readability
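The size and tooltip rules can be sketched as two small helpers. The function names, the pixel range, and the tooltip layout are assumptions for illustration; the only PyVis detail relied on is that a node's `title` attribute is shown on hover.

```python
def node_size(centrality: float, min_px: float = 10.0, max_px: float = 50.0) -> float:
    """Linearly map a centrality in [0, 1] to a node size in pixels."""
    clamped = max(0.0, min(1.0, centrality))  # guard against out-of-range input
    return min_px + clamped * (max_px - min_px)


def tooltip(name: str, metrics: dict) -> str:
    """Build hover text: the musician's name, then one metric per line."""
    lines = [name] + [f"{key}: {value:.3f}" for key, value in metrics.items()]
    return "\n".join(lines)


print(node_size(0.5))
print(tooltip("Charlie Parker", {"degree": 1.0, "clustering": 0.25}))
```

The resulting values would then be passed as the `size` and `title` arguments when adding each node to the PyVis network.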
jazz-graph-agent/
├── main.py                # Entry point, orchestrates full pipeline
├── config.py              # Configuration (musician, era, paths, LLM settings)
├── .env                   # Environment variables (OPENAI_API_KEY)
│
├── agent/
│   ├── agent.py           # LangGraph state machine and workflow
│   ├── tools.py           # fetch_jazz_data tool (MusicBrainz API + fallback)
│   ├── prompts.py         # System prompt and Pydantic schemas
│   └── model.py           # Multi-provider LLM factory (OpenAI + HuggingFace)
│
├── pipeline/
│   ├── graph_builder.py   # JSON → NetworkX graph
│   ├── metrics.py         # SNA metric computation
│   └── visualize.py       # PyVis HTML generation
│
└── data/output/
    └── jazz_graph.html    # Final interactive visualization
- Python 3.10+
- API key for your chosen LLM provider:
- OpenAI API key (for GPT models), OR
- HuggingFace API key (for Llama, Mixtral, and other open models)
git clone <your-repo-url>
cd jazz-graph-agent
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
Create a .env file in the project root. See docs/ENV_TEMPLATE.md for full details.
For OpenAI (default):
LLM_PROVIDER=openai
LLM_MODEL=gpt-4o-mini
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxx
For HuggingFace:
LLM_PROVIDER=huggingface
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
HUGGINGFACE_API_KEY=hf_xxxxxxxxxxxxx
Get your API keys:
- OpenAI: https://platform.openai.com/api-keys
- HuggingFace: https://huggingface.co/settings/tokens
python main.py
Output:
- Console logs showing progress
- data/output/jazz_graph.html: open in a browser to explore the network
Edit config.py or use environment variables to customize:
from dataclasses import dataclass

@dataclass(frozen=True)
class JazzGraphConfig:
    # Choose your musician and era
    seed_musician: str = "Charlie Parker"
    era_start_year: int = 1940
    era_end_year: int = 1960
    # LLM settings
    llm_provider: str = "openai"  # or "huggingface"
    llm_model: str = "gpt-4o-mini"
    max_tokens: int = 3000

| Provider | Models | Pros | Cons |
|---|---|---|---|
| OpenAI | gpt-4o-mini, gpt-4o, gpt-4-turbo | Excellent reliability, fast, great structured output | Requires paid API key |
| HuggingFace | Llama-3.1-70B, Mixtral-8x7B, etc. | Open models, flexible, good quality | May be slower, requires experimentation |
OpenAI:
- gpt-4o-mini: fast, cost-effective (default)
- gpt-4o: best quality for complex networks
- gpt-4-turbo: good balance
HuggingFace:
- meta-llama/Llama-3.1-70B-Instruct: best quality
- mistralai/Mixtral-8x7B-Instruct-v0.1: good balance
- meta-llama/Llama-3.2-11B-Vision-Instruct: lighter weight
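Environment-variable overrides for the config fields could be wired up roughly as follows. This is a sketch under assumptions: `LLM_PROVIDER` and `LLM_MODEL` come from the .env example, while `SEED_MUSICIAN`, `ERA_START_YEAR`, `ERA_END_YEAR`, the `load_config` helper, and the two sanity checks are hypothetical and may not match config.py.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class JazzGraphConfig:
    seed_musician: str = "Charlie Parker"
    era_start_year: int = 1940
    era_end_year: int = 1960
    llm_provider: str = "openai"
    llm_model: str = "gpt-4o-mini"
    max_tokens: int = 3000


def load_config(env=None) -> JazzGraphConfig:
    """Build a config, letting environment variables override the defaults."""
    env = os.environ if env is None else env
    defaults = JazzGraphConfig()
    config = JazzGraphConfig(
        # The next three variable names are hypothetical.
        seed_musician=env.get("SEED_MUSICIAN", defaults.seed_musician),
        era_start_year=int(env.get("ERA_START_YEAR", defaults.era_start_year)),
        era_end_year=int(env.get("ERA_END_YEAR", defaults.era_end_year)),
        # These two names appear in the .env example.
        llm_provider=env.get("LLM_PROVIDER", defaults.llm_provider),
        llm_model=env.get("LLM_MODEL", defaults.llm_model),
    )
    if config.era_start_year > config.era_end_year:
        raise ValueError("era_start_year must not exceed era_end_year")
    if config.llm_provider not in {"openai", "huggingface"}:
        raise ValueError(f"unsupported provider: {config.llm_provider}")
    return config


config = load_config({"LLM_PROVIDER": "huggingface", "ERA_START_YEAR": "1945"})
print(config)
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test.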
1. Agent Initialization
   - LangGraph builds a state machine with a fetch_data → parse_data flow
   - Initial state includes musician name and era bounds
2. Data Fetching
   - Queries MusicBrainz API for artist relationships
   - Formats results as plain text
   - Falls back to mock data if API fails (for development)
3. LLM Parsing
   - Sends raw text + system prompt to LLM
   - Uses structured output with Pydantic validation
   - LLM extracts nodes (musicians) and edges (collaborations)
   - Filters by era, validates schema
4. Graph Construction
   - Converts validated JSON to NetworkX graph
   - Adds node attributes (instrument, role)
   - Adds edge attributes (collaboration type, weight)
5. Metrics Computation
   - Calculates centrality measures
   - Identifies key connectors in the network
   - Analyzes clustering patterns
6. Visualization
   - Generates interactive HTML with PyVis
   - Node size reflects importance (degree centrality)
   - Hover to see detailed metrics
   - Physics simulation for natural layout
Primary: MusicBrainz API — Open music encyclopedia with rich relationship data
Fallback: Mock data for development/demonstration
The project is designed to easily swap data sources by modifying agent/tools.py.
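One way to keep the data source swappable is to pass the fetcher in as a function and wrap it with the fallback logic. The sketch below assumes this design; `fetch_with_fallback`, `MOCK_DATA`, and `failing_source` are hypothetical names, and the real agent/tools.py may be structured differently.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)

# Hypothetical mock text used when the primary source is unavailable.
MOCK_DATA = {
    "Charlie Parker": "Charlie Parker (alto saxophone) recorded with Dizzy Gillespie (trumpet).",
}


def fetch_with_fallback(musician_name: str, fetcher: Callable[[str], str]) -> str:
    """Try the primary source; return mock text if it raises or returns nothing."""
    try:
        text = fetcher(musician_name)
        if text:
            return text
        logger.warning("empty response for %s, using mock data", musician_name)
    except Exception as exc:  # broad on purpose: any source failure triggers the fallback
        logger.warning("fetch failed for %s (%s), using mock data", musician_name, exc)
    return MOCK_DATA.get(musician_name, f"No data available for {musician_name}.")


def failing_source(name: str) -> str:
    # Simulates an unreachable API.
    raise ConnectionError("API unreachable")


print(fetch_with_fallback("Charlie Parker", failing_source))
print(fetch_with_fallback("Charlie Parker", lambda name: f"{name}: live data"))
```

Swapping MusicBrainz for another source then only means supplying a different `fetcher` callable.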
This project demonstrates:
✅ LangGraph state machines for agent workflow control
✅ Structured LLM output with Pydantic validation (using Instructor)
✅ Multi-provider LLM support (OpenAI + HuggingFace)
✅ Tool integration (API calls within agent context)
✅ LLM + deterministic code separation (hybrid architecture)
✅ Error handling throughout the pipeline
✅ NetworkX for graph manipulation
✅ PyVis for interactive visualization
For Charlie Parker (1940-1960):
- ~8-12 nodes (key bebop musicians)
- ~15-20 edges (collaboration relationships)
- Centrality highlights: Parker, Dizzy Gillespie, Miles Davis, Max Roach
Open data/output/jazz_graph.html in your browser to explore!
Current Limitations:
- MusicBrainz API may have incomplete data for historical jazz musicians
- Mock fallback data is used for reliability during development
- Network size limited by API rate limits and LLM context window
Future Enhancements:
- Add web scraping for JazzDisco.org (more complete historical data)
- Implement caching to avoid redundant API calls
- Add CLI arguments for custom musicians/eras
- Export metrics to JSON for further analysis
- Support multiple seed musicians
- Add time-series analysis (collaboration evolution)
- Add support for local LLM models (via Ollama)
- Expand to more LLM providers (Anthropic Claude, Google Gemini)
Contributions are welcome! Whether you're fixing bugs, adding features, improving documentation, or sharing jazz knowledge — we'd love your help.
Please read our CONTRIBUTING.md for:
- Development setup
- Code style guidelines
- Pull request process
- Issue labels and workflow
MIT License — feel free to use for learning and experimentation.