LLM-powered agent that maps jazz musicians into a social-network graph, with collaborations, SNA metrics, and interactive visualizations.

yamtimor/jazz-graph-agent

jazz-graph-agent

An end-to-end LangGraph agent that fetches jazz musician data, extracts collaboration relationships using structured LLM output, and generates an interactive social network visualization with SNA metrics.

Focus: Bebop era (1940-1960), but the architecture supports any jazz domain.


What This Project Does

jazz-graph-agent demonstrates a complete AI agent pipeline:

  1. Fetches jazz musician data from the MusicBrainz API (with mock fallback)
  2. Extracts collaboration networks using an LLM with structured output (Pydantic schemas)
  3. Builds a NetworkX graph from the extracted relationships
  4. Computes Social Network Analysis metrics (degree centrality, betweenness, clustering)
  5. Visualizes the network as an interactive HTML graph (PyVis)

Architecture

Phase 1: LangGraph Agent (LLM-Driven)

A LangGraph state machine orchestrates two sequential nodes:

Node 1: fetch_data_node

  • Calls fetch_jazz_data(musician_name) tool
  • Attempts to fetch from MusicBrainz API
  • Falls back to mock data for development/demo purposes
  • Returns raw text with collaboration information
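The fetch-then-fall-back behavior described above can be sketched as follows. This is a minimal illustration, not the project's actual agent/tools.py: the `MOCK_DATA` text, the `query_musicbrainz` helper, the `fetcher` parameter, and the User-Agent string are all assumptions made for the example. Only the `fetch_jazz_data(musician_name)` name comes from this README.

```python
import urllib.parse
import urllib.request
from typing import Callable

# Hypothetical mock text; the project's real fallback data is not shown in this README.
MOCK_DATA = {
    "Charlie Parker": (
        "Charlie Parker (alto saxophone) recorded with "
        "Dizzy Gillespie (trumpet) in 1945."
    ),
}


def query_musicbrainz(musician_name: str) -> str:
    """Search the MusicBrainz artist endpoint and return the raw JSON text."""
    params = urllib.parse.urlencode(
        {"query": f'artist:"{musician_name}"', "fmt": "json", "limit": 1}
    )
    request = urllib.request.Request(
        f"https://musicbrainz.org/ws/2/artist/?{params}",
        # MusicBrainz asks clients to identify themselves; this value is a placeholder.
        headers={"User-Agent": "jazz-graph-agent-example/0.1 (example@example.com)"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def fetch_jazz_data(
    musician_name: str,
    fetcher: Callable[[str], str] = query_musicbrainz,
) -> str:
    """Return raw text from the API, or mock text if the request fails."""
    try:
        return fetcher(musician_name)
    except OSError:
        # urllib network errors and timeouts are OSError subclasses.
        return MOCK_DATA.get(musician_name, f"No data available for {musician_name}.")


def offline_fetcher(musician_name: str) -> str:
    """Stand-in fetcher that simulates a network failure."""
    raise OSError("simulated network failure")


text = fetch_jazz_data("Charlie Parker", fetcher=offline_fetcher)
print(text)
```

Passing the fetcher as a parameter keeps the fallback path easy to exercise without network access.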

Node 2: parse_data_node

  • Uses an LLM with structured output (Pydantic models)
  • Parses raw text into a validated JazzNetworkGraph schema:
    {
      "nodes": [{"id": "musician_name", "instrument": "...", "role": "..."}],
      "edges": [{"source": "...", "target": "...", "collaboration_type": "...", "weight": 1}]
    }
  • Filters collaborations by era (start/end years)
  • Returns validated, structured graph data
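The schema above could be expressed with Pydantic models roughly like this. Only the `JazzNetworkGraph` name and the field names come from this README; the `MusicianNode` and `CollaborationEdge` class names and the sample payload are assumptions for illustration, and the real definitions in agent/prompts.py may differ.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class MusicianNode(BaseModel):  # hypothetical class name
    id: str
    instrument: str
    role: str


class CollaborationEdge(BaseModel):  # hypothetical class name
    source: str
    target: str
    collaboration_type: str
    weight: int = 1


class JazzNetworkGraph(BaseModel):
    nodes: List[MusicianNode]
    edges: List[CollaborationEdge]


# Illustrative payload shaped like the JSON shown above.
payload = {
    "nodes": [
        {"id": "Charlie Parker", "instrument": "alto saxophone", "role": "leader"},
        {"id": "Dizzy Gillespie", "instrument": "trumpet", "role": "sideman"},
    ],
    "edges": [
        {
            "source": "Charlie Parker",
            "target": "Dizzy Gillespie",
            "collaboration_type": "recording",
        }
    ],
}
graph = JazzNetworkGraph(**payload)

# A node missing required fields is rejected instead of passing through silently.
try:
    JazzNetworkGraph(**{"nodes": [{"id": "Max Roach"}], "edges": []})
    rejected = False
except ValidationError:
    rejected = True
```

Validating at this boundary means the deterministic pipeline downstream can assume well-formed nodes and edges.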

Key Technology: LangGraph provides clear state management and sequential workflow control, making the agent logic explicit and debuggable.


Phase 2: Pipeline (Deterministic Python)

Once structured data is extracted, pure Python processes it:

pipeline/graph_builder.py

  • Converts JSON to NetworkX graph
  • Validates nodes and edges
  • Adds musician attributes (instrument, role)
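As a rough sketch of the conversion step, the function below turns the validated JSON structure into a NetworkX graph. The `build_graph` name, the choice of an undirected graph, and the rule of skipping edges with unknown endpoints are assumptions for this example, not a description of pipeline/graph_builder.py.

```python
import networkx as nx


def build_graph(graph_data: dict) -> nx.Graph:
    """Build an undirected collaboration graph from nodes/edges dictionaries."""
    graph = nx.Graph()
    for node in graph_data["nodes"]:
        graph.add_node(
            node["id"],
            instrument=node.get("instrument"),
            role=node.get("role"),
        )
    for edge in graph_data["edges"]:
        # Validation (assumed rule): skip edges that point at undeclared musicians.
        if edge["source"] not in graph or edge["target"] not in graph:
            continue
        graph.add_edge(
            edge["source"],
            edge["target"],
            collaboration_type=edge.get("collaboration_type"),
            weight=edge.get("weight", 1),
        )
    return graph


# Illustrative data only.
sample = {
    "nodes": [
        {"id": "Charlie Parker", "instrument": "alto saxophone", "role": "leader"},
        {"id": "Dizzy Gillespie", "instrument": "trumpet", "role": "sideman"},
        {"id": "Max Roach", "instrument": "drums", "role": "sideman"},
    ],
    "edges": [
        {"source": "Charlie Parker", "target": "Dizzy Gillespie",
         "collaboration_type": "recording", "weight": 2},
        {"source": "Charlie Parker", "target": "Max Roach",
         "collaboration_type": "recording"},
        {"source": "Charlie Parker", "target": "Unknown Player",
         "collaboration_type": "recording"},
    ],
}
jazz_graph = build_graph(sample)
```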

pipeline/metrics.py

  • Computes degree centrality (share of the other musicians each one is directly connected to)
  • Computes betweenness centrality (bridging power)
  • Computes clustering coefficient (local connectivity)
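The three metrics listed above map directly onto NetworkX functions. The sketch below assumes an undirected, unweighted treatment and a hypothetical `compute_metrics` helper; the small example graph is made up for illustration.

```python
import networkx as nx


def compute_metrics(graph: nx.Graph) -> dict:
    """Return per-musician degree centrality, betweenness, and clustering."""
    degree = nx.degree_centrality(graph)
    betweenness = nx.betweenness_centrality(graph)
    clustering = nx.clustering(graph)
    return {
        node: {
            "degree_centrality": degree[node],
            "betweenness": betweenness[node],
            "clustering": clustering[node],
        }
        for node in graph.nodes
    }


# Illustrative four-musician network.
example_graph = nx.Graph()
example_graph.add_edges_from(
    [
        ("Charlie Parker", "Dizzy Gillespie"),
        ("Charlie Parker", "Miles Davis"),
        ("Charlie Parker", "Max Roach"),
        ("Dizzy Gillespie", "Max Roach"),
    ]
)
metrics = compute_metrics(example_graph)

# Parker is connected to all three other musicians, so his degree centrality is 1.0.
print(metrics["Charlie Parker"])
```

Degree centrality is normalized by the number of other nodes, so values are comparable across networks of different sizes.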

pipeline/visualize.py

  • Generates interactive PyVis HTML
  • Node size = degree centrality
  • Hover tooltips show metrics
  • Dark theme for readability
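To illustrate how node size and hover text could be derived from the metrics, here is a small standard-library sketch. The `node_display_attributes` helper, the size range, and the tooltip wording are assumptions; the resulting `label`, `size`, and `title` keys are intended to be handed to PyVis as node options, but the PyVis calls themselves are left out so the example runs without extra dependencies.

```python
def node_display_attributes(
    metrics: dict,
    min_size: float = 10.0,
    max_size: float = 40.0,
) -> dict:
    """Map per-node metrics to display attributes (label, size, hover title)."""
    attributes = {}
    for name, values in metrics.items():
        # Degree centrality lies in [0, 1] for simple graphs, so scale it linearly.
        size = min_size + (max_size - min_size) * values["degree_centrality"]
        tooltip = "\n".join(
            [
                name,
                f"Degree centrality: {values['degree_centrality']:.2f}",
                f"Betweenness: {values['betweenness']:.2f}",
                f"Clustering: {values['clustering']:.2f}",
            ]
        )
        attributes[name] = {"label": name, "size": size, "title": tooltip}
    return attributes


# Illustrative metric values.
sample_metrics = {
    "Charlie Parker": {"degree_centrality": 1.0, "betweenness": 0.67, "clustering": 0.33},
    "Miles Davis": {"degree_centrality": 0.0, "betweenness": 0.0, "clustering": 0.0},
}
display = node_display_attributes(sample_metrics)
```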

Project Structure

jazz-graph-agent/
├── main.py                    # Entry point, orchestrates full pipeline
├── config.py                  # Configuration (musician, era, paths, LLM settings)
├── .env                       # Environment variables (OPENAI_API_KEY)
│
├── agent/
│   ├── agent.py              # LangGraph state machine and workflow
│   ├── tools.py              # fetch_jazz_data tool (MusicBrainz API + fallback)
│   ├── prompts.py            # System prompt and Pydantic schemas
│   └── model.py              # Multi-provider LLM factory (OpenAI + HuggingFace)
│
├── pipeline/
│   ├── graph_builder.py      # JSON → NetworkX graph
│   ├── metrics.py            # SNA metric computation
│   └── visualize.py          # PyVis HTML generation
│
└── data/output/
    └── jazz_graph.html       # Final interactive visualization

Setup & Installation

Prerequisites

  • Python 3.10+
  • API key for your chosen LLM provider:
    • OpenAI API key (for GPT models), OR
    • HuggingFace API key (for Llama, Mixtral, and other open models)

1. Clone the repository

git clone <your-repo-url>
cd jazz-graph-agent

2. Create virtual environment

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

4. Configure environment variables

Create a .env file in the project root. See docs/ENV_TEMPLATE.md for full details.

For OpenAI (default):

LLM_PROVIDER=openai
LLM_MODEL=gpt-4o-mini
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxx

For HuggingFace:

LLM_PROVIDER=huggingface
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
HUGGINGFACE_API_KEY=hf_xxxxxxxxxxxxx

Get your API keys from your OpenAI or HuggingFace account settings.

5. Run the pipeline

python main.py

Output:

  • Console logs showing progress
  • data/output/jazz_graph.html — open in browser to explore the network

Configuration

Edit config.py or use environment variables to customize:

from dataclasses import dataclass


@dataclass(frozen=True)
class JazzGraphConfig:
    # Choose your musician and era
    seed_musician: str = "Charlie Parker"
    era_start_year: int = 1940
    era_end_year: int = 1960
    
    # LLM settings
    llm_provider: str = "openai"  # or "huggingface"
    llm_model: str = "gpt-4o-mini"
    max_tokens: int = 3000
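One way environment variables could override these defaults is sketched below. `LLM_PROVIDER` and `LLM_MODEL` appear in the .env examples above; the `SEED_MUSICIAN`, `ERA_START_YEAR`, and `ERA_END_YEAR` variable names and the `config_from_env` helper are hypothetical, so check config.py for the names the project actually reads.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class JazzGraphConfig:
    seed_musician: str = "Charlie Parker"
    era_start_year: int = 1940
    era_end_year: int = 1960
    llm_provider: str = "openai"  # or "huggingface"
    llm_model: str = "gpt-4o-mini"
    max_tokens: int = 3000


def config_from_env(env: Optional[Mapping[str, str]] = None) -> JazzGraphConfig:
    """Build a config, letting environment variables override the defaults."""
    env = os.environ if env is None else env
    defaults = JazzGraphConfig()
    config = JazzGraphConfig(
        # SEED_MUSICIAN and ERA_* are hypothetical variable names.
        seed_musician=env.get("SEED_MUSICIAN", defaults.seed_musician),
        era_start_year=int(env.get("ERA_START_YEAR", defaults.era_start_year)),
        era_end_year=int(env.get("ERA_END_YEAR", defaults.era_end_year)),
        llm_provider=env.get("LLM_PROVIDER", defaults.llm_provider),
        llm_model=env.get("LLM_MODEL", defaults.llm_model),
    )
    if config.era_start_year > config.era_end_year:
        raise ValueError("era_start_year must not be later than era_end_year")
    return config


# Passing a plain dict instead of os.environ makes the override logic easy to try out.
example_config = config_from_env(
    {"LLM_PROVIDER": "huggingface", "ERA_START_YEAR": "1945"}
)
```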

Supported LLM Providers

| Provider | Models | Pros | Cons |
|---|---|---|---|
| OpenAI | gpt-4o-mini, gpt-4o, gpt-4-turbo | Excellent reliability, fast, great structured output | Requires paid API key |
| HuggingFace | Llama-3.1-70B, Mixtral-8x7B, etc. | Open models, flexible, good quality | May be slower, requires experimentation |

Recommended Models

OpenAI:

  • gpt-4o-mini - Fast, cost-effective (default)
  • gpt-4o - Best quality for complex networks
  • gpt-4-turbo - Good balance

HuggingFace:

  • meta-llama/Llama-3.1-70B-Instruct - Best quality
  • mistralai/Mixtral-8x7B-Instruct-v0.1 - Good balance
  • meta-llama/Llama-3.2-11B-Vision-Instruct - Lighter weight

How It Works (Step-by-Step)

  1. Agent Initialization

    • LangGraph builds a state machine with a fetch_data → parse_data flow
    • Initial state includes musician name and era bounds
  2. Data Fetching

    • Queries MusicBrainz API for artist relationships
    • Formats results as plain text
    • Falls back to mock data if API fails (for development)
  3. LLM Parsing

    • Sends raw text + system prompt to LLM
    • Uses structured output with Pydantic validation
    • LLM extracts nodes (musicians) and edges (collaborations)
    • Filters by era, validates schema
  4. Graph Construction

    • Converts validated JSON to NetworkX graph
    • Adds node attributes (instrument, role)
    • Adds edge attributes (collaboration type, weight)
  5. Metrics Computation

    • Calculates centrality measures
    • Identifies key connectors in the network
    • Analyzes clustering patterns
  6. Visualization

    • Generates interactive HTML with PyVis
    • Node size reflects importance (degree centrality)
    • Hover to see detailed metrics
    • Physics simulation for natural layout

Data Source

Primary: MusicBrainz API — Open music encyclopedia with rich relationship data

Fallback: Mock data for development/demonstration

The project is designed to easily swap data sources by modifying agent/tools.py.


Key Learning Outcomes

This project demonstrates:

  • LangGraph state machines for agent workflow control
  • Structured LLM output with Pydantic validation (using Instructor)
  • Multi-provider LLM support (OpenAI + HuggingFace)
  • Tool integration (API calls within agent context)
  • LLM + deterministic code separation (hybrid architecture)
  • Error handling throughout the pipeline
  • NetworkX for graph manipulation
  • PyVis for interactive visualization


Example Output

For Charlie Parker (1940-1960):

  • ~8-12 nodes (key bebop musicians)
  • ~15-20 edges (collaboration relationships)
  • Centrality highlights: Parker, Dizzy Gillespie, Miles Davis, Max Roach

Open data/output/jazz_graph.html in your browser to explore!


Limitations & Future Work

Current Limitations:

  • MusicBrainz API may have incomplete data for historical jazz musicians
  • Mock fallback data is used for reliability during development
  • Network size limited by API rate limits and LLM context window

Future Enhancements:

  • Add web scraping for JazzDisco.org (more complete historical data)
  • Implement caching to avoid redundant API calls
  • Add CLI arguments for custom musicians/eras
  • Export metrics to JSON for further analysis
  • Support multiple seed musicians
  • Add time-series analysis (collaboration evolution)
  • Add support for local LLM models (via Ollama)
  • Expand to more LLM providers (Anthropic Claude, Google Gemini)

Contributing

Contributions are welcome! Whether you're fixing bugs, adding features, improving documentation, or sharing jazz knowledge — we'd love your help.

Please read our CONTRIBUTING.md for:

  • Development setup
  • Code style guidelines
  • Pull request process
  • Issue labels and workflow

License

MIT License — feel free to use for learning and experimentation.
