An LLM-native music recommendation system that uses tool calling to orchestrate a unified retrieval → reranking pipeline over SQL, BM25, embeddings (text/audio/image/CF), and semantic IDs.
- Agentic pipeline: the LLM plans tool calls, executes retrieval, and generates a grounded response (sketched below).
- Multi-tool retrieval: SQL filtering, BM25 lexical search, text/audio/image/CF embeddings, semantic-ID matching.
- Personalization: Warm/cold-start aware strategies with user-item similarity when applicable.
- Repro-friendly: Lightweight test indices for quick demos; cache-first design for tools and models.
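
The feature list above boils down to one pattern: candidates are pooled from several retrieval tools and then reranked before the LLM writes its grounded answer. Below is a minimal, self-contained sketch of that pattern only; the function names are illustrative placeholders, not the actual classes in `tpa/`.

```python
# Illustrative sketch of the retrieval -> reranking flow; the tool functions
# here are stubs standing in for the real SQL / BM25 / embedding tools.
from typing import Callable

def sql_filter(query: str) -> dict[str, float]:
    return {"track_a": 1.0, "track_b": 1.0}    # e.g. WHERE tempo < 80

def bm25_search(query: str) -> dict[str, float]:
    return {"track_b": 7.2, "track_c": 5.1}    # lexical match on metadata

def embedding_search(query: str) -> dict[str, float]:
    return {"track_a": 0.81, "track_c": 0.74}  # text/audio/image/CF vectors

RETRIEVERS: list[Callable[[str], dict[str, float]]] = [sql_filter, bm25_search, embedding_search]

def retrieve_and_rerank(query: str, top_k: int = 3) -> list[str]:
    """Pool candidates from every tool, then rerank the union."""
    pooled: dict[str, float] = {}
    for tool in RETRIEVERS:
        for track_id, score in tool(query).items():
            # Naive score fusion for illustration; in the actual agent the LLM
            # decides which tools to call and how to combine their results.
            pooled[track_id] = pooled.get(track_id, 0.0) + score
    return sorted(pooled, key=pooled.get, reverse=True)[:top_k]

if __name__ == "__main__":
    print(retrieve_and_rerank("calm and slow tempo piano music"))
```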
- Python 3.11
- Linux, macOS, or WSL; GPU recommended for embedding/LLM models (CPU works but slower)
python -m pip install uv
uv venv .venv --python 3.11
source .venv/bin/activate
uv add torch torchvision torchaudio
uv pip install laion_clap
uv pip install -e .
Prebuilt demo indices are expected under ./cache. You can download a prepared bundle and extract it:
wget https://huggingface.co/datasets/talkpl-ai/TalkPlayTools-Env/resolve/main/tool_env.tar.gz
tar -xzvf tool_env.tar.gz -C ./cache
Expected subdirectories (after extraction):
- cache/metadata (test metadata files)
- cache/bm25 (BM25 indices and track_index.json)
- cache/encoder (vector DB for embeddings)
- cache/semantic_id (RVQ indices per modality)
- cache/sql (SQLite DB for tracks)
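
If you want to confirm the extraction before running the demo, a small standalone check like the one below (not part of the repo) verifies that the layout above is in place:

```python
# Optional sanity check: confirm the extracted bundle matches the expected layout.
from pathlib import Path

EXPECTED = ["metadata", "bm25", "encoder", "semantic_id", "sql"]

cache_root = Path("./cache")
missing = [name for name in EXPECTED if not (cache_root / name).is_dir()]
if missing:
    print(f"Missing cache subdirectories: {missing}")
else:
    print("Cache layout looks complete.")
```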
Run an example query with the provided test data (cold-start user case). This demo uses only 6,744 tracks from the test split of TalkPlayData-2. Due to licensing constraints, the system returns Spotify links instead of direct audio files.
python run.py --user_query "I'm looking for calm and slow tempo piano music."
Example output:
----------------------------------------------------------------------------------------------------
🎵 Music: https://open.spotify.com/track/00CXUMREit80f2McJsjcIz
🤖 Assistant Response:
I’ve found a perfect match for your request!
**"Lieder ohne Worte (Songs without Words), Book 2, Op. 30: No. 7 in E-flat major"**
by **Felix Mendelssohn**, performed by **Péter Nagy**.
This classical piano piece features a **slow tempo (65.79 BPM)**, **mellow melodies**,
and a **romantic, emotional tone** that exudes calm and introspection.
The track’s **F# major key** and **instrumental, melancholic style**
make it ideal for a relaxed, reflective mood. It’s a beautifully
crafted piece that aligns perfectly with your request for calm and slow piano music.
Would you like to explore similar tracks, or need recommendations for different moods?
I’m here to help! 🎹
----------------------------------------------------------------------------------------------------
More detailed results (Chain of Thought / Tool Calling / Response) are saved in ./demo/static.
We've implemented an interactive Gradio web interface for multi-turn conversations with the TalkPlay agent. Note that multi-turn conversation capabilities will be updated in future releases.
To launch the demo interface:
python app.py
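
The actual wiring lives in app.py; as a rough illustration of the general pattern, a Gradio ChatInterface only needs a chat callback. The answer_with_agent function below is a placeholder echo, not the repo's real callback:

```python
# Minimal Gradio chat wrapper sketch; replace the placeholder callback with a
# call into the TalkPlay agent pipeline (see app.py for the real implementation).
import gradio as gr

def answer_with_agent(message: str, history: list) -> str:
    # Placeholder: the real demo would run the agent and return its grounded response.
    return f"(agent response for: {message})"

demo = gr.ChatInterface(fn=answer_with_agent, title="TalkPlay Demo")

if __name__ == "__main__":
    demo.launch()
```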
- Default LLM: Qwen3-4B (you can customize in tpa/agents/__init__.py or via flags if you extend run.py).
- Tools and models read from ./cache by default; set a different path by changing the constructor args when building the agent.
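
As a rough illustration of that customization, the sketch below uses placeholder names (TalkPlayAgent, llm_name, cache_dir); check tpa/agents/__init__.py and run.py for the actual constructor and argument names:

```python
# Hypothetical configuration sketch -- the class and keyword arguments are
# placeholders, shown only to indicate where the LLM and cache path are set.
from pathlib import Path

CACHE_DIR = Path("/data/talkplay_cache")   # instead of the default ./cache

# Something along these lines, adapted to the real API in tpa/agents/__init__.py:
# from tpa.agents import TalkPlayAgent
# agent = TalkPlayAgent(
#     llm_name="Qwen/Qwen3-4B",   # swap in a different LLM here
#     cache_dir=str(CACHE_DIR),   # tools and models read indices from this path
# )
# print(agent.run("I'm looking for calm and slow tempo piano music."))
```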
tpa/
agents/ # Agent, LLM wrapper, prompts
environments/ # Tool executor, tools, DBs, preprocessing
evaluation/ # Offline metrics and examples
run.py # CLI demo entry point
app.py # Gradio app for demo
- Demo/test data: TalkPlayData-2 on Hugging Face
This project is released under the CC-BY-NC 4.0 license.
If this project helps your research, please consider citing our work.
% Coming soon
