Conversation


@graelo graelo commented Aug 9, 2025

Add Mistral Tokenizer Support for Tool Calling

Summary

This PR adds basic support for Mistral tokenizers in MLX-LM, with a focus on tool calling functionality. The implementation includes optional mistral-common integration and practical examples to help users work with Mistral models for function calling.

Changes Made

Core Tokenizer Integration

  • Optional Mistral support: Added mistral-common as an optional dependency
  • Enhanced TokenizerWrapper: Extended to handle both HuggingFace and Mistral tokenizers
  • Chat template method: Added apply_chat_template() with automatic OpenAI-to-Mistral format conversion (sketched after this list)
  • Streaming detokenizer: New MistralStreamingDetokenizer for proper special token handling

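To make the conversion concrete, here is a minimal sketch of mapping OpenAI-style message dicts onto mistral-common's message classes. The helper name and the role table are illustrative assumptions, not the PR's actual code:

from mistral_common.protocol.instruct.messages import (
    AssistantMessage,
    SystemMessage,
    ToolMessage,
    UserMessage,
)

# Hypothetical role-to-class table; the real conversion also has to carry
# tool calls and tool results, which this sketch omits.
_ROLE_TO_CLASS = {
    "system": SystemMessage,
    "user": UserMessage,
    "assistant": AssistantMessage,
    "tool": ToolMessage,
}

def convert_messages(openai_messages):
    # Map each {"role": ..., "content": ...} dict to its Mistral counterpart.
    return [_ROLE_TO_CLASS[m["role"]](content=m["content"]) for m in openai_messages]
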
Examples and Documentation

  • mistral_tool_use.py: Multi-turn tool calling example with weather and math functions
  • mistral_parallel_tool_use.py: Example showing parallel tool calls

Key Features

  • Format conversion: Automatically converts OpenAI-style messages to Mistral format
  • Tool calling: Supports Mistral's [TOOL_CALLS] format with robust parsing (see the sketch after this list)
  • Graceful fallbacks: Works with or without mistral-common installed
  • OpenAI compatibility: Uses standard OpenAI message format as input
  • Quantization: Saves the original tekken.json file alongside the quantized model files and uploads it to HF

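For illustration, here is a minimal sketch of the kind of [TOOL_CALLS] parsing involved, assuming the model emits the token followed by a JSON list of calls. This is not the PR's exact parser:

import json

TOOL_CALLS_TOKEN = "[TOOL_CALLS]"

def parse_tool_calls(text):
    # Split the generation at the special token and decode the JSON payload.
    # Returns (tool_calls, plain_text); on malformed JSON, fall back to text.
    if TOOL_CALLS_TOKEN not in text:
        return [], text
    plain, _, payload = text.partition(TOOL_CALLS_TOKEN)
    try:
        return json.loads(payload.strip()), plain.strip()
    except json.JSONDecodeError:
        return [], text
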
Usage Example

from mlx_lm import load, generate

model, tokenizer = load("graelo/Devstral-Small-2507-4bits")

# Standard OpenAI format
messages = [{"role": "user", "content": "What's the weather in Paris?"}]
tools = [{"type": "function", "function": {...}}]

prompt = tokenizer.apply_chat_template(messages, tools=tools)
response = generate(model, tokenizer, prompt, max_tokens=100)
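
A second turn would feed the tool result back through the same template. The message shapes below follow the standard OpenAI format; the id and payload values are made up for illustration, and the shipped mistral_tool_use.py example covers the full flow:

# Append the assistant's tool call and the tool's result, then re-encode.
messages.append({
    "role": "assistant",
    "content": "",
    "tool_calls": [{"id": "call_0", "type": "function",
                    "function": {"name": "get_weather",
                                 "arguments": '{"city": "Paris"}'}}],
})
messages.append({"role": "tool", "tool_call_id": "call_0",
                 "content": '{"temperature_c": 21}'})

prompt = tokenizer.apply_chat_template(messages, tools=tools)
response = generate(model, tokenizer, prompt, max_tokens=100)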

Dependencies

Added an optional mistral extra in setup.py:

pip install mlx-lm[mistral]  # Includes mistral-common
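
The graceful fallback mentioned above typically boils down to an import guard; a minimal sketch (illustrative, not the PR's exact code):

try:
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
    MISTRAL_AVAILABLE = True
except ImportError:
    # mistral-common not installed: stay on the HuggingFace tokenizer path.
    MISTRAL_AVAILABLE = False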

Files Modified

  • mlx_lm/tokenizer_utils.py - Core tokenizer wrapper enhancements
  • mlx_lm/examples/mistral_*.py - New tool calling examples
  • TOOL_CALLING_TUTORIAL.md - Tutorial documentation
  • setup.py - Optional mistral dependency

This is a first step toward better Mistral integration in MLX-LM. The implementation is straightforward but should provide a good foundation for users wanting to experiment with Mistral tool calling.

@graelo graelo marked this pull request as draft August 10, 2025 22:51

graelo commented Aug 10, 2025

I'm putting this on hold: I now understand that tool results are not properly encoded using ToolMessage, which I had missed. I'll hopefully reopen the PR soon.


graelo commented Aug 11, 2025

I find the Mistral addition works really well, but it clutters the code in tokenizer_utils.py, so I'll try to factor these parts into a separate file. Once that's done, I'll mark the PR as ready for review.

@graelo graelo marked this pull request as ready for review August 11, 2025 12:41

graelo commented Aug 20, 2025

Rebased on main

