SSOBHY2 commented Nov 29, 2025

This pull request makes the existing max_concurrent_requests setting for the Mistral integration actually control how many HTTP requests can be in flight at once, both for chat models and for embeddings.

For both ChatMistralAI and MistralAIEmbeddings, the code that builds the HTTPX clients now creates an httpx.Limits object from max_concurrent_requests, setting both max_connections and max_keepalive_connections to that value. These limits are then passed to httpx.Client and httpx.AsyncClient when they are created. In practice, even if user code tries to send many requests at once, the underlying connection pool opens at most max_concurrent_requests connections to the Mistral API, and additional requests wait for a free connection.

For embeddings, there was an additional issue on the async side: the aembed_documents method uses asyncio.gather over batches, which can easily start many concurrent HTTP calls. To make sure max_concurrent_requests is respected at the application level as well (not only at the connection-pool level), the embeddings class now has a private asyncio.Semaphore attribute, initialized to max_concurrent_requests, that wraps each async embedding batch call. When aembed_documents runs, each batch goes through a helper function that acquires the semaphore, performs the HTTP POST to the /embeddings endpoint, and releases the semaphore when the call finishes. This guarantees that at most max_concurrent_requests embedding batches are in flight at once, even when aembed_documents itself is called from many tasks at the same time.
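
The semaphore pattern described above can be sketched with the standard library alone. BoundedEmbedder, _embed_batch, and fake_post are hypothetical names used for illustration; fake_post stands in for the real POST to /embeddings and this is not the package's actual implementation.

```python
import asyncio
from typing import Awaitable, Callable, List

Batch = List[str]
Vectors = List[List[float]]


class BoundedEmbedder:
    """Hypothetical stand-in for an embeddings class with bounded concurrency."""

    def __init__(
        self,
        post: Callable[[Batch], Awaitable[Vectors]],
        max_concurrent_requests: int,
    ) -> None:
        self._post = post
        self._semaphore = asyncio.Semaphore(max_concurrent_requests)

    async def _embed_batch(self, batch: Batch) -> Vectors:
        # Acquire a slot, send the batch, release the slot on exit.
        async with self._semaphore:
            return await self._post(batch)

    async def aembed_documents(self, batches: List[Batch]) -> Vectors:
        # gather still schedules every batch, but only
        # max_concurrent_requests of them hold the semaphore at once.
        per_batch = await asyncio.gather(*(self._embed_batch(b) for b in batches))
        return [vector for vectors in per_batch for vector in vectors]


async def fake_post(batch: Batch) -> Vectors:
    """Placeholder for the HTTP call: one 1-dimensional vector per text."""
    await asyncio.sleep(0)
    return [[float(len(text))] for text in batch]


async def main() -> Vectors:
    # Construct inside the running loop so the semaphore binds to it.
    embedder = BoundedEmbedder(fake_post, max_concurrent_requests=2)
    return await embedder.aembed_documents([["a", "bb"], ["ccc"], ["dddd"]])


vectors = asyncio.run(main())
```

Using async with around the call keeps acquire and release paired even when the POST raises, and asyncio.gather preserves input order, so results line up with the original batches.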

The retry logic for requests (using tenacity) is still in place and works the same as before. The semaphore and HTTPX limits are layered on top of that, so you still get retries for transient errors, but now with real, enforced concurrency limits.

The scope line “Scope: libs/partners/mistralai” means that all code changes and new tests are limited to the Mistral partner package. No other packages in the monorepo are touched. This keeps the change focused and aligns with the guideline that PRs should not affect multiple packages unless necessary.

The “Breaking changes: None” line is important. Public APIs such as class names, constructor parameters, and method signatures are unchanged. The max_concurrent_requests parameter already existed and was documented; the PR just makes it control concurrency as advertised. From a user’s perspective, code that worked before still runs the same way. The only behavioral difference is that concurrency is now bounded to the configured value, which is the expected and safer behavior.

The tests line “uv run --group test pytest libs/partners/mistralai/tests/unit_tests -q” describes how the new and existing unit tests for this package are run. That command uses the project’s uv-based environment to execute pytest only for the Mistral partner tests. Inside those tests, there are new checks that:

  1. Confirm that httpx.Client and httpx.AsyncClient receive an httpx.Limits instance whose max_connections and max_keepalive_connections match max_concurrent_requests.

  2. Confirm that aembed_documents actually respects the concurrency bound. This is done by patching the async HTTP POST method with a fake function that counts how many requests are “active” at once and ensuring that this number never exceeds the configured limit (for example, max_concurrent_requests set to 1).

Those tests all pass, so we know that both the HTTPX client limits and the semaphore-based concurrency control behave as intended.
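
The second check can be illustrated roughly as follows. This is a standalone sketch of the counting technique, not the PR's test code: run_with_limit and fake_post are hypothetical, and the real test patches the async HTTP POST method rather than calling a local function.

```python
import asyncio
from typing import List, Tuple


async def run_with_limit(limit: int, n_batches: int) -> Tuple[int, List[int]]:
    """Run n_batches fake requests under a semaphore and report peak concurrency."""
    semaphore = asyncio.Semaphore(limit)
    active = 0  # requests currently inside fake_post
    peak = 0    # highest value `active` ever reached

    async def fake_post(batch_index: int) -> int:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        # Yield to the event loop so other tasks could overlap if allowed to.
        await asyncio.sleep(0.001)
        active -= 1
        return batch_index

    async def bounded(batch_index: int) -> int:
        async with semaphore:
            return await fake_post(batch_index)

    results = await asyncio.gather(*(bounded(i) for i in range(n_batches)))
    return peak, list(results)


peak, results = asyncio.run(run_with_limit(limit=1, n_batches=5))
assert peak <= 1, "more requests were active than the configured limit"
```

The sleep inside fake_post matters: without a suspension point the tasks would run one after another regardless of the semaphore, and the assertion would pass without exercising the bound.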

AI disclaimer: This contribution was developed with the assistance of an AI coding assistant. I reviewed and double-checked all code and tests myself, and the AI was only used to assist in developing the changes, not to make final decisions.

- Wire max_concurrent_requests into HTTPX client limits for Mistral chat and embeddings
- Bound concurrent aembed_documents calls with a semaphore
- Add unit tests for HTTPX limits and async concurrency
@SSOBHY2 SSOBHY2 requested review from ccurme and mdrxy as code owners November 29, 2025 01:21
@github-actions github-actions bot added fix, integration, mistralai and removed fix labels Nov 29, 2025
codspeed-hq bot commented Nov 29, 2025

CodSpeed Performance Report

Merging #34142 will not alter performance

Comparing SSOBHY2:changes (8df6aa7) with master (b7091d3) [1]

Summary

✅ 1 untouched
⏩ 33 skipped [2]

Footnotes

  1. No successful run was found on master (5a7cf87) during the generation of this report, so b7091d3 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

  2. 33 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

@github-actions github-actions bot added the fix label Nov 29, 2025