
HTTPClient caching across FileHandles#239

Draft
carlopi wants to merge 15 commits into duckdb:main from carlopi:get_or_create_client_cache

Conversation

@carlopi
Collaborator

@carlopi carlopi commented Feb 2, 2026

Currently, HTTPClientCaches are per FileHandle; after this PR, each FileHandle holds a shared_ptr<HTTPClientCache>, which allows sharing clients across FileHandles with the same base URL.

An LruCache keeps HTTPClientCaches reachable, which allows repeated network calls, even across FileHandles, to avoid unnecessary network handshakes.

This builds on top of the LruCache made available in duckdb/duckdb in duckdb/duckdb#20157 (and generalized in duckdb/duckdb#20757), and on the infrastructure to re-initialize clients, recently touched on in #232.

The ownership model is fully RAII, thanks to shared_ptrs. Note that this means an HTTPClientCache might still be alive (say, held by a very long-lived FileHandle) while no longer reachable from the LRU cache.
This can be improved on, but for now this PR improves the current status and adds infrastructure.
The cache size is fixed at 256; arguably it should be configurable.

This should somewhat visibly improve performance of repeated operations to the same BaseUrl(), but apart from timing it should not bring observable differences.
CI-wise, the main goal here is checking whether it's a pass.

@carlopi carlopi requested a review from lnkuiper February 2, 2026 15:59
@carlopi carlopi force-pushed the get_or_create_client_cache branch 2 times, most recently from eaf9ca6 to 39c35ce Compare February 2, 2026 16:39
@carlopi carlopi marked this pull request as draft February 2, 2026 20:33
Collaborator

@lnkuiper lnkuiper left a comment


Looks good! I have left two minor comments below, and I have a question about testing. Our current HTTP stats (and logs) don't show the # of handshakes:

┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││         HTTPFS HTTP Stats         ││
││                                   ││
││           in: 131.2 MiB           ││
││            out: 0 bytes           ││
││              #HEAD: 1             ││
││              #GET: 9              ││
││              #PUT: 0              ││
││              #POST: 0             ││
││             #DELETE: 0            ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘

Would it be possible to measure this and add a test so we can guarantee that this continues to work?

@carlopi carlopi force-pushed the get_or_create_client_cache branch from 39c35ce to fea4f45 Compare February 11, 2026 15:14
@carlopi carlopi marked this pull request as ready for review February 11, 2026 15:14
@carlopi
Collaborator Author

carlopi commented Feb 11, 2026

Thanks, handled the comments. On adding to "HTTPFS HTTP Stats" I am a bit conflicted, since this should be mostly invisible to users: basically it's an optimisation that should just happen, and all other signals are mostly noisy.

I could think of adding a log type, which could be handy, but even then I'm unsure whether that's required in this PR.

I tested with a proxy that logs whether an actual handshake happened or not, and the results are as expected: the first round of connections (about 1 per thread) performs handshakes, while subsequent connections re-use pre-initialized clients.

@carlopi carlopi marked this pull request as draft February 11, 2026 16:46
@carlopi carlopi force-pushed the get_or_create_client_cache branch from 79fca9d to 109cfa3 Compare February 16, 2026 10:26
@carlopi carlopi force-pushed the get_or_create_client_cache branch from 109cfa3 to 3c1ce96 Compare February 17, 2026 01:30