lnkuiper
left a comment
Looks good! I have left two minor comments below, and I have a question about testing. Our HTTP stats (and logs) currently don't show the number of handshakes:
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││         HTTPFS HTTP Stats         ││
││                                   ││
││           in: 131.2 MiB           ││
││           out: 0 bytes            ││
││             #HEAD: 1              ││
││              #GET: 9              ││
││              #PUT: 0              ││
││             #POST: 0              ││
││            #DELETE: 0             ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
Would it be possible to measure this and add a test so we can guarantee that this continues to work?
Thanks, I have handled the comments. On adding this to "HTTPFS HTTP Stats" I am a bit conflicted: this should be mostly invisible to users. It is basically an optimisation that should just happen, and any extra signal would mostly be noise. I could think of adding a log type, which could be handy, but even then I am unsure whether that is required in this PR. I tested with a proxy that logs whether an actual handshake happened, and the results are as expected: the first round of connections (about 1 per thread) performs a handshake, while subsequent connections re-use pre-initialized clients.
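To illustrate the reuse pattern described above (handshakes only in the first round, roughly one per thread), here is a minimal sketch. It is not the httpfs implementation: `ClientSketch`, `ClientPoolSketch` and `CountHandshakes` are hypothetical names, and constructing a client merely stands in for a real network handshake.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical client: constructing one stands in for a network handshake.
struct ClientSketch {};

// Hypothetical pool of pre-initialized clients (an assumption, not httpfs code).
class ClientPoolSketch {
public:
    // Hands out an idle client if one exists; otherwise creates a new one,
    // which is the only place where a "handshake" is counted.
    std::unique_ptr<ClientSketch> Acquire() {
        std::lock_guard<std::mutex> guard(lock_);
        if (!idle_.empty()) {
            auto client = std::move(idle_.back());
            idle_.pop_back();
            return client;
        }
        ++handshakes_;
        return std::make_unique<ClientSketch>();
    }

    // Returns a client to the pool so that a later request can re-use it.
    void Release(std::unique_ptr<ClientSketch> client) {
        std::lock_guard<std::mutex> guard(lock_);
        idle_.push_back(std::move(client));
    }

    std::size_t Handshakes() const {
        std::lock_guard<std::mutex> guard(lock_);
        return handshakes_;
    }

private:
    mutable std::mutex lock_;
    std::vector<std::unique_ptr<ClientSketch>> idle_;
    std::size_t handshakes_ = 0;
};

// Simulates `rounds` rounds in which `workers` clients are in use at once,
// and reports how many simulated handshakes were needed in total.
inline std::size_t CountHandshakes(std::size_t workers, std::size_t rounds) {
    ClientPoolSketch pool;
    for (std::size_t r = 0; r < rounds; ++r) {
        std::vector<std::unique_ptr<ClientSketch>> in_use;
        for (std::size_t w = 0; w < workers; ++w) {
            in_use.push_back(pool.Acquire());
        }
        for (auto &client : in_use) {
            pool.Release(std::move(client));
        }
    }
    return pool.Handshakes();
}
```

In this sketch, four concurrent workers over three rounds need four handshakes in total, all in the first round. A counter of this kind is one possible way such a signal could be exposed to a log or a test, if that is wanted later.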
Currently HTTPClientCaches are per FileHandle; after this PR, each FileHandle has a shared_ptr<HTTPClientCache> that allows sharing the clients across the same base URL.
An LruCache of reachable HTTPClientCaches is kept, which allows repeated network calls, even across FileHandles, to avoid unnecessary network handshakes.
This builds on top of the LruCache made available in duckdb/duckdb in duckdb/duckdb#20157 (and generalized in duckdb/duckdb#20757) and the infrastructure to re-initialize clients, recently touched on via #232.
The ownership model is fully RAII, thanks to shared_ptrs. Note that this means there might still be HTTPClientCaches that are alive, say held by a very long-lived FileHandle, but no longer reachable through the LruCache.
This can be improved on, but for now this PR improves the current status and adds infrastructure.
Cache size is fixed at 256; arguably it should be configurable.
This should somewhat visibly improve the performance of repeated operations to the same BaseUrl(), but apart from timing it should not bring observable differences.
CI-wise, the main goal here is checking whether it passes.
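The ownership model described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `ClientCacheSketch` and `SharedClientCacheLru` are hypothetical names, the LRU is a plain list plus map rather than the duckdb LruCache, and the sketch is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the per-base-URL client cache (not the httpfs type).
struct ClientCacheSketch {
    explicit ClientCacheSketch(std::string url) : base_url(std::move(url)) {}
    std::string base_url;
};

// Minimal LRU map from base URL to a shared client cache.
class SharedClientCacheLru {
public:
    explicit SharedClientCacheLru(std::size_t capacity) : capacity_(capacity) {}

    // Returns the cache for base_url, creating it on a miss.
    std::shared_ptr<ClientCacheSketch> GetOrCreate(const std::string &base_url) {
        auto it = index_.find(base_url);
        if (it != index_.end()) {
            // Hit: move the entry to the front (most recently used).
            order_.splice(order_.begin(), order_, it->second);
            return it->second->second;
        }
        // Miss: insert at the front, then evict from the back if over capacity.
        auto cache = std::make_shared<ClientCacheSketch>(base_url);
        order_.emplace_front(base_url, cache);
        index_[base_url] = order_.begin();
        if (order_.size() > capacity_) {
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        return cache;
    }

    // True if base_url is still reachable through the LRU.
    bool Contains(const std::string &base_url) const {
        return index_.count(base_url) != 0;
    }

private:
    using Entry = std::pair<std::string, std::shared_ptr<ClientCacheSketch>>;
    std::size_t capacity_;
    std::list<Entry> order_;
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};

// Shows that an evicted cache stays alive while a handle still owns it.
inline bool EvictedCacheStaysAlive() {
    SharedClientCacheLru lru(1);
    auto held = lru.GetOrCreate("https://a.example"); // a long-lived handle keeps this
    lru.GetOrCreate("https://b.example");             // evicts the entry for a.example
    return !lru.Contains("https://a.example") && held.use_count() == 1 &&
           held->base_url == "https://a.example";
}
```

With a capacity of 1, requesting a second base URL evicts the first entry, yet the `shared_ptr` held by the caller keeps that cache alive until the caller drops it. That is the "alive but not reachable" state mentioned above.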