Describe the enhancement requested
I've been looking at why Arrow's access to parquet files on an S3 store is slower compared to Polars and ClickHouse. A packet capture highlighted the problem. For a single parquet file read, the following S3 requests are made (a minimal reproduction follows the list):
- HEAD
- HEAD
- HEAD
- Oversized ranged GET of the tail of the object to read the metadata block
- HEAD
- Ranged GETs to read the object data
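For reference, this is the kind of read that produces the sequence above. A minimal sketch using pyarrow, which wraps the C++ S3 filesystem; the endpoint, credentials, and bucket/key are placeholders for a local MinIO setup:

```python
import pyarrow.parquet as pq
from pyarrow import fs

# Placeholder endpoint/credentials for a local MinIO server;
# capturing traffic on port 9000 shows the request sequence listed above.
s3 = fs.S3FileSystem(
    endpoint_override="localhost:9000",
    scheme="http",
    access_key="minioadmin",
    secret_key="minioadmin",
)

# A single read_table() call issues the HEAD/GET sequence above.
table = pq.read_table("bucket/data.parquet", filesystem=s3)
```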
If there's any significant latency between the Arrow client and the S3 store (which is likely), all these requests translate into a performance bottleneck. I'm using MinIO and there's a very noticeable difference in overall read performance between a client that's 1 ms away from the server and one that's 30 ms away. That's 150 ms (five round trips) before any data is transferred when it could be 60 ms (two round trips). The impact gets worse the further apart the client and server are, with AWS S3 and GCS likely being the worst cases.
Compare to what Polars does to read the same parquet file:
- HEAD
- 8 byte read at end to get metadata size
- Precise tail read to get metadata
- Ranged GETs from the start to read the table metadata
Arguably it could be even smarter to just read the last 64KB and save a request instead of doing an exact read of the metadata.
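The footer protocol Polars relies on is simple: the last 8 bytes of a parquet file are a 4-byte little-endian metadata length followed by the `PAR1` magic. A rough sketch of that minimal sequence using pyarrow's filesystem API (`read_parquet_footer` is a hypothetical helper; note that `open_input_file` itself currently issues a HEAD, which is part of the overhead in question):

```python
import struct

def read_parquet_footer(filesystem, path):
    # One HEAD (issued when the file is opened) plus two ranged GETs,
    # matching the Polars sequence above.
    with filesystem.open_input_file(path) as f:
        size = f.size()
        # Last 8 bytes: 4-byte little-endian footer length + b"PAR1" magic.
        footer_len, magic = struct.unpack("<I4s", f.read_at(8, size - 8))
        assert magic == b"PAR1"
        # Precise ranged GET of the serialized Thrift FileMetaData.
        return f.read_at(footer_len, size - 8 - footer_len)
```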
ClickHouse is smart when it comes to smaller objects, doing a HEAD and just grabbing the whole object in one go if the size is below some threshold. For larger objects, it does what Arrow does with too many HEAD requests (one less than Arrow).
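A sketch of that heuristic (the threshold value and the `read_parquet` helper are assumptions for illustration; I haven't checked ClickHouse's actual cut-off):

```python
import io
import pyarrow.parquet as pq

SMALL_OBJECT_THRESHOLD = 8 * 1024 * 1024  # assumed value for illustration

def read_parquet(filesystem, path):
    info = filesystem.get_file_info(path)  # HEAD
    if info.size <= SMALL_OBJECT_THRESHOLD:
        # Small object: one plain GET for the whole body, no ranged reads.
        with filesystem.open_input_stream(path) as f:
            return pq.read_table(io.BytesIO(f.read()))
    # Large object: fall back to Arrow's usual ranged-GET path.
    return pq.read_table(path, filesystem=filesystem)
```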
I've tried `allow_delayed_open`, but this seems to make no difference to S3 read requests despite the documentation hinting that it might. `allow_delayed_open` does help with the efficiency of writing smaller objects, though.
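For reference, a sketch of how the option can be set (assuming the pyarrow keyword of the same name, which forwards to the C++ `S3Options`; the endpoint and scheme are placeholders for a local MinIO setup):

```python
from pyarrow import fs

# Sketch: S3FileSystem forwards allow_delayed_open to the C++ S3Options.
# The endpoint/scheme below are placeholders for a local MinIO server.
s3 = fs.S3FileSystem(
    endpoint_override="localhost:9000",
    scheme="http",
    allow_delayed_open=True,  # no observed change to the read-side request count
)
```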
Are there any plans to improve the efficiency of Arrow's S3 reads?
Component(s)
C++