
GH-47349: [C++] Include request ID in AWS S3 Error #47351

Open. wingkitlee0 wants to merge 1 commit into main.


@wingkitlee0 wingkitlee0 commented Aug 17, 2025

Rationale for this change

The request ID is a unique identifier that is useful when debugging with the AWS support team.

MinIO errors already print a request ID by default.

What changes are included in this PR?

The request ID is appended to the end of the error message.
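As a rough illustration of the resulting message shape, here is a minimal Python sketch. It is not the C++ change in this PR: `append_request_id` is a hypothetical helper, and the ` request ID: ` separator is taken from the sample error message quoted later in this conversation.

```python
def append_request_id(message: str, request_id: str) -> str:
    """Return `message` with the request ID appended at the end.

    Hypothetical helper for illustration only; the separator mirrors
    the sample output shown in this conversation.
    """
    return f"{message} request ID: {request_id}"


base = ("AWS Error NO_SUCH_BUCKET during CopyObject operation: "
        "The specified bucket does not exist")
full = append_request_id(base, "185CFDB6B0C338B1")
print(full)
```

Note that this sketch appends the suffix even when the ID is an empty string; whether the real change does the same is not stated here.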

Are these changes tested?

No new test added yet.

I will try to test it on a real S3 system. MinIO seems to return only an empty string.

Are there any user-facing changes?

No


⚠️ GitHub issue #47349 has been automatically assigned in GitHub to PR creator.

@wingkitlee0 wingkitlee0 marked this pull request as ready for review August 17, 2025 17:16
Member

@raulcd raulcd left a comment

The only E2E test I can seem to find that reuses this error message is:

@pytest.mark.s3
def test_s3fs_wrong_region():
    from pyarrow.fs import S3FileSystem

    # wrong region for bucket
    # anonymous=True incase CI/etc has invalid credentials
    fs = S3FileSystem(region='eu-north-1', anonymous=True)

    msg = ("When getting information for bucket 'voltrondata-labs-datasets': "
           r"AWS Error UNKNOWN \(HTTP status 301\) during HeadBucket "
           "operation: No response body. Looks like the configured region is "
           "'eu-north-1' while the bucket is located in 'us-east-2'."
           "|NETWORK_CONNECTION")
    with pytest.raises(OSError, match=msg) as exc:
        fs.get_file_info("voltrondata-labs-datasets")

    # Sometimes fails on unrelated network error, so next call would also fail.
    if 'NETWORK_CONNECTION' in str(exc.value):
        return

    fs = S3FileSystem(region='us-east-2', anonymous=True)
    fs.get_file_info("voltrondata-labs-datasets")

Maybe you can reuse it? I haven't tested this, so I am unsure.

@github-actions github-actions bot added the "awaiting changes" label and removed the "awaiting review" label Aug 18, 2025
Author

wingkitlee0 commented Aug 18, 2025

@raulcd Thanks for looking at it! It turns out get_file_info has slightly different, additional error handling.

For other APIs, I was able to use MinIO locally to get the expected message:

In [6]: fs = S3FileSystem(endpoint_override="localhost:9000", scheme="http", region="us-east-1", access_key="minioadmin", secret_key="minioadmin")

In [7]: fs.copy_file("test-bucket/a", "test-bucket/b")
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[7], line 1
----> 1 fs.copy_file("test-bucket/a", "test-bucket/b")

File ~/.pyenv/versions/3.12.10/envs/py312-arrow-dev/lib/python3.12/site-packages/pyarrow/_fs.pyx:754, in pyarrow._fs.FileSystem.copy_file()

File ~/.pyenv/versions/3.12.10/envs/py312-arrow-dev/lib/python3.12/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()

OSError: When copying key 'a' in bucket 'test-bucket' to key 'b' in bucket 'test-bucket': AWS Error NO_SUCH_BUCKET during CopyObject operation: The specified bucket does not exist request ID: 185CFDB6B0C338B1

So I should be able to add a test for this.
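One way such a test could check the message is to pull the ID out with a regular expression. The sketch below is only an assumption about the format, based on the single sample above: `REQUEST_ID_RE` and `extract_request_id` are hypothetical names, and the pattern assumes the ID is the last whitespace-free token after `request ID: `.

```python
import re
from typing import Optional

# Assumed format: "... request ID: <token>" at the very end of the message.
REQUEST_ID_RE = re.compile(r"request ID: (\S+)$")


def extract_request_id(error_message: str) -> Optional[str]:
    """Return the trailing request ID, or None if it is missing or empty."""
    match = REQUEST_ID_RE.search(error_message.strip())
    return match.group(1) if match else None


sample = ("When copying key 'a' in bucket 'test-bucket' to key 'b' in bucket "
          "'test-bucket': AWS Error NO_SUCH_BUCKET during CopyObject operation: "
          "The specified bucket does not exist request ID: 185CFDB6B0C338B1")
request_id = extract_request_id(sample)
```

Returning `None` for an empty ID lets a test distinguish "suffix present but empty" from a real identifier.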

Member

@raulcd raulcd left a comment

We use minio for several tests, see this fixture:

def s3_server(s3_connection, tmpdir_factory):
    @retry(attempts=5, delay=1, backoff=2)
    def minio_server_health_check(address):
        resp = urllib.request.urlopen(f"http://{address}/minio/health/live")
        assert resp.getcode() == 200

    tmpdir = tmpdir_factory.getbasetemp()
    host, port, access_key, secret_key = s3_connection

    address = f'{host}:{port}'
    env = os.environ.copy()
    env.update({
        'MINIO_ACCESS_KEY': access_key,
        'MINIO_SECRET_KEY': secret_key
    })

    args = ['minio', '--compat', 'server', '--quiet', '--address',
            address, tmpdir]
    proc = None
    try:
        proc = subprocess.Popen(args, env=env)
    except OSError:
        pytest.skip('`minio` command cannot be located')
    else:
        # Wait for the server to startup before yielding
        minio_server_health_check(address)

        yield {
            'connection': s3_connection,
            'process': proc,
            'tempdir': tmpdir
        }
    finally:
        if proc is not None:
            proc.kill()
            proc.wait()

Could we replicate your local validation with a test?
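A test along these lines might look roughly like the sketch below. It is not the test added in this PR: `check_copy_error_has_request_id` and `StubFileSystem` are hypothetical, and the stub only imitates the error text from the local run so the snippet runs without MinIO. In a real test, the stub would be replaced by an `S3FileSystem` built from the `s3_server` fixture's connection details.

```python
import re


def copy_error_message(fs, src: str, dst: str) -> str:
    """Call fs.copy_file and return the OSError text; fail if nothing is raised."""
    try:
        fs.copy_file(src, dst)
    except OSError as exc:
        return str(exc)
    raise AssertionError("copy_file unexpectedly succeeded")


class StubFileSystem:
    """Hypothetical stand-in for an S3FileSystem pointed at MinIO."""

    def copy_file(self, src, dst):
        # Imitates the message observed in the local MinIO run.
        raise OSError("AWS Error NO_SUCH_BUCKET during CopyObject operation: "
                      "The specified bucket does not exist "
                      "request ID: 185CFDB6B0C338B1")


def check_copy_error_has_request_id(fs) -> str:
    """Assert that a failing copy reports a non-empty request ID."""
    message = copy_error_message(fs, "test-bucket/a", "test-bucket/b")
    assert re.search(r"request ID: \S+", message), message
    return message


checked = check_copy_error_has_request_id(StubFileSystem())
```

Matching on the `request ID: ` prefix rather than a fixed value keeps the check stable, since the ID differs on every request.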

@github-actions github-actions bot added the "Component: Python" and "awaiting change review" labels and removed the "awaiting changes" label Aug 19, 2025
@wingkitlee0
Author

Added some tests.
