Conversation

anjaliratnam-msft

Updated the default block size to 50 MiB so that AzureBlobFileSystem and AzureBlobFile are consistent. This also addresses the GitHub issue where a chunk_size of 1 GiB results in a timeout error, and makes uploads respect the block_size parameter passed when opening a file.

I also created this gist to show that performance is not negatively impacted when increasing the default from 5 MiB to 50 MiB, and that performance improves when reducing the chunk size from 1 GiB to 50 MiB.

@kyleknap

@martindurant just giving a heads up... this PR is still a work-in-progress and we'll let you know when it is ready for a full review, but wanted to get your initial thoughts on the direction.

This PR is a result of the conversation we had here: #508 (comment), and we see it as a precursor to PR #508, since any perf gains from the concurrency introduced there will likely be degraded by the fact that 1 GiB payload blocks are being used for concurrent PUTs.

Before we get too far along in this PR, we wanted to check with you to make sure you are generally on board with the changes the PR will eventually introduce:

  1. Always use the configured block size for individual uploaded parts. Currently, if a large write() occurs (e.g., larger than 5 MiB), adlfs always uses 1 GiB chunks, which can cause network problems such as the timeouts @anjaliratnam-msft and the referenced GitHub issue (Data-Upload: Timeout problems and chunk size #494) were reporting.
  2. Update the default block size to be 50 MiB everywhere. Right now the default chunk size is set inconsistently (e.g., it's 4 MiB in the FileSystem and 5 MiB in the file object). This will make it consistent with s3fs (Concurrency in pipe() s3fs#901), and overall we believe it will speed up adlfs (we are still testing this and will update the gist).
  3. Pass the blocksize from the AzureBlobFileSystem to the file object when fs.open() is called and no block_size override is set. This makes it more consistent with s3fs and ensures that a blocksize set when instantiating the AzureBlobFileSystem is used for all methods, without having to set block_size again in the call to fs.open().
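Change 3 can be illustrated with a minimal sketch. The FileSystem and File classes here are hypothetical stand-ins, not the real adlfs types; the point is only the fallback rule in _open():

```python
# Sketch of change 3: fs.open() falls back to the filesystem-level blocksize
# when no block_size override is given. Illustrative stand-in classes only.
_DEFAULT_BLOCK_SIZE = 50 * 2**20  # proposed 50 MiB default


class File:
    def __init__(self, path, mode, blocksize):
        self.path, self.mode, self.blocksize = path, mode, blocksize


class FileSystem:
    def __init__(self, blocksize=_DEFAULT_BLOCK_SIZE):
        self.blocksize = blocksize

    def _open(self, path, mode="rb", block_size=None):
        if block_size is None:
            # Inherit the filesystem-level setting instead of a separate default
            block_size = self.blocksize
        return File(path, mode, block_size)


fs = FileSystem(blocksize=7 * 2**20)
assert fs._open("data/file.txt").blocksize == 7 * 2**20  # inherited from the fs
assert fs._open("data/file.txt", block_size=5 * 2**20).blocksize == 5 * 2**20  # override wins
```

An explicit block_size at open time still takes precedence; only the "not provided" case changes.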

The main downsides I can see (though the tradeoff still seems worth it) are:

  1. Possibly bumps memory usage, especially when it comes to caching as part of reads
  2. Shrinks the default max blob size that can be written using f.write() in some cases. Specifically, if someone was writing 1 GiB at a time, that will now be broken down into 50 MiB blocks, and given that the max number of blocks is 50,000, the theoretical default max blob size becomes roughly 2.4 TiB (which is still really large). On the other hand, by increasing the block size to 50 MiB, the max total blob size increases if you are doing small (MiB-sized) writes at a time, since more data now fits into each block. So this point seems more like a wash.
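The arithmetic behind the theoretical max blob size above, using the 50,000-blocks-per-blob limit for Azure Block Blobs:

```python
# Back-of-the-envelope check of the theoretical default max blob size.
MAX_BLOCKS = 50_000            # Azure Block Blob limit on blocks per blob
block_size = 50 * 2**20        # proposed 50 MiB default

max_blob_tib = MAX_BLOCKS * block_size / 2**40
print(f"{max_blob_tib:.2f} TiB")   # ~2.38 TiB with 50 MiB blocks

# For comparison, 1 GiB writes staged as single blocks:
print(f"{MAX_BLOCKS * 2**30 / 2**40:.1f} TiB")  # ~48.8 TiB with 1 GiB blocks
```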

Furthermore, change 3 will let users more easily set the block size in case the new defaults do not work for their use case.

Thoughts? Happy to elaborate on any of these more.

@kyleknap kyleknap left a comment

Looks good! I really like that we are collecting the perf numbers as well to make sure this change makes sense. Most of my feedback was on structuring the tests. The implementation itself looks solid.

@@ -1879,6 +1878,8 @@ def _open(
is versioning aware and blob versioning is enabled on the releveant container.
"""
logger.debug(f"_open: {path}")
if block_size is None:

We should make sure to update the block_size docstring for this _open() method to:

  1. Include "uploads" in the wording. It looks like it currently only refers to downloads.
  2. Say that the parameter is an override which, when not provided, defaults to the blocksize on the file system.

adlfs/spec.py Outdated
@@ -2146,7 +2147,7 @@ async def _async_initiate_upload(self, **kwargs):

_initiate_upload = sync_wrapper(_async_initiate_upload)

-    def _get_chunks(self, data, chunk_size=1024**3):  # Keeping the chunk size as 1 GB
+    def _get_chunks(self, data, chunk_size):

Looking through this more, it probably makes sense to remove the chunk_size parameter from this helper method altogether and replace the sole reference to chunk_size with the instance property self.blocksize. Mainly, we are not really overriding this value anymore, and the block size is already set in the constructor, so this keeps things simpler.
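For illustration, the suggested shape of the helper might look like the sketch below (UploadHelper is a hypothetical stand-in for the file class in adlfs/spec.py, not the actual implementation):

```python
# Sketch: _get_chunks reads self.blocksize directly instead of taking
# a chunk_size parameter.
class UploadHelper:
    def __init__(self, blocksize=50 * 2**20):
        self.blocksize = blocksize

    def _get_chunks(self, data):
        # Yield blocksize-sized slices; the last chunk may be shorter.
        for start in range(0, len(data), self.blocksize):
            yield data[start : start + self.blocksize]


helper = UploadHelper(blocksize=4)  # tiny blocksize just for demonstration
assert list(helper._get_chunks(b"abcdefghij")) == [b"abcd", b"efgh", b"ij"]
```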

@@ -69,7 +69,7 @@
"is_current_version",
]
_ROOT_PATH = "/"
- _DEFAULT_BLOCK_SIZE = 4 * 1024 * 1024
+ _DEFAULT_BLOCK_SIZE = 50 * 2**20

We should also make sure to update the changelog in this PR. I'm thinking we can just use the three bullet points from this comment I made: #509 (comment) and make the wording a bit more succinct.

assert actual_blocks == expected_blocks


def test_block_size(storage):

It might be worth parameterizing this one to test the different permutations of setting/omitting block sizes. To do this, we'd parameterize over the input block size for both the file system and the fs.open() call, and include the expected block size for both the file system and the file object. We'd then add cases such as (feel free to adjust or add more):

  1. Assert defaults when block size is not set for either the file system or the file object
  2. Assert the file system block size propagates to the file-like object
  3. Assert that we can override the block_size for the fs.open() call
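The resolution rule those three cases exercise can be captured as a hypothetical pure function (resolve_blocksize is an illustration, not an adlfs API): an explicit fs.open() block_size wins, then the filesystem blocksize, then the proposed 50 MiB default.

```python
# Illustrative resolution rule with the suggested parametrize-style cases.
DEFAULT_BLOCK_SIZE = 50 * 2**20

def resolve_blocksize(filesystem_blocksize=None, file_blocksize=None):
    if file_blocksize is not None:
        return file_blocksize          # case 3: open() override wins
    if filesystem_blocksize is not None:
        return filesystem_blocksize    # case 2: filesystem value propagates
    return DEFAULT_BLOCK_SIZE          # case 1: defaults everywhere

cases = [
    (None, None, DEFAULT_BLOCK_SIZE),
    (7 * 2**20, None, 7 * 2**20),
    (40 * 2**20, 7 * 2**20, 7 * 2**20),
]
for fs_bs, file_bs, expected in cases:
    assert resolve_blocksize(fs_bs, file_bs) == expected
```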

fs = AzureBlobFileSystem(
account_name=storage.account_name,
connection_string=CONN_STR,
blocksize=5 * 2**20,

When implementing cases where we override the block size, it may be worth using values that are unlikely defaults (e.g. 7 * 2 ** 20) to make it clearer that the value is an override rather than a possible default.

)

content = b"1" * (blocksize * 2 + 1)
with fs.open("data/root/a/file.txt", "wb", blocksize=blocksize) as f:

Let's remove blocksize from this call here; I'm not sure it is affecting anything, since the actual argument name is block_size, and setting it on the file system should suffice.

Comment on lines 2063 to 2068
mocker.patch(
"azure.storage.blob.aio.BlobClient.commit_block_list", autospec=True
)
with patch(
"azure.storage.blob.aio.BlobClient.stage_block", autospec=True
) as mock_stage_block:

For these patch statements, is there a particular reason why:

  1. mocker.patch is not used for both?
  2. We are not following the pattern from other test cases, where the BlobClient is imported first (e.g., here) and patched directly?

f.write(content)
expected_blocks = math.ceil(len(content) / blocksize)
actual_blocks = mock_stage_block.call_count
assert actual_blocks == expected_blocks

Since we are patching the commit-block-list call as well, it may be worth also asserting that the number of blocks in that call matches the expected number of blocks.

@@ -2045,3 +2047,37 @@ def test_open_file_x(storage: azure.storage.blob.BlobServiceClient, tmpdir):
with fs.open("data/afile", "xb") as f:
pass
assert fs.cat_file("data/afile") == b"data"


@pytest.mark.parametrize("blocksize", [5 * 2**20, 50 * 2**20, 100 * 2**20])

I think we should be fine making this a non-parameterized test; I'm not sure there is much value in setting different block sizes. I'd say we set the blocksize to a few MiB and write() data that requires several blocks, and also make sure the last chunk does not completely fill a block (i.e., is less than blocksize) so the logic handles sizes that do not fall on block boundaries.

This will hopefully simplify the scaffolding for the case and also take less time to run.
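The suggestion in numbers: pick a small blocksize and a payload that does not land on a block boundary, then check the expected block count. These values are illustrative, independent of the real test fixtures:

```python
import math

blocksize = 5 * 2**20
content = b"1" * (blocksize * 2 + 1)   # two full blocks plus one extra byte

expected_blocks = math.ceil(len(content) / blocksize)
assert expected_blocks == 3            # the trailing byte needs its own block
```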

) as mock_stage_block:
f.write(content)
expected_blocks = math.ceil(len(content) / blocksize)
actual_blocks = mock_stage_block.call_count

It would also probably be worth asserting the actual data sizes used in each stage_block call.


@kyleknap kyleknap left a comment

Looks good! Just had some really small comments on the test and should be set from a code change perspective.

CHANGELOG.md Outdated
@@ -7,6 +7,9 @@ Unreleased
- Fix issue where ``AzureBlobFile`` did not respect ``location_mode`` parameter
from parent ``AzureBlobFileSystem`` when using SAS credentials and connecting to
new SDK clients.
- The block size is now used for uploads. Previously, it was always 1 GiB irrespective of the block size

Let's slightly tweak this phrasing to:

- The block size is now used for partitioned uploads. Previously, 1 GiB was used for each uploaded block irrespective of the block size

Mainly, the block size technically was used in uploads previously, but only when the in-memory buffer was flushed.

CHANGELOG.md Outdated
@@ -7,6 +7,9 @@ Unreleased
- Fix issue where ``AzureBlobFile`` did not respect ``location_mode`` parameter
from parent ``AzureBlobFileSystem`` when using SAS credentials and connecting to
new SDK clients.
- The block size is now used for uploads. Previously, it was always 1 GiB irrespective of the block size
- Updated default block size to be 50 MiB

Let's also add a sentence to this point to say how to revert back to the previous default. So something like:

Set `blocksize` for `AzureBlobFileSystem` or `block_size` when opening an `AzureBlobFile` to revert back to 5 MiB default.

Mainly, this will help anyone trying to figure out how to go back to the previous default find it more easily than needing to dig through this PR.

"filesystem_blocksize, file_blocksize, expected_blocksize",
[
(None, None, 50 * 2**20),
(50 * 2**20, None, 50 * 2**20),

For this case, it might make sense to use a different value, like 7 * 2 ** 20, to disambiguate from the default value.

(None, None, 50 * 2**20),
(50 * 2**20, None, 50 * 2**20),
(None, 5 * 2**20, 5 * 2**20),
(50 * 2**20, 7 * 2**20, 7 * 2**20),

Same thing here, instead of using 50 * 2 ** 20 let's use a non-default value like 40 * 2 ** 20 to disambiguate from any defaults.



@pytest.mark.parametrize(
"filesystem_blocksize, file_blocksize, expected_blocksize",

It's also probably worth adding an expected_filesystem_blocksize to make sure the blocksize is as expected at the filesystem level as well in this test case.

block_size=file_blocksize,
)
assert f.blocksize == expected_blocksize
assert fs.blocksize == 50 * 2**20

We can probably remove this assertion, assuming we add the expected_filesystem_blocksize parameter to the other parameterized test case.

assert fs.blocksize == 50 * 2**20


def test_override_blocksize(storage):

Ah, so for this test case I was thinking we'd:

  1. Set the blocksize on the AzureBlobFileSystem.
  2. Assert that an AzureBlobFile created without setting block_size does not inherit the blocksize from 1. So after instantiation, we would just assert it is 50 * 2 ** 20, which should be its default.
  3. Update the test name to better reflect that we are testing that this code path does not inherit from the file system.
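A rough sketch of that expectation, using dummy stand-ins for the adlfs classes (the assumption being that direct construction does not consult the filesystem's blocksize, only fs.open() does):

```python
# Dummy classes illustrating the "direct construction does not inherit" case.
DEFAULT_BLOCK_SIZE = 50 * 2**20

class DummyFileSystem:
    def __init__(self, blocksize=DEFAULT_BLOCK_SIZE):
        self.blocksize = blocksize

class DummyBlobFile:
    def __init__(self, fs, block_size=None):
        # Direct construction intentionally ignores fs.blocksize here.
        self.blocksize = block_size if block_size is not None else DEFAULT_BLOCK_SIZE

fs = DummyFileSystem(blocksize=7 * 2**20)
f = DummyBlobFile(fs)                      # no block_size passed
assert f.blocksize == DEFAULT_BLOCK_SIZE   # does not inherit 7 MiB from fs
```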

@@ -2045,3 +2046,82 @@ def test_open_file_x(storage: azure.storage.blob.BlobServiceClient, tmpdir):
with fs.open("data/afile", "xb") as f:
pass
assert fs.cat_file("data/afile") == b"data"


def test_number_of_blocks(storage, mocker):

Let's update this test name to be test_uses_block_size_for_partitioned_uploads in order to be a bit more self-descriptive of what we are trying to test.

@kyleknap kyleknap left a comment

Looks good. Just had one more comment on the test. Otherwise, let's get the perf testing analysis to a spot we're happy with and we should be set.

(50 * 2**20, None, 50 * 2**20),
(None, 5 * 2**20, 5 * 2**20),
(50 * 2**20, 7 * 2**20, 7 * 2**20),
(None, None, 50 * 2**20, None),
@kyleknap kyleknap Aug 15, 2025

Instead of directly passing None, let's just treat it as omitting block_size/blocksize in the constructor/open call. Mainly, looking at the expected_filesystem_blocksize, I'm not sure that asserting the expected filesystem blocksize is None is what we want when we are really checking that it falls back to the default block size of 50 * 2 ** 20.

@kyleknap kyleknap left a comment

@anjaliratnam-msft Looks good. I think we are in a good spot code wise so let's just wrap up the perf testing work and analysis to make sure we are confident in the changes.

@martindurant as a heads up we may be tweaking this plan here: #509 (comment). Specifically, for point 2, we are reconsidering whether we keep the default blocksize at 5 MiB instead of changing it to 50 MiB. We've been finding data that shows that when concurrency is introduced to writes, using 5 MiB blocks is significantly faster than 50 MiB blocks. We'll let you know when the perf results are in a good spot to review.

@martindurant

We've been finding data that shows that when concurrency is introduced to writes, using 5 MiB blocks is significantly faster than 50 MiB blocks.

I'm sure it depends on how many blocks there are in total! So one 50 MB block would be slower than 10x 5 MB blocks, but if you have enough 50 MB blocks to reach saturation too, I would expect it to be faster by reducing overhead.

@kyleknap

kyleknap commented Aug 18, 2025

@martindurant agreed, it is definitely dependent on the number of blocks. One of the updates we are making to the perf testing is increasing the size of the transferred blob to something fairly large (27 GiB) so that both the 5 MiB and 50 MiB block sizes reach concurrency saturation. Even at a 5 GiB blob size, the 50 MiB block size was not necessarily saturating full concurrency, which may have been skewing interpretation of the results.
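The block counts behind that saturation point, as illustrative arithmetic (the blob and block sizes are the ones quoted above):

```python
# How many blocks each blob size produces at 5 MiB vs 50 MiB block sizes.
GiB, MiB = 2**30, 2**20
for blob_size in (5 * GiB, 27 * GiB):
    for block_size in (5 * MiB, 50 * MiB):
        n_blocks = blob_size // block_size
        print(f"{blob_size // GiB} GiB blob @ {block_size // MiB} MiB -> {n_blocks} blocks")
```

A 5 GiB blob yields only ~102 blocks at 50 MiB, which may not keep a high concurrency level busy for long; at 27 GiB, both block sizes produce hundreds of blocks or more.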

@kyleknap kyleknap left a comment

Looks great! Thank you for putting together these performance results. This is really helpful in making decisions on this.

I think it is awesome to see that with this PR alone, we will reduce timeout/network errors by sending smaller upload payloads (50 MiB instead of possibly 1 GiB), and greatly increase the speed of small sequential writes (e.g., the writing-1-KB-at-a-time scenario) by reducing the number of PUTs being made.

Furthermore, it is great to see that when we then add on the PR that introduces concurrency for partitioned uploads, it improves the speed of large writes (e.g., writing a large tensor in a PyTorch model) by 2x-3x by default, and can reach up to a 5x improvement if you further tune blocksize and concurrency for your workload. With both of these PRs, write performance is really on par with adlfs's read performance now.

@martindurant this should all be ready to review. @anjaliratnam-msft included the rationale in the perf results gist, but in short we decided to stick with the 50 MiB block size. We did continue to see that, when concurrency is enabled and saturated, 5 MiB blocks squeezed out more speed than 50 MiB ones; however, the tradeoffs of further limiting default blob sizes and degrading performance when concurrency is disabled outweighed those possible gains.

Let us know what you think!

@martindurant

I am happy here.

@martindurant martindurant merged commit b5b1c33 into fsspec:main Aug 19, 2025
8 checks passed