updated default block size #509
Conversation
@martindurant just giving a heads up... this PR is still a work in progress and we'll let you know when it is ready for a full review, but we wanted to get your initial thoughts on the direction. This PR is a result of the conversation that we had here: #508 (comment), and we see it as a precursor to this PR: #508, since any perf gains from the concurrency introduced there will likely be degraded by the fact that 1 GiB payload blocks are being used for concurrent PUTs. Before we get too far along in this PR, we did want to check with you to make sure you are generally on board with the changes the PR will eventually introduce:
The main downsides (though it still seems worth the tradeoff) that I could potentially see are:
Furthermore, with change 3, users will be able to more easily set the block size in case the new defaults do not work for their use case. Thoughts? Happy to elaborate on any of these more.
Looks good! I really like that we are collecting the perf numbers as well to make sure this change makes sense. Most of my feedback was on structuring the tests. The implementation itself looks solid.
@@ -1879,6 +1878,8 @@ def _open(
is versioning aware and blob versioning is enabled on the releveant container.
"""
logger.debug(f"_open: {path}")
if block_size is None:
We should make sure to update the `block_size` docstring for this `_open()` method to:
- Include "uploads" in the wording. It looks like it currently only refers to downloads.
- Update the wording to say that the parameter is an override and, when not provided, defaults to the `blocksize` on the file system.
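A possible rewording, as a rough sketch (the signature shown here is abbreviated and only the `block_size` entry of the docstring is spelled out):

```python
def _open(self, path, mode="rb", block_size=None, **kwargs):
    """Open a file-like object to the blob at ``path``.

    Parameters
    ----------
    block_size: int, optional
        Block size, in bytes, used for buffered reads and for staging
        partitioned uploads. Acts as a per-file override; when not
        provided, it defaults to the ``blocksize`` set on the file system.
    """
```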
adlfs/spec.py
Outdated
@@ -2146,7 +2147,7 @@ async def _async_initiate_upload(self, **kwargs):

     _initiate_upload = sync_wrapper(_async_initiate_upload)

-    def _get_chunks(self, data, chunk_size=1024**3):  # Keeping the chunk size as 1 GB
+    def _get_chunks(self, data, chunk_size):
Looking through this more, it probably makes sense to just remove the `chunk_size` parameter from this helper method altogether and switch the sole reference to `chunk_size` to the instance property `self.blocksize`. Mainly, we are not really overriding this value now and the block size is already set in the constructor, so it keeps things simpler.
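As a minimal sketch of that suggestion (assuming `data` is a bytes-like buffer; the real helper in adlfs may differ in details):

```python
def _get_chunks(self, data):
    """Yield successive chunks of ``data``, each at most ``self.blocksize`` bytes."""
    start = 0
    while start < len(data):
        yield data[start : start + self.blocksize]
        start += self.blocksize
```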
@@ -69,7 +69,7 @@
     "is_current_version",
 ]
 _ROOT_PATH = "/"
-_DEFAULT_BLOCK_SIZE = 4 * 1024 * 1024
+_DEFAULT_BLOCK_SIZE = 50 * 2**20
We should also make sure to update the changelog in this PR. I'm thinking we can just use the three bullet points from this comment I made: #509 (comment) and make the wording a bit more succinct.
adlfs/tests/test_spec.py
Outdated
assert actual_blocks == expected_blocks


def test_block_size(storage):
It might be worth parameterizing this one to test out the different permutations of setting/omitting block sizes. To do this, we parameterize off of the input block size for both the file system and the `fs.open()` call and include the expected block size for both the file system and the file object. We'd then add cases such as the following (feel free to adjust this or add more; see the sketch below):
- Assert defaults when block size is not set for either the file system or the file object
- Assert that the file system block size propagates to the file-like object
- Assert that we can override the `block_size` for the `fs.open()` call
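A rough sketch of the parameterization being described (the 50 MiB default, the `storage`/`CONN_STR` fixtures, and the path mirror what appears elsewhere in this PR; the exact case values are illustrative):

```python
@pytest.mark.parametrize(
    "filesystem_blocksize, file_blocksize, expected_blocksize",
    [
        (None, None, 50 * 2**20),            # neither set: defaults apply everywhere
        (7 * 2**20, None, 7 * 2**20),        # file system blocksize propagates to the file object
        (None, 5 * 2**20, 5 * 2**20),        # block_size on open() overrides the default
        (40 * 2**20, 7 * 2**20, 7 * 2**20),  # open() override wins over the file system value
    ],
)
def test_block_size(storage, filesystem_blocksize, file_blocksize, expected_blocksize):
    fs_kwargs = {} if filesystem_blocksize is None else {"blocksize": filesystem_blocksize}
    fs = AzureBlobFileSystem(
        account_name=storage.account_name,
        connection_string=CONN_STR,
        **fs_kwargs,
    )
    open_kwargs = {} if file_blocksize is None else {"block_size": file_blocksize}
    with fs.open("data/root/a/file.txt", "wb", **open_kwargs) as f:
        assert f.blocksize == expected_blocksize
```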
adlfs/tests/test_spec.py
Outdated
fs = AzureBlobFileSystem(
    account_name=storage.account_name,
    connection_string=CONN_STR,
    blocksize=5 * 2**20,
When implementing cases where we override the block size, it may be worth using values that are unlikely to be defaults (e.g. `7 * 2**20`) to make it clearer that the value is an override as opposed to a possible default.
adlfs/tests/test_spec.py
Outdated
)

content = b"1" * (blocksize * 2 + 1)
with fs.open("data/root/a/file.txt", "wb", blocksize=blocksize) as f:
Let's remove `blocksize` from this call here, as I'm not sure it will affect anything since the argument used is `block_size`, and it should suffice to set it on the file system.
adlfs/tests/test_spec.py
Outdated
mocker.patch(
    "azure.storage.blob.aio.BlobClient.commit_block_list", autospec=True
)
with patch(
    "azure.storage.blob.aio.BlobClient.stage_block", autospec=True
) as mock_stage_block:
For these patch statements, is there a particular reason why:
- `mocker.patch` is not being used for both?
- We are not following the pattern from other test cases, where we import the `BlobClient` first (e.g., here) and patch it directly? (A sketch of that pattern follows below.)
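For reference, the pattern being asked about looks roughly like this (only the patching setup is shown; the rest of the test body would stay the same):

```python
from azure.storage.blob.aio import BlobClient


def test_number_of_blocks(storage, mocker):
    # use mocker.patch.object for both SDK calls and patch the imported class directly
    mock_commit_block_list = mocker.patch.object(
        BlobClient, "commit_block_list", autospec=True
    )
    mock_stage_block = mocker.patch.object(BlobClient, "stage_block", autospec=True)
    ...  # write the content and assert against the mocks as in the rest of the test
```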
adlfs/tests/test_spec.py
Outdated
    f.write(content)
expected_blocks = math.ceil(len(content) / blocksize)
actual_blocks = mock_stage_block.call_count
assert actual_blocks == expected_blocks
Since we are patching the commit block list call as well, it may be worth also asserting that the number of blocks passed to that call matches the expected number of blocks.
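As a sketch (this assumes the `commit_block_list` mock is captured as `mock_commit_block_list` and that the block list may arrive positionally or as a keyword; adjust the lookup to however adlfs actually calls the SDK):

```python
commit_call = mock_commit_block_list.call_args
# with autospec=True the bound instance is the first positional argument,
# so the block list is either the last positional arg or the `block_list` kwarg
block_list = commit_call.kwargs.get("block_list", commit_call.args[-1])
assert len(block_list) == expected_blocks
```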
adlfs/tests/test_spec.py
Outdated
@@ -2045,3 +2047,37 @@ def test_open_file_x(storage: azure.storage.blob.BlobServiceClient, tmpdir):
    with fs.open("data/afile", "xb") as f:
        pass
    assert fs.cat_file("data/afile") == b"data"


@pytest.mark.parametrize("blocksize", [5 * 2**20, 50 * 2**20, 100 * 2**20])
I think we should be fine just making this a non-parameterized test; I'm not sure there is much value in setting different block sizes. I'd say we just set the blocksize to something in the single-digit MiBs, `write()` data that requires several blocks, and maybe also make sure the last block does not fit completely into a block (i.e., is less than the blocksize) to verify the logic handles sizes that do not fall on block boundaries.
This will hopefully help simplify the scaffolding for the case and also take less time to run.
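Concretely, the data setup could be as small as this (values are illustrative):

```python
import math

blocksize = 5 * 2**20                    # a single-digit MiB block size
content = b"1" * (blocksize * 2 + 1)     # two full blocks plus a 1-byte partial block
expected_blocks = math.ceil(len(content) / blocksize)  # == 3
```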
adlfs/tests/test_spec.py
Outdated
) as mock_stage_block:
    f.write(content)
expected_blocks = math.ceil(len(content) / blocksize)
actual_blocks = mock_stage_block.call_count
It would also probably be worth asserting the actual data sizes used in each `stage_block` call.
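A sketch of what that could look like (it assumes the chunk reaches `stage_block` either as a `data` keyword or as the last positional argument; adjust to the actual call adlfs makes):

```python
def staged_size(call):
    # the chunk may be passed positionally or as a keyword depending on the caller
    chunk = call.kwargs.get("data", call.args[-1])
    return len(chunk)


staged_sizes = [staged_size(c) for c in mock_stage_block.call_args_list]
# with content of blocksize * 2 + 1 bytes we expect two full blocks and a 1-byte tail
assert staged_sizes == [blocksize, blocksize, 1]
```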
Looks good! Just had some really small comments on the test and should be set from a code change perspective.
CHANGELOG.md
Outdated
@@ -7,6 +7,9 @@ Unreleased
- Fix issue where ``AzureBlobFile`` did not respect ``location_mode`` parameter
  from parent ``AzureBlobFileSystem`` when using SAS credentials and connecting to
  new SDK clients.
- The block size is now used for uploads. Previously, it was always 1 GiB irrespective of the block size
Let's slightly tweak this phrasing to:
- The block size is now used for partitioned uploads. Previously, 1 GiB was used for each uploaded block irrespective of the block size
Mainly, the block size technically was used previously in uploads, but it was really only used to determine when the in-memory buffer was flushed.
CHANGELOG.md
Outdated
@@ -7,6 +7,9 @@ Unreleased
- Fix issue where ``AzureBlobFile`` did not respect ``location_mode`` parameter
  from parent ``AzureBlobFileSystem`` when using SAS credentials and connecting to
  new SDK clients.
- The block size is now used for uploads. Previously, it was always 1 GiB irrespective of the block size
- Updated default block size to be 50 MiB
Let's also add a sentence to this point to say how to revert back to the previous default. So something like:
Set `blocksize` for `AzureBlobFileSystem` or `block_size` when opening an `AzureBlobFile` to revert back to the 5 MiB default.
Mainly, this will help anyone trying to figure out how to go back to the previous default find it more easily than having to dig through this PR.
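For anyone reading the changelog, that revert would look roughly like this (account name and path are placeholders):

```python
from adlfs import AzureBlobFileSystem

# keep the previous, smaller block size at the file system level
fs = AzureBlobFileSystem(account_name="myaccount", blocksize=5 * 2**20)

# or override it per file when opening
with fs.open("container/path/blob.bin", "wb", block_size=5 * 2**20) as f:
    f.write(b"...")
```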
adlfs/tests/test_spec.py
Outdated
"filesystem_blocksize, file_blocksize, expected_blocksize", | ||
[ | ||
(None, None, 50 * 2**20), | ||
(50 * 2**20, None, 50 * 2**20), |
For this case, it might make sense to use a different value, like that `7 * 2**20` value, to disambiguate it from the default value.
adlfs/tests/test_spec.py
Outdated
(None, None, 50 * 2**20),
(50 * 2**20, None, 50 * 2**20),
(None, 5 * 2**20, 5 * 2**20),
(50 * 2**20, 7 * 2**20, 7 * 2**20),
Same thing here: instead of using `50 * 2**20`, let's use a non-default value like `40 * 2**20` to disambiguate from any defaults.
adlfs/tests/test_spec.py
Outdated


@pytest.mark.parametrize(
    "filesystem_blocksize, file_blocksize, expected_blocksize",
It's also probably worth adding an `expected_filesystem_blocksize` parameter to make sure the blocksize is as expected at the filesystem level as well in this test case.
adlfs/tests/test_spec.py
Outdated
    block_size=file_blocksize,
)
assert f.blocksize == expected_blocksize
assert fs.blocksize == 50 * 2**20
We can probably remove this assertion, assuming we add the `expected_filesystem_blocksize` parameter in the other parameterized test case.
adlfs/tests/test_spec.py
Outdated
assert fs.blocksize == 50 * 2**20


def test_override_blocksize(storage):
Ah, so for this test case I was thinking we'd:
1. Set the `blocksize` on the `AzureBlobFileSystem`.
2. Assert that the `AzureBlobFile`, without setting `block_size`, does not inherit the blocksize from 1. So after instantiation, we would just assert it is `50 * 2**20`, which should be its default.
3. Update the test name to better reflect that we are testing that this code path does not inherit from the file system (see the sketch below).
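A rough sketch of that restructured test (the `AzureBlobFile` constructor call is abbreviated here; the real class takes more arguments, and the imports follow the existing test module):

```python
def test_file_does_not_inherit_filesystem_blocksize(storage):
    fs = AzureBlobFileSystem(
        account_name=storage.account_name,
        connection_string=CONN_STR,
        blocksize=7 * 2**20,  # deliberately non-default so (non-)inheritance is obvious
    )
    # block_size is intentionally not passed when constructing the file object
    f = AzureBlobFile(fs, "data/root/a/file.txt", mode="wb")
    assert f.blocksize == 50 * 2**20  # falls back to the default, not fs.blocksize
```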
adlfs/tests/test_spec.py
Outdated
@@ -2045,3 +2046,82 @@ def test_open_file_x(storage: azure.storage.blob.BlobServiceClient, tmpdir):
    with fs.open("data/afile", "xb") as f:
        pass
    assert fs.cat_file("data/afile") == b"data"


def test_number_of_blocks(storage, mocker):
Let's update this test name to `test_uses_block_size_for_partitioned_uploads` in order to be a bit more self-descriptive of what we are trying to test.
Looks good. Just had one more comment on the test. Otherwise, let's get the perf testing analysis into a spot we're happy with and we should be set.
adlfs/tests/test_spec.py
Outdated
(50 * 2**20, None, 50 * 2**20),
(None, 5 * 2**20, 5 * 2**20),
(50 * 2**20, 7 * 2**20, 7 * 2**20),
(None, None, 50 * 2**20, None),
Instead of directly passing `None`, let's just treat it as omitting `block_size`/`blocksize` in the constructor/`open` call. Mainly, looking at the `expected_filesystem_blocksize`, I'm not sure that asserting the expected filesystem blocksize is `None` is what we want when we are mainly checking that it falls back to the default block size of `50 * 2**20`.
@anjaliratnam-msft Looks good. I think we are in a good spot code wise so let's just wrap up the perf testing work and analysis to make sure we are confident in the changes.
@martindurant as a heads up we may be tweaking this plan here: #509 (comment). Specifically, for point 2, we are reconsidering whether we keep the default blocksize at 5 MiB instead of changing it to 50 MiB. We've been finding data that shows that when concurrency is introduced to writes, using 5 MiB blocks is significantly faster than 50 MiB blocks. We'll let you know when the perf results are in a good spot to review.
I'm sure it depends on how many blocks there are in total! So one 50MB block would be slower than 10x5MB, but if you have enough 50MB blocks to saturate too, I would expect it to be faster by reducing overhead.
@martindurant agreed it is definitely dependent on the number of blocks. One of the updates we are making to the perf testing is increasing the size of the blob being transferred to something fairly large (27 GiB) so that both the 5 MiB and 50 MiB block sizes reach concurrency saturation. Even at a 5 GiB blob size, the 50 MiB block size was not necessarily saturating full concurrency, so that may have been skewing result interpretation.
Looks great! Thank you for putting together these performance results. This is really helpful in making decisions on this.
I think it is awesome to see that with this PR alone, we will reduce timeout/network errors by sending smaller upload payloads (50 MiB instead of possibly 1 GiB) and we will greatly increase the speed of small sequential writes (e.g., the writing-1-KB-at-a-time scenario), since we will be reducing the number of PUTs being made.
Furthermore, it is great to see that when we then add on this PR, which adds concurrency to partitioned uploads, it improves the speed of large writes (e.g., writing a large tensor in a PyTorch model) by 2x - 3x by default and can reach up to a 5x improvement if you further tune blocksize and concurrency based on your workload. With both of these PRs, write performance is really on par with the read performance of adlfs now.
@martindurant this should all be ready to review. @anjaliratnam-msft included the rationale in the perf results gist, but we decided to stick with the approach of a 50 MiB block size. We did continue to see that when concurrency is enabled and saturated, 5 MiB block sizes squeezed out more speed than the 50 MiB sizes. However, the tradeoff of further limiting default blob sizes and of degraded performance when concurrency is disabled outweighed these possible gains.
Let us know what you think!
I am happy here.
Updated the default block size to 50 MiB so AzureBlobFileSystem and AzureBlobFile are more consistent. This also addresses the GitHub issue where the chunk_size of 1 GB results in a Timeout Error, and it also respects the block_size parameter passed in when opening a file.
I also created this gist to show that performance is not negatively impacted when increasing the default from 5 to 50 MiB, and performance improves when reducing the chunk size from 1 GB to 50 MiB.