[azure-storage-file-datalake] Checking for a non-existent file induces memory leak #45999
Open
Labels
- Service Attention: Workflow: This issue is responsible by Azure service team.
- customer-reported: Issues that are reported by GitHub users external to the Azure organization.
- needs-team-attention: Workflow: This issue needs attention from Azure service team or SDK team
- question: The issue doesn't require a change to the product in order to be resolved. Most issues start as that
Description
- Package Name: azure-storage-file-datalake
- Package Version: 12.23.0 (current)
- Operating System: Linux and Darwin
- Python Version: 3.14.3 (current)
Describe the bug
Checking for a non-existent file with the Azure file client (file_client.exists()) prevents objects in parent scopes from being garbage collected.
To Reproduce
I reproduced this on Linux machines running in Azure Kubernetes and a Mac both in and outside of a Docker container. MWE:
import os
from azure.storage.filedatalake import FileSystemClient
from time import sleep
account_name = os.environ["ACCOUNT_NAME"]
file_system_name = os.environ["FILE_SYSTEM_NAME"]
sas_token = os.environ["SAS_TOKEN"]
directory_name = os.environ["DIRECTORY_NAME"]
file_name = os.environ["FILE_NAME"]
file_system_client = FileSystemClient(
    account_url=f"https://{account_name}.dfs.core.windows.net",
    file_system_name=file_system_name,
    credential=sas_token,
)
directory_client = file_system_client.get_directory_client(directory_name)

def check_file():
    file_client = directory_client.get_file_client(file_name)
    # FILE_NAME must point at a file that does not exist
    file_client.exists()
    # Large local object that should be freed when check_file() returns
    big_list = list(range(10000000))

i = 0
while True:
    i += 1
    print(f"Iteration {i}")
    check_file()
    sleep(0.5)

This fills about 1 GB of memory every 5 iterations, eventually leading to an out-of-memory error.
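To put numbers on the growth, something like the following could be printed from inside the loop. This is only a sketch using the standard library; the helper name peak_rss_mib is made up for illustration and is not part of the Azure SDK.

```python
import resource
import sys


def peak_rss_mib():
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


if __name__ == "__main__":
    print(f"peak RSS: {peak_rss_mib():.0f} MiB")
```

For example, the print call in the loop could become print(f"Iteration {i}: peak RSS {peak_rss_mib():.0f} MiB"). Since this is a peak value, it only ever grows, so a flat reading across iterations would mean memory is being reused.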
Expected behavior
This should be able to run indefinitely, as it does when the existence check is omitted or when the checked file exists.
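To tell a true leak apart from memory that is only released late, a probe along these lines might help. It is a sketch that runs without Azure access: cyclic_call is a hypothetical stand-in that swallows an exception and leaves a reference cycle behind, and whether the SDK does anything similar is an assumption that this issue does not establish. BigList, clean_call and probe are likewise illustrative names.

```python
import gc
import weakref


class BigList(list):
    """list subclass: plain lists do not support weak references."""


def clean_call():
    """Stand-in for a check that raises nothing."""


def cyclic_call():
    """Hypothetical stand-in: swallows an exception but leaves a cycle."""
    try:
        raise LookupError("not found")
    except LookupError as exc:
        # frame local -> exception -> traceback -> this frame
        kept = exc


def probe(call):
    """Return (freed_on_return, freed_after_collect) for a large local
    created in the same scope that invoked `call`."""
    was_enabled = gc.isenabled()
    gc.disable()  # make the first reading reflect reference counting only
    try:
        ref = None

        def scope():
            nonlocal ref
            call()
            big = BigList(range(100_000))
            ref = weakref.ref(big)

        scope()
        freed_on_return = ref() is None
        gc.collect()
        freed_after_collect = ref() is None
        return freed_on_return, freed_after_collect
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print("clean :", probe(clean_call))
    print("cyclic:", probe(cyclic_call))
```

Passing a function that calls file_client.exists() on a missing file in place of cyclic_call would show whether the same pattern applies. A result where the object is freed only after gc.collect() would point to a reference cycle delaying the release rather than a permanent leak.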