Skip to content

Conversation

@KelvinLinBU
Copy link
Member

Closes #38. items being downloaded into /tmp converted to use tempfile.NamedTemporaryFile so that they are automatically cleaned up

Closes https://github.com/issues/assigned?issue=CCI-MOC%7Copenstack-billing-from-db%7C38. items being downloaded into /tmp converted to use tempfile.NamedTemporaryFile so that they are automatically cleaned up
@QuanMPhm QuanMPhm self-requested a review June 3, 2025 14:53
logger.info(f"Downloading {key} to {download_location}.")
s3.download_file(s3_bucket, key, download_location)

tmp_gz = tempfile.NamedTemporaryFile(delete=True, suffix=".gz")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this function, we are checking if the fetched S3 file is indeed an archive (gz) file, only then do we run gzip to decompress the archive. Therefore, you should remove suffix=".gz" since the file is not guaranteed to be a gz archive. You should also add in the suffix check.

s3.download_file(s3_bucket, key, download_location)

tmp_gz = tempfile.NamedTemporaryFile(delete=True, suffix=".gz")
s3.download_file(s3_bucket, key, tmp_gz.name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You reference tmp_gz.name several times in this function. You can instead do this:

Suggested change
s3.download_file(s3_bucket, key, tmp_gz.name)
download_location = tmp_gz.name

Generally, in our repos, whenever we want to reference an object attribute (or any values that requires some traversal to get) more than once, we'll create a variable for it. This has the advantage of:

  1. You can provide a more informative name for the information you're using
  2. Can create a simpler diff. In this case, some of your lines are marked as changed only because you used download_location instead of tmp_gz.name. Following the suggestion above would avoid that and let your PR be a bit more succinct.

s3.download_file(s3_bucket, key, download_location)

tmp_gz = tempfile.NamedTemporaryFile(delete=True, suffix=".gz")
s3.download_file(s3_bucket, key, tmp_gz.name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why you moved the s3.download_file statement before the logger.info(f"Downloading {key} to {tmp_gz.name}")? This would change logging behavior

if command.returncode != 0:
raise Exception(f"Error uncompressing {download_location}.")
tmp_sql = tempfile.NamedTemporaryFile(delete=True, suffix=".sql", mode="wb")
logger.info(f"Uncompressing {tmp_gz.name}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same variable naming suggesting here for tmp_sql.name

raise Exception(f"Error uncompressing {download_location}.")
tmp_sql = tempfile.NamedTemporaryFile(delete=True, suffix=".sql", mode="wb")
logger.info(f"Uncompressing {tmp_gz.name}")
result = subprocess.run(["gzip", "-cd", tmp_gz.name], stdout=tmp_sql)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why you renamed the variable from command to result?

raise Exception(
f"Error converting {path_to_dump} to SQLite compatible"
f" at {destination_path}."
f" at {tmp_converted.name}."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^

Copy link
Collaborator

@QuanMPhm QuanMPhm Jun 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing a small test myself, it seems using tempfile.NamedTemporaryFile(delete=True) will delete the file after leaving the function's scope:

>>> def write_temp():
...     tmp_f = tempfile.NamedTemporaryFile(delete=True, mode='w+')
...     tmp_f.write("test\n")
...     tmp_f.flush()
...     return tmp_f.name
... 
>>> temp_f = write_temp()
>>> with open(temp_f) as f:
...     print(f.read())
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/3j/jrd7vlnj1tv7c3vfpf1c9mlm0000gq/T/tmpzqv8wdld'
>>> 

Did this work for you on your local environment? It seems Mr. Lars' original comment may be right. This may require some code refactoring. If you can't find a solution that only involves editing the functions in fetch.py, let me know and we can discuss to move forward with the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants