-
Notifications
You must be signed in to change notification settings - Fork 5
Automatic cleanup of items in /tmp #102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Closes https://github.com/issues/assigned?issue=CCI-MOC%7Copenstack-billing-from-db%7C38. items being downloaded into /tmp converted to use tempfile.NamedTemporaryFile so that they are automatically cleaned up
| logger.info(f"Downloading {key} to {download_location}.") | ||
| s3.download_file(s3_bucket, key, download_location) | ||
|
|
||
| tmp_gz = tempfile.NamedTemporaryFile(delete=True, suffix=".gz") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this function, we are checking if the fetched S3 file is indeed an archive (gz) file, only then do we run gzip to decompress the archive. Therefore, you should remove suffix=".gz" since the file is not guaranteed to be a gz archive. You should also add in the suffix check.
| s3.download_file(s3_bucket, key, download_location) | ||
|
|
||
| tmp_gz = tempfile.NamedTemporaryFile(delete=True, suffix=".gz") | ||
| s3.download_file(s3_bucket, key, tmp_gz.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You reference tmp_gz.name several times in this function. You can instead do this:
| s3.download_file(s3_bucket, key, tmp_gz.name) | |
| download_location = tmp_gz.name |
Generally, in our repos, whenever we want to reference an object attribute (or any values that requires some traversal to get) more than once, we'll create a variable for it. This has the advantage of:
- You can provide a more informative name for the information you're using
- Can create a simpler diff. In this case, some of your lines are marked as changed only because you used
download_locationinstead oftmp_gz.name. Following the suggestion above would avoid that and let your PR be a bit more succinct.
| s3.download_file(s3_bucket, key, download_location) | ||
|
|
||
| tmp_gz = tempfile.NamedTemporaryFile(delete=True, suffix=".gz") | ||
| s3.download_file(s3_bucket, key, tmp_gz.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why you moved the s3.download_file statement before the logger.info(f"Downloading {key} to {tmp_gz.name}")? This would change logging behavior
| if command.returncode != 0: | ||
| raise Exception(f"Error uncompressing {download_location}.") | ||
| tmp_sql = tempfile.NamedTemporaryFile(delete=True, suffix=".sql", mode="wb") | ||
| logger.info(f"Uncompressing {tmp_gz.name}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same variable naming suggesting here for tmp_sql.name
| raise Exception(f"Error uncompressing {download_location}.") | ||
| tmp_sql = tempfile.NamedTemporaryFile(delete=True, suffix=".sql", mode="wb") | ||
| logger.info(f"Uncompressing {tmp_gz.name}") | ||
| result = subprocess.run(["gzip", "-cd", tmp_gz.name], stdout=tmp_sql) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why you renamed the variable from command to result?
| raise Exception( | ||
| f"Error converting {path_to_dump} to SQLite compatible" | ||
| f" at {destination_path}." | ||
| f" at {tmp_converted.name}." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing a small test myself, it seems using tempfile.NamedTemporaryFile(delete=True) will delete the file after leaving the function's scope:
>>> def write_temp():
... tmp_f = tempfile.NamedTemporaryFile(delete=True, mode='w+')
... tmp_f.write("test\n")
... tmp_f.flush()
... return tmp_f.name
...
>>> temp_f = write_temp()
>>> with open(temp_f) as f:
... print(f.read())
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/3j/jrd7vlnj1tv7c3vfpf1c9mlm0000gq/T/tmpzqv8wdld'
>>> Did this work for you on your local environment? It seems Mr. Lars' original comment may be right. This may require some code refactoring. If you can't find a solution that only involves editing the functions in fetch.py, let me know and we can discuss to move forward with the issue.
Closes #38. items being downloaded into /tmp converted to use tempfile.NamedTemporaryFile so that they are automatically cleaned up