Add ability to save synthesizers and data when running benchmark_single_table #415
Codecov Report
Attention: Patch coverage is [...]
Additional details and impacted files
@@ Coverage Diff @@
## main #415 +/- ##
==========================================
+ Coverage 66.46% 68.56% +2.09%
==========================================
Files 20 20
Lines 1330 1422 +92
==========================================
+ Hits 884 975 +91
- Misses 446 447 +1
sdgym/benchmark.py
Outdated
message = (
    f"Parameters '{parameters}' are deprecated in the `benchmark_single_table` "
    'function and will be removed in October 2025. '
    'Please consider using `output_destination` instead.'
)
Let me know if this warning message makes sense. I introduce output_destination, but not all of the deprecated parameters relate to saving data.
Hmm, good question. I think we should deprecate run_on_ec2 in the next issue, when you add the new benchmark function. You can be more descriptive here and say:
For saving results, please use 'output_destination'. For running SDGym remotely on AWS, please use ...
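One way to make the message more descriptive, as suggested above, is to attach a hint only to the parameters it applies to. The following is a minimal sketch, not the code in this PR: the helper names, the parameter names in `REPLACEMENT_HINTS`, and the `FutureWarning` category are all assumptions made for illustration. The hint for running remotely on AWS is left out because its replacement is not named in this thread.

```python
import warnings

# Hint wording taken from the review suggestion above.
SAVING_HINT = "For saving results, please use 'output_destination'."

# Hypothetical mapping from deprecated parameter names to a hint.
# These parameter names are illustrative assumptions, not confirmed by the PR.
REPLACEMENT_HINTS = {
    'output_filepath': SAVING_HINT,
    'detailed_results_folder': SAVING_HINT,
}


def build_deprecation_message(used_parameters):
    """Build one message naming the parameters, plus only the relevant hints."""
    ordered = sorted(used_parameters)
    names = "', '".join(ordered)
    message = (
        f"Parameters '{names}' are deprecated in the `benchmark_single_table` "
        'function and will be removed in October 2025.'
    )
    # Collect each distinct hint once so shared hints are not repeated.
    hints = []
    for name in ordered:
        hint = REPLACEMENT_HINTS.get(name)
        if hint and hint not in hints:
            hints.append(hint)
    if hints:
        message += ' ' + ' '.join(hints)
    return message


def warn_deprecated(used_parameters):
    """Emit a single warning if any deprecated parameter was passed."""
    if used_parameters:
        # FutureWarning is an assumption; the PR does not state the category.
        warnings.warn(build_deprecation_message(used_parameters), FutureWarning)
```

With this shape, a parameter that has no saving-related replacement is still named in the warning but does not get the `output_destination` hint.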
with open(run_file, 'r') as f:
    run_data = yaml.safe_load(f) or {}
do we have to grab a lock here or worry about multiple runs trying to modify this file at the same time?
No, we're safe here because the method is called after all the jobs have run and the results have been generated.
sdgym/benchmark.py
Outdated
else:
    scores.to_csv(result_file, index=False, mode='a', header=False)
Is it possible that two runs might try to access this file at the same time?
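If concurrent runs could append to the same results file, one option would be to serialize the appends with a lock file. The sketch below is only an illustration of that idea under stated assumptions: `lock_file` and `append_rows` are hypothetical helpers, not SDGym functions, and the standard-library `csv` module stands in for the pandas `to_csv` call so the example stays self-contained.

```python
import csv
import os
import time
from contextlib import contextmanager


@contextmanager
def lock_file(path, timeout=10.0, poll_interval=0.05):
    """Hold an exclusive lock by creating `path` (hypothetical helper)."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only one
            # process at a time can succeed in creating the lock file.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f'Could not acquire lock {path!r}')
            time.sleep(poll_interval)
    try:
        yield
    finally:
        # Release the lock so that waiting processes can proceed.
        os.close(fd)
        os.remove(path)


def append_rows(result_file, header, rows):
    """Append rows to a CSV, writing the header only when the file is new."""
    with lock_file(f'{result_file}.lock'):
        is_new = not os.path.exists(result_file) or os.path.getsize(result_file) == 0
        with open(result_file, 'a', newline='') as f:
            writer = csv.writer(f)
            if is_new:
                writer.writerow(header)
            writer.writerows(rows)
```

A known limitation of this approach is that a process that crashes while holding the lock leaves the lock file behind, so later writers would time out until it is removed.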
Resolve #410
CU-86b5dy0pa
Thanks in advance for the review. Here are a few questions:
- [...] `output_destination` already exists? Or should we always be overwriting (for instance, if two benchmarks are launched the same day)?
- [...] `meta.yaml` file that is expected to be saved in `SDGym_results_mm_dd_yyyy/<dataset_name_mm_dd_yyyy>`? I did not create it yet because I was not sure what to put inside it.
- [...] `synthetic_data.csv` instead of `_synthetic_data.csv`), is it okay?
- [...] `run<id>.yaml` is at the output_destination as well as the `SDGym_results_mm_dd_yyyy`. Is that correct, or should it be inside `SDGym_results_mm_dd_yyyy`?
- [...] `SDGym_results_mm_dd_yyyy`
- [...] `run<id>.yaml` [...] I defined the starting_date and completed_date, which correspond to the times the benchmark was started and fully completed.
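The folder naming described above can be sketched as follows. This is a rough illustration, not the PR's implementation: the function names are hypothetical, and the ISO timestamp format used for starting_date and completed_date is an assumption, since the description does not say how the dates are serialized. Where `run<id>.yaml` should live is left out because that is still an open question above.

```python
from datetime import date, datetime
from pathlib import Path


def results_folder(output_destination, day):
    """Return <output_destination>/SDGym_results_mm_dd_yyyy for the given day."""
    return Path(output_destination) / f'SDGym_results_{day:%m_%d_%Y}'


def dataset_folder(output_destination, dataset_name, day):
    """Return the per-dataset folder, <dataset_name>_mm_dd_yyyy, inside the results folder."""
    return results_folder(output_destination, day) / f'{dataset_name}_{day:%m_%d_%Y}'


def run_metadata(started, completed):
    """Build the date fields for the run file (timestamp format is an assumption)."""
    return {
        'starting_date': started.isoformat(timespec='seconds'),
        'completed_date': completed.isoformat(timespec='seconds'),
    }
```

Because the folder name only encodes the day, two benchmarks launched on the same date resolve to the same path, which is the overwrite case raised in the first question.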