-
Notifications
You must be signed in to change notification settings - Fork 1.7k
MaD generator: use --threads=0
and 2GB per thread for --ram
by default
#19744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
* fix a bug where the order of model generation was determined by the order in the `download.json` file of the experiment rather than the order in the config file * allow configuring `--ram` and `--threads` in the MaD generator scripts * use no `--ram` and `--threads=0` by default in the bulk generator (single generator defaults are left unchanged) * allow to pass `--dca` multiple times, taking DBs from experiments listed last. This allows to run a subset of the sources in a "fixup" experiment and use it to "patch" a previous run without rerunning everything.
The standalone MaD generator now uses `0` for threads and throttles the RAM to use 2GB per thread by default. Also, replaced the hand-written argument parsing with `argparse`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances the MaD generator scripts by adding configurable resource flags, fixing model-generation ordering, and improving DCA experiment support.
- Replace manual argument parsing in
generate_mad.py
withargparse
, introducing--threads
and--ram
(default 0 threads and 2 GB per thread). - Update
bulk_generate_mad.py
to propagate these new flags into the generator, support multiple--dca
runs, and remove the stale git-status precheck. - Fix bug where model-generation order followed
download.json
rather than the user’s config.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
misc/scripts/models-as-data/generate_mad.py | Switched to argparse , added threads /ram handling, and updated default RAM |
misc/scripts/models-as-data/bulk_generate_mad.py | Changed generate_models signature to accept CLI args, wired new flags, updated DCA download loop |
Comments suppressed due to low confidence (2)
misc/scripts/models-as-data/bulk_generate_mad.py:228
- Update this function’s docstring to document the new
args
parameter (its type, purpose, and which CLI flags it carries).
def generate_models(config, args, project: Project, database_dir: str) -> None:
misc/scripts/models-as-data/generate_mad.py:143
- Add tests (unit or integration) to verify the behavior of the new CLI flags (
--threads
,--ram
,--with-*
) and defaulting logic ingenerate_mad.py
.
generator = p.parse_args(namespace=Generator())
Models are regenerated with the fix from #19744 which corrects the order of generation.
--threads=0
and 2GB per thread for--ram
by defaultdownload.json
file of the experiment rather than the order in the config file--ram
and--threads
in the MaD generator scriptsThe review should ignore ae3bbb0 and 5df292c in order to not look at
black
formatting changes.