-
Notifications
You must be signed in to change notification settings - Fork 109
Check YAML, Spec, Coefficients Settings Before Model Runs #950
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
andkay
wants to merge
83
commits into
ActivitySim:main
Choose a base branch
from
camsys:settings-checker
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
… add second model to validate
… settings to independent function
…nction called from main checker. run formatter
…n empty dataframe, even if no top level spec exists
…or disaggregate accessibility (needs more testing)
…e flow for nested specs
…uisite refactors to error collection
…in model settings
missing values. because the fields in the settings classes are inherited, it is not usually possible for the settings checker to determine if a path to an external file is actually required for a particular model component. the workaround is to issue a specific warning that a path *may* be expected and users should check the YAML file. the ultimate solution would be to define more robust Pydantic data models that ensure that fields are marked as required when appropriate.
…s to checker entrypoint as argument.
andkay
commented
Jun 10, 2025
@@ -283,7 +285,7 @@ def run(args): | |||
# Memory sidecar is only useful for single process runs | |||
# multiprocess runs log memory usage without blocking in the controlling process. | |||
mem_prof_log = state.get_log_file_path("memory_profile.csv") | |||
from ..core.memory_sidecar import MemorySidecar | |||
from activitysim.core.memory_sidecar import MemorySidecar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The relative import here caused problems for my runs - suggest changing it to absolute.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR will address #784 -- and supersedes a Draft PR in the Camsys fork..
As scoped, validating expressions is not included, and the code isn't worried about tables at all.
Approach
Smoke test various configuration files using a new
settings_checker
module inabm.models
:Multiple methods are included for situations involving segmented or templated SPEC/COEFFICIENT files.
The settings checker will also loop through all cascaded sub-settings (such as
Preprocessor
orAnnotator
settings objects) in a given settings object to at least check whether a SPEC file is defined there, and attempt to load it.Logging to stdout and a file called
settings_checker.log
is included.Running the Checker
The settings checker takes very little time, and so is set up to run by default.
To disable the checker, users can add the following to their
settings.yaml
file.Errors
To faithfully simulate the model runtime, a key design decision in this process is to re-use as much code from elsewhere in the ActivitySim codebase as possible, so that errors will be raised consistently. For instance, functions to read and evaluate SPEC and COEFFICIENTS files are imported and used directly instead of relying on custom code.
Errors are a wrapped in a custom Exception, which is collected into a list. If the settings checker encounters any errors, these will be reported to the logs as
logger.errors
and the checker will raise aRunTimeError
, halting the program.Note: A limitation of allowing errors to be raised is that at each stage of the validation routine, the settings checker will collect the first fatal exception it encounters. It is possible that the checker could need to be run several times to catch additional problems. This is mostly an issue when trying to resolve coefficient labels, since a key error will get raised on the first non-matching label in the SPEC.
Missing File Paths
Due to the inherited structure of the underlying Pydantic data models, it is not possible for the settings checker to determine whether a model actually requires a COEFFICIENT or SPEC filepath to be provided. Most of the data models include these fields, but will allow them to default to
None
if a value is not provided in the setting's YAML file.The way this is handled through this PR is to issue a
WARNING
level log alerting users that a file may be missing from the YAML settings, and this should be double checked and corrected if necessary.Ultimately, the Pydantic models should be refactored to be explicit about when values for these fields are required (which is what those are for), allowing the settings checker catching the raised errors when trying to read the YAML files.
Settings Definitions
In order to expose Pydantic models to the checker, they are directly imported and set up in a dictionary keyed to the step name as follows:
By default, the checker assumes that the relevant fields to look for CSV files are SPEC and COEFFICIENTS. This can be overridden using "spec_coefficient_keys" in the settings dictionary. These are assumed to be "paired" (i.e. the specifications are expected to contain coefficient labels).
Included Settings Definitions
As of this PR, settings are defined for the following. Keep in mind that some YAML files are directly read in when their parent is constructed.
Explicit Exclusions
Two models are not included in the registry as of this PR. They appear to missing required configurations in the example models, and were causing persistent failures in the settings checker. If additional guidance is provided, they could likely be added.
trip_scheduling_choice
: The default YAML file (trip_scheduling_choice.yaml) is missingtrip_scheduling
: The requiredtrip_scheduling_coefficients
file is missingExtensions
The settings checker now supports a simple means of defining settings to check for extensions. To do so, developers should:
settings_checker.py
in their extensions module, which contains:EXTENSION_SETTING_CHECKER
, mapping components to settings classes and files. This is identical in structure to the core settings as described in Settings Definitions aboveThe design directly extends the registry of model settings to validate in the core
settings_checker
module, rather than defining a separate checking routine in the extensions modules.An example implementation will be provided in the SANDAG ABM3 Example repository.