Check YAML, Spec, Coefficients Settings Before Model Runs #950

Open
wants to merge 83 commits into
base: main

@andkay andkay commented Jun 10, 2025

This PR addresses #784 and supersedes a draft PR in the Camsys fork.

As scoped, validating expressions is not included, and tables are not checked at all.

Approach

Smoke test various configuration files using a new settings_checker module in abm.models:

  • Attempt to load YAML settings files into their relevant Pydantic data models
  • Attempt to load SPEC files
  • Attempt to load COEFFICIENTS files
  • Attempt to evaluate the SPEC and COEFFICIENT files together to determine if mismatched labels exist
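
The label check in the last bullet can be pictured with a small standalone sketch. This is not the checker's code (the PR reuses ActivitySim's own readers and evaluators); the column names `coefficient`, `coefficient_name`, and `value`, the CSV contents, and the helper `find_unmatched_labels` are all assumptions made for illustration.

```python
import csv
import io

# Toy stand-ins for a SPEC file and a COEFFICIENTS file (contents are made up).
SPEC_CSV = """Label,Expression,coefficient
util_income,income > 50000,coef_income
util_no_cars,auto_ownership == 0,coef_no_cars
"""

COEFFICIENTS_CSV = """coefficient_name,value
coef_income,0.25
"""


def read_rows(text: str) -> list[dict]:
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def find_unmatched_labels(spec_rows: list[dict], coefficient_rows: list[dict]) -> list[str]:
    """Return labels referenced by the spec that the coefficients table does not define."""
    known = {row["coefficient_name"] for row in coefficient_rows}
    used = {row["coefficient"] for row in spec_rows if row["coefficient"]}
    return sorted(used - known)


unmatched = find_unmatched_labels(read_rows(SPEC_CSV), read_rows(COEFFICIENTS_CSV))
print(unmatched)  # ['coef_no_cars']
```

A set difference like this reports every unmatched label at once, whereas evaluating the files through the runtime code path stops at the first one.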

Multiple methods are included for situations involving segmented or templated SPEC/COEFFICIENT files.

The settings checker will also loop through all cascaded sub-settings (such as Preprocessor or Annotator settings objects) in a given settings object to at least check whether a SPEC file is defined there, and attempt to load it.
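
A rough sketch of that traversal is shown below. It uses plain dataclasses as stand-ins for the Pydantic settings models, and the class names, field names, file names, and the helper `iter_spec_paths` are hypothetical; the real checker may walk the objects differently.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Iterator, Optional


@dataclass
class PreprocessorSettings:
    """Hypothetical stand-in for a nested preprocessor/annotator settings object."""
    SPEC: Optional[str] = None


@dataclass
class ModelSettings:
    """Hypothetical stand-in for a top-level model settings object."""
    SPEC: Optional[str] = None
    preprocessor: Optional[PreprocessorSettings] = None
    annotate_tours: Optional[PreprocessorSettings] = None


def iter_spec_paths(settings, prefix: str = "") -> Iterator[tuple[str, str]]:
    """Yield (dotted location, file name) for every SPEC set on the object or its children."""
    for field in fields(settings):
        value = getattr(settings, field.name)
        location = f"{prefix}{field.name}"
        if field.name == "SPEC" and value is not None:
            yield location, value
        elif is_dataclass(value):
            # Recurse into nested settings objects such as a preprocessor.
            yield from iter_spec_paths(value, prefix=f"{location}.")


settings = ModelSettings(
    SPEC="example_model.csv",
    preprocessor=PreprocessorSettings(SPEC="example_preprocessor.csv"),
)
found = list(iter_spec_paths(settings))
print(found)
```

Each location found this way would then be handed to the same SPEC loader used for top-level files.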

Logging to stdout and a file called settings_checker.log is included.
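
One way to get that dual output with the standard library is sketched here. The function name `build_checker_logger`, the logger name, and the message format are assumptions; only the file name settings_checker.log comes from this PR.

```python
import logging
import sys
import tempfile
from pathlib import Path


def build_checker_logger(log_dir: Path, name: str = "settings_checker_sketch") -> logging.Logger:
    """Return a logger that writes to stdout and to settings_checker.log in log_dir."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called more than once
    formatter = logging.Formatter("%(levelname)s - %(message)s")
    handlers = (
        logging.StreamHandler(sys.stdout),
        logging.FileHandler(log_dir / "settings_checker.log", mode="w"),
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


with tempfile.TemporaryDirectory() as tmp:
    logger = build_checker_logger(Path(tmp))
    logger.warning("No SPEC file defined for example_model")
    for handler in logger.handlers:
        handler.close()  # release the file before the temp directory is removed
    log_text = (Path(tmp) / "settings_checker.log").read_text()
```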

Running the Checker

The settings checker takes very little time, and so is set up to run by default.

To disable the checker, users can add the following to their settings.yaml file.

check_model_settings: False

Errors

To faithfully simulate the model runtime, a key design decision in this process is to re-use as much code from elsewhere in the ActivitySim codebase as possible, so that errors will be raised consistently. For instance, functions to read and evaluate SPEC and COEFFICIENTS files are imported and used directly instead of relying on custom code.

Errors are wrapped in a custom Exception and collected into a list. If the settings checker encounters any errors, they are reported to the logs at the ERROR level and the checker raises a RuntimeError, halting the program.
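
The collect-then-raise pattern described above might look roughly like the following. The exception class `SettingsCheckerError`, the functions `run_checks` and `report_and_raise`, and the example check callables are hypothetical names, not the ones used in this PR.

```python
import logging

logger = logging.getLogger("settings_checker_sketch")


class SettingsCheckerError(Exception):
    """Hypothetical wrapper that records which model a failure belongs to."""

    def __init__(self, model_name: str, original: Exception):
        super().__init__(f"{model_name}: {type(original).__name__}: {original}")
        self.model_name = model_name
        self.original = original


def run_checks(checks: dict) -> list[SettingsCheckerError]:
    """Run every check, wrapping and collecting failures instead of stopping early."""
    errors = []
    for model_name, check in checks.items():
        try:
            check()
        except Exception as exc:  # broad on purpose: any failure is reported
            errors.append(SettingsCheckerError(model_name, exc))
    return errors


def report_and_raise(errors: list[SettingsCheckerError]) -> None:
    """Log each collected error, then halt if there were any."""
    for error in errors:
        logger.error(str(error))
    if errors:
        raise RuntimeError(f"Settings checker found {len(errors)} error(s)")


def failing_check():
    raise KeyError("coef_missing")


errors = run_checks({"example_ok": lambda: None, "example_bad": failing_check})
```

Calling `report_and_raise(errors)` here would log one error and raise, while an empty list passes silently.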

Note: A limitation of allowing errors to be raised is that at each stage of the validation routine, the settings checker will collect the first fatal exception it encounters. It is possible that the checker could need to be run several times to catch additional problems. This is mostly an issue when trying to resolve coefficient labels, since a key error will get raised on the first non-matching label in the SPEC.

Missing File Paths

Due to the inherited structure of the underlying Pydantic data models, it is not possible for the settings checker to determine whether a model actually requires a COEFFICIENT or SPEC filepath to be provided. Most of the data models include these fields, but will allow them to default to None if a value is not provided in the settings YAML file.

This PR handles the situation by issuing a WARNING level log message alerting users that a file may be missing from the YAML settings and should be double checked and corrected if necessary.

Ultimately, the Pydantic models should be refactored to be explicit about when values for these fields are required (which is what the models are for), so that the settings checker can catch the errors raised when reading the YAML files.
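
As a minimal sketch of that warning, assuming a dataclass in place of the real Pydantic model and a hypothetical helper `warn_on_missing_paths`:

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("settings_checker_sketch")


@dataclass
class ExampleSettings:
    """Stand-in for a Pydantic settings model whose file fields default to None."""
    SPEC: Optional[str] = None
    COEFFICIENTS: Optional[str] = None


def warn_on_missing_paths(model_name: str, settings, keys=("SPEC", "COEFFICIENTS")) -> list[str]:
    """Warn about each file field left as None and return the names of those fields."""
    missing = [key for key in keys if getattr(settings, key, None) is None]
    for key in missing:
        logger.warning(
            "%s: no value for %s in the YAML settings; a file may be missing",
            model_name,
            key,
        )
    return missing


missing = warn_on_missing_paths("example_model", ExampleSettings(SPEC="example_model.csv"))
```

The check only warns because a None value cannot be told apart from a field the model genuinely does not need.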

Settings Definitions

In order to expose Pydantic models to the checker, they are directly imported and set up in a dictionary keyed to the step name as follows:

# import model settings
from activitysim.abm.models.accessibility import AccessibilitySettings

# Setup for checker
CHECKER_SETTINGS = {
    "compute_accessibility": {
        "settings_cls": AccessibilitySettings,
        "settings_file": "accessibility.yaml",
    },
    ...
}

By default, the checker assumes that the fields pointing to CSV files are SPEC and COEFFICIENTS. This can be overridden using "spec_coefficient_keys" in the settings dictionary. Each entry is assumed to be a "paired" spec and coefficients file (i.e. the specifications are expected to contain coefficient labels).

    "school_escorting": {
        "settings_cls": SchoolEscortSettings,
        "settings_file": "school_escorting.yaml",
        "spec_coefficient_keys": [
            {"spec": "OUTBOUND_SPEC", "coefs": "OUTBOUND_COEFFICIENTS"},
            {"spec": "INBOUND_SPEC", "coefs": "INBOUND_COEFFICIENTS"},
            {"spec": "OUTBOUND_COND_SPEC", "coefs": "OUTBOUND_COND_COEFFICIENTS"},
        ]
    },
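
How a checker might fall back to the default pair when no override is given can be sketched as follows. The helper `resolve_key_pairs` and the constant `DEFAULT_KEY_PAIRS` are hypothetical, and the registry entries omit "settings_cls" only so the example runs without ActivitySim installed.

```python
# Default pairing assumed when an entry does not override it (hypothetical constant).
DEFAULT_KEY_PAIRS = [{"spec": "SPEC", "coefs": "COEFFICIENTS"}]


def resolve_key_pairs(entry: dict) -> list[dict]:
    """Return the spec/coefficient field pairs to check for one registry entry."""
    return entry.get("spec_coefficient_keys", DEFAULT_KEY_PAIRS)


registry = {
    "compute_accessibility": {"settings_file": "accessibility.yaml"},
    "school_escorting": {
        "settings_file": "school_escorting.yaml",
        "spec_coefficient_keys": [
            {"spec": "OUTBOUND_SPEC", "coefs": "OUTBOUND_COEFFICIENTS"},
            {"spec": "INBOUND_SPEC", "coefs": "INBOUND_COEFFICIENTS"},
        ],
    },
}

for step_name, entry in registry.items():
    for pair in resolve_key_pairs(entry):
        print(step_name, pair["spec"], pair["coefs"])
```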

Included Settings Definitions

As of this PR, settings are defined for the following. Keep in mind that some YAML files are directly read in when their parent is constructed.

  • 'compute_accessibility'
  • 'atwork_subtour_destination'
  • 'atwork_subtour_frequency'
  • 'atwork_subtour_mode_choice'
  • 'atwork_subtour_scheduling'
  • 'auto_ownership_simulate'
  • 'cdap_simulate'
  • 'compute_disaggregate_accessibility'
  • 'free_parking'
  • 'initialize_households'
  • 'initialize_landuse'
  • 'initialize_los'
  • 'input_checker'
  • 'joint_tour_composition'
  • 'joint_tour_destination'
  • 'joint_tour_frequency_composition'
  • 'joint_tour_frequency'
  • 'joint_tour_participation'
  • 'joint_tour_scheduling'
  • 'mandatory_tour_frequency'
  • 'mandatory_tour_scheduling'
  • 'non_mandatory_tour_destination'
  • 'non_mandatory_tour_frequency'
  • 'non_mandatory_tour_scheduling'
  • 'parking_location'
  • 'school_escorting'
  • 'school_location'
  • 'shadow_pricing'
  • 'stop_frequency'
  • 'summarize'
  • 'telecommute_frequency'
  • 'tour_mode_choice_simulate'
  • 'tour_od_choice'
  • 'tour_scheduling_probabilistic'
  • 'transit_pass_ownership'
  • 'transit_pass_subsidy'
  • 'trip_departure_choice'
  • 'trip_destination'
  • 'trip_mode_choice'
  • 'trip_purpose'
  • 'trip_purpose_and_destination'
  • 'vehicle_allocation'
  • 'vehicle_type_choice'
  • 'work_from_home'
  • 'workplace_location'
  • 'write_data_dictionary'
  • 'write_trip_matrices'

Explicit Exclusions

Two models are not included in the registry as of this PR. They appear to be missing required configurations in the example models, and were causing persistent failures in the settings checker. If additional guidance is provided, they could likely be added.

  • trip_scheduling_choice: The default YAML file (trip_scheduling_choice.yaml) is missing
  • trip_scheduling: The required trip_scheduling_coefficients file is missing

Extensions

The settings checker now supports a simple means of defining settings to check for extensions. To do so, developers should:

  1. Define a module called settings_checker.py in their extensions module
  2. In that module, import the required settings classes as well as core components, such as the State
  3. In the same module, define a dictionary called EXTENSION_SETTING_CHECKER, mapping components to settings classes and files. This is identical in structure to the core settings as described in Settings Definitions above

The design directly extends the registry of model settings to validate in the core settings_checker module, rather than defining a separate checking routine in the extensions modules.
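
A sketch of how such a registry extension could work is given below. The function `merge_extension_settings` and the package name `my_extensions` are hypothetical, and the fake module is registered in `sys.modules` only so the example runs on its own; the PR's actual discovery logic may differ.

```python
import importlib
import sys
import types


def merge_extension_settings(registry: dict, extensions_package: str) -> dict:
    """Return the registry extended with EXTENSION_SETTING_CHECKER, if the extension defines one."""
    try:
        module = importlib.import_module(f"{extensions_package}.settings_checker")
    except ModuleNotFoundError:
        # No settings_checker module in the extension: nothing to add.
        return registry
    extra = getattr(module, "EXTENSION_SETTING_CHECKER", {})
    return {**registry, **extra}


# Simulate an extensions package that ships a settings_checker module.
package = types.ModuleType("my_extensions")
package.__path__ = []  # marks the fake module as a package
checker_module = types.ModuleType("my_extensions.settings_checker")
checker_module.EXTENSION_SETTING_CHECKER = {
    "my_custom_step": {"settings_file": "my_custom_step.yaml"},
}
sys.modules["my_extensions"] = package
sys.modules["my_extensions.settings_checker"] = checker_module

core_registry = {"compute_accessibility": {"settings_file": "accessibility.yaml"}}
merged = merge_extension_settings(core_registry, "my_extensions")
print(sorted(merged))  # ['compute_accessibility', 'my_custom_step']
```

Because the extension entries land in the same dictionary, they pass through exactly the same checks as the core models.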

An example implementation will be provided in the SANDAG ABM3 Example repository.

andkay added 30 commits March 19, 2025 10:03
…nction called from main checker. run formatter
…n empty dataframe, even if no top level spec exists
…or disaggregate accessibility (needs more testing)
andkay added 26 commits May 12, 2025 22:21
missing values.

because the fields in the settings classes are inherited, it is not
usually possible for the settings checker to determine if a path to an
external file is actually required for a particular model component.

the workaround is to issue a specific warning that a path *may* be
expected and users should check the YAML file.

the ultimate solution would be to define more robust Pydantic data
models that ensure that fields are marked as required when appropriate.
@@ -283,7 +285,7 @@ def run(args):
# Memory sidecar is only useful for single process runs
# multiprocess runs log memory usage without blocking in the controlling process.
mem_prof_log = state.get_log_file_path("memory_profile.csv")
-from ..core.memory_sidecar import MemorySidecar
+from activitysim.core.memory_sidecar import MemorySidecar
Author

The relative import here caused problems for my runs - suggest changing it to absolute.

@dhensle dhensle self-requested a review June 10, 2025 18:22