-
Notifications
You must be signed in to change notification settings - Fork 109
Preprocessing & Annotation functionality #957
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request standardizes the preprocessing and table annotation functionality across multiple ABM models by replacing custom assign‐columns logic with unified calls to expressions.annotate_preprocessors and expressions.annotate_tables. Key changes include:
- Replacing custom preprocessing calls with standardized annotation functions.
- Removing unused/deprecated settings and consolidating configuration fields.
- Updating multiple model files (e.g., auto_ownership, free_parking, joint_tour_frequency, etc.) to support the new standardized approach.
Reviewed Changes
Copilot reviewed 47 out of 47 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
activitysim/abm/models/parking_location_choice.py | Added deprecation warning handling and introduced an alternative preprocessor field. |
activitysim/abm/models/non_mandatory_tour_frequency.py | Replaced assign_columns with annotate_preprocessors and updated locals dictionary usage. |
activitysim/abm/models/non_mandatory_scheduling.py | Updated preprocessing and table annotation calls with new expressions functions. |
activitysim/abm/models/non_mandatory_destination.py | Removed legacy annotation function calls and standardized table annotation. |
activitysim/abm/models/mandatory_tour_frequency.py | Consolidated preprocessing logic into standardized annotate_preprocessors and annotate_tables calls. |
activitysim/abm/models/location_choice.py | Updated both chooser and alternative preprocessing with standardized annotation calls. |
activitysim/abm/models/joint_*.py | Unified preprocessing and annotation functions across joint tour models. |
activitysim/abm/models/free_parking.py | Updated preprocessing calls to use expressions.annotate_preprocessors and annotate_tables. |
activitysim/abm/models/cdap.py | Added preprocessing and post-model annotation functions; restructured settings fields. |
activitysim/abm/models/auto_ownership.py | Removed unused preprocessor settings by moving to a pass-through structure. |
activitysim/abm/models/atwork_* (scheduling, mode_choice, frequency, destination) | Consolidated preprocessing and table annotation logic with the new standardized approach. |
Comments suppressed due to low confidence (1)
activitysim/abm/models/location_choice.py:184
- [nitpick] The alternative preprocessor setting name 'alts_preprocessor_sample' differs from similar settings in other models. Consider adopting a consistent naming convention (e.g., 'alts_preprocessor') across the codebase.
preprocessor_setting_name="alts_preprocessor_sample",
A few thoughts before I dig into this more:
|
Hey @jpn--, thanks for your first look. As mentioned during the call today, I have a few more things to clean up on the CI tests (which is why the PR is listed as a "draft"). I will ping you in the next few days when my commits are complete and this PR is ready for review. |
@jpn-- Ok, this is ready for your review. I fixed the estimation mode issue and added unit tests for both the preprocessor and annotation functions which include some common expressions including reindex, skims lookups (in multiple ways), and some groupby calculations across tables. If you have other CI test ideas, I am happy to hear them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! I found very little substantive to address, just a few things to clean up. Thanks for all the hard work on this.
"""Setting for the preprocessor.""" | ||
|
||
OCCUPANCY_LEVELS: list = [1] # TODO Check this | ||
OCCUPANCY_LEVELS: list = [1, 2, 3.5] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clearly, the previous default value here of [1] was not a great solution. But it seems the occupancy levels set here needs to jive with calls to the generated variable names set in spec files for other components. "df.vehicle_occup_1" is going to show up as a variable name elsewhere in other components. I think it would be better to make that clear in the docstring, and have no default at all (i.e. failure to define this in the config for this model is an error).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this appears to be basically unrelated to the fundamental nature of this PR, maybe just leave it out here, and we can address it in a separate specific PR for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MandatoryTourFrequencySettings
still has a redundant annotate_persons
defined, which it also inherits from LogitComponentSettings
COEFFICIENTS: Path | ||
CONSTANTS: dict[str, Any] = {} | ||
compute_settings: ComputeSettings | None = None | ||
|
||
preprocessor: PreprocessorSettings | None = None | ||
"""Preprocess choosers tables before running the model.""" | ||
annotate_persons: PreprocessorSettings | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Automatic documentation tools will work better if annotate_persons
has its own docstring, even though it's clear to a human reader the one docstring below applies to both.
utility expressions. | ||
""" | ||
|
||
annotate_households: PreprocessorSettings | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For autodoc, have each annotate_* field have its own docstring.
@@ -704,9 +704,9 @@ def drop_unused_columns( | |||
custom_chooser_lines = inspect.getsource(custom_chooser) | |||
unique_variables_in_spec.update(re.findall(pattern, custom_chooser_lines)) | |||
|
|||
logger.info("Dropping unused variables in chooser table") | |||
logger.debug("Dropping unused variables in chooser table") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is clearly unrelated to this PR.
I do think we should address our logging, by default we are logging far too much
This pull request standardizes preprocessing and annotation capabilities across all activitysim abm models.
Refactoring of Preprocessing and Table Annotation
Standardized Preprocessing:
expressions.assign_columns
withexpressions.annotate_preprocessors
in models such asauto_ownership
,free_parking
, andjoint_tour_composition
. This change ensures consistency across models and reduces duplication. [1] [2] [3]Unified Table Annotation:
expressions.annotate_tables
in models likeatwork_subtour_destination
,cdap
, andjoint_tour_frequency
. This provides a consistent and reusable approach for post-model table updates. [1] [2] [3]Simplification of Model Settings
Removal of Unused Fields:
preprocessor
andannotate_*
from model settings classes such asAtworkSubtourFrequencySettings
,AutoOwnershipSettings
, andFreeParkingSettings
. This cleanup eliminates unused or redundant configuration options. [1] [2] [3]Reorganization in
cdap
:Summary table of model changes: