Skip to content

Preprocessing & Annotation functionality #957

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

dhensle
Copy link
Contributor

@dhensle dhensle commented Jun 17, 2025

This pull request standardizes preprocessing and annotation capabilities across all activitysim abm models.

Refactoring of Preprocessing and Table Annotation

  • Standardized Preprocessing:

    • Replaced custom preprocessing logic using expressions.assign_columns with expressions.annotate_preprocessors in models such as auto_ownership, free_parking, and joint_tour_composition. This change ensures consistency across models and reduces duplication. [1] [2] [3]
  • Unified Table Annotation:

    • Replaced individual table annotation logic with expressions.annotate_tables in models like atwork_subtour_destination, cdap, and joint_tour_frequency. This provides a consistent and reusable approach for post-model table updates. [1] [2] [3]

Simplification of Model Settings

  • Removal of Unused Fields:

    • Removed unnecessary fields like preprocessor and annotate_* from model settings classes such as AtworkSubtourFrequencySettings, AutoOwnershipSettings, and FreeParkingSettings. This cleanup eliminates unused or redundant configuration options. [1] [2] [3]
  • Reorganization in cdap:

    • Consolidated preprocessor and annotation settings into a single unified structure for better clarity and alignment with other models.

Summary table of model changes:

Model name Choosers Preprocessor Alts Preprocessor Annotator Notes
accessibility existed N/A N/A model is essentially a preprocessor
atwork_subtour_destination added added added added preprocessing in both sample and simulate step in util.tour_destination.py
atwork_subtour_frequency updated N/A added  
atwork_subtour_mode_choice updated N/A added  
atwork_subtour_scheduling added existed added ALTS_PREPROCESSOR segmented by tour purpose, added in util.vectorize_tour_scheduling.py
auto_ownership updated N/A added  
cdap added N/A updated  
disaggregate_accessibility existed N/A existed now can use alts preprocessors in logsum calculations through tour and location choice models
free_parking updated N/A updated  
initialize N/A N/A updated Model is annotator
initialize_los N/A N/A N/A  
initialize_tours N/A N/A updated Model is annotator
input_checker N/A N/A N/A  
joint_tour_composition updated N/A updated  
joint_tour_destination added added added added preprocessing in both sample and simulate step in util.tour_destination.py
joint_tour_frequency updated N/A added  
joint_tour_frequency_composition updated updated updated  
joint_tour_participation updated N/A added  
joint_tour_scheduling updated added updated ALTS_PREPROCESSOR segmented by tour purpose, added in util.vectorize_tour_scheduling.py
location_choice added added updated added preprocessing in both sample and simulate step
mandatory_scheduling added existed added ALTS_PREPROCESSOR segmented by tour purpose, added in util.vectorize_tour_scheduling.py
mandatory_tour_frequency updated N/A added  
non_mandatory_destination added added added added preprocessing in both sample and simulate step in util.tour_destination.py
non_mandatory_scheduling added added added ALTS_PREPROCESSOR segmented by tour purpose, added in util.vectorize_tour_scheduling.py
non_mandatory_tour_frequency updated added added  
parking_location_choice updated added added  
school_escorting existed added added uniquely has 3 preprocessors
stop_frequency updated N/A added  
summarize N/A N/A existed Model is annotator
telecommute_frequency updated N/A added  
tour_mode_choice updated N/A added  
tour_od_choice updated added added added preprocessing in both sample and simulate step
tour_scheduling_probabilistic N/A N/A added  
transit_pass_ownership updated N/A added  
transit_pass_subsidy updated N/A added  
trip_departure_choice existed added added  
trip_destination updated updated added added preprocessing in both sample and simulate step
trip_matrices existed N/A N/A  
trip_mode_choice existed N/A updated  
trip_purpose updated N/A added  
trip_purpose_and_destination N/A N/A added  
trip_scheduling updated N/A added is probabilistic model
trip_scheduling_choice updated added added  
vehicle_allocation updated N/A added  
vehicle_type_choice updated updated updated  
work_from_home updated N/A updated  

@jpn-- jpn-- requested a review from Copilot June 17, 2025 18:45
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request standardizes the preprocessing and table annotation functionality across multiple ABM models by replacing custom assign‐columns logic with unified calls to expressions.annotate_preprocessors and expressions.annotate_tables. Key changes include:

  • Replacing custom preprocessing calls with standardized annotation functions.
  • Removing unused/deprecated settings and consolidating configuration fields.
  • Updating multiple model files (e.g., auto_ownership, free_parking, joint_tour_frequency, etc.) to support the new standardized approach.

Reviewed Changes

Copilot reviewed 47 out of 47 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
activitysim/abm/models/parking_location_choice.py Added deprecation warning handling and introduced an alternative preprocessor field.
activitysim/abm/models/non_mandatory_tour_frequency.py Replaced assign_columns with annotate_preprocessors and updated locals dictionary usage.
activitysim/abm/models/non_mandatory_scheduling.py Updated preprocessing and table annotation calls with new expressions functions.
activitysim/abm/models/non_mandatory_destination.py Removed legacy annotation function calls and standardized table annotation.
activitysim/abm/models/mandatory_tour_frequency.py Consolidated preprocessing logic into standardized annotate_preprocessors and annotate_tables calls.
activitysim/abm/models/location_choice.py Updated both chooser and alternative preprocessing with standardized annotation calls.
activitysim/abm/models/joint_*.py Unified preprocessing and annotation functions across joint tour models.
activitysim/abm/models/free_parking.py Updated preprocessing calls to use expressions.annotate_preprocessors and annotate_tables.
activitysim/abm/models/cdap.py Added preprocessing and post-model annotation functions; restructured settings fields.
activitysim/abm/models/auto_ownership.py Removed unused preprocessor settings by moving to a pass-through structure.
activitysim/abm/models/atwork_* (scheduling, mode_choice, frequency, destination) Consolidated preprocessing and table annotation logic with the new standardized approach.
Comments suppressed due to low confidence (1)

activitysim/abm/models/location_choice.py:184

  • [nitpick] The alternative preprocessor setting name 'alts_preprocessor_sample' differs from similar settings in other models. Consider adopting a consistent naming convention (e.g., 'alts_preprocessor') across the codebase.
        preprocessor_setting_name="alts_preprocessor_sample",

@jpn-- jpn-- self-requested a review June 17, 2025 18:52
@jpn--
Copy link
Member

jpn-- commented Jun 17, 2025

A few thoughts before I dig into this more:

  1. Estimation mode tests are failing.
  2. Other than estimation mode, existing tests do show we have not broken anything that was tested. But this PR appears to add significant functionality, and lacks any testing of that new functionality.

@dhensle
Copy link
Contributor Author

dhensle commented Jun 17, 2025

Hey @jpn--, thanks for your first look. As mentioned during the call today, I have a few more things to clean up on the CI tests (which is why the PR is listed as a "draft"). I will ping you in the next few days when my commits are complete and this PR is ready for review.

@dhensle dhensle marked this pull request as ready for review June 18, 2025 23:35
@dhensle
Copy link
Contributor Author

dhensle commented Jun 18, 2025

@jpn-- Ok, this is ready for your review. I fixed the estimation mode issue and added unit tests for both the preprocessor and annotation functions which include some common expressions including reindex, skims lookups (in multiple ways), and some groupby calculations across tables. If you have other CI test ideas, I am happy to hear them.

Copy link
Member

@jpn-- jpn-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I found very little substantive to address, just a few things to clean up. Thanks for all the hard work on this.

"""Setting for the preprocessor."""

OCCUPANCY_LEVELS: list = [1] # TODO Check this
OCCUPANCY_LEVELS: list = [1, 2, 3.5]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clearly, the previous default value here of [1] was not a great solution. But it seems the occupancy levels set here needs to jive with calls to the generated variable names set in spec files for other components. "df.vehicle_occup_1" is going to show up as a variable name elsewhere in other components. I think it would be better to make that clear in the docstring, and have no default at all (i.e. failure to define this in the config for this model is an error).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this appears to be basically unrelated to the fundamental nature of this PR, maybe just leave it out here, and we can address it in a separate specific PR for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MandatoryTourFrequencySettings still has a redundant annotate_persons defined, which it also inherits from LogitComponentSettings

COEFFICIENTS: Path
CONSTANTS: dict[str, Any] = {}
compute_settings: ComputeSettings | None = None

preprocessor: PreprocessorSettings | None = None
"""Preprocess choosers tables before running the model."""
annotate_persons: PreprocessorSettings | None = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automatic documentation tools will work better if annotate_persons has its own docstring, even though it's clear to a human reader the one docstring below applies to both.

utility expressions.
"""

annotate_households: PreprocessorSettings | None = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For autodoc, have each annotate_* field have its own docstring.

@@ -704,9 +704,9 @@ def drop_unused_columns(
custom_chooser_lines = inspect.getsource(custom_chooser)
unique_variables_in_spec.update(re.findall(pattern, custom_chooser_lines))

logger.info("Dropping unused variables in chooser table")
logger.debug("Dropping unused variables in chooser table")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is clearly unrelated to this PR.

I do think we should address our logging, by default we are logging far too much

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants