Add neighborhood csv submission instructions#801

Merged
rachelekm merged 2 commits into develop from feature/rkm/add-csv-instructions on Mar 11, 2026
@rachelekm
Collaborator

@rachelekm rachelekm commented Mar 4, 2026

Overview

Adds a README to the neighborhood_data dir with more specific instructions for the neighborhood data template. It is intended as a reference for us as well as the client when sending over the CSV template, to prevent loss of context between data updates.

See #788 (comment) for background.

Checklist

  • fixup! commits have been squashed
  • CHANGELOG.md updated with summary of features or fixes, following
    Keep a Changelog guidelines
  • README.md updated if necessary to reflect the changes
  • Run ./scripts/format to lint, format, and fix the application source code.
  • CI passes after rebase

Testing Instructions

  • Review the .txt file and ensure it is grammatically correct and easy to read
  • Review table rows against expected column types and inline comments in neighborhood_data/generate_neighborhood_json script to confirm parity

Partially resolves #788

Member

@aaronxsu aaronxsu left a comment

I might be missing something. I have a few questions and a suggestion:

  1. The generate_neighborhood_json.py script seems to expect some image related columns, but they are not in the readme. I wonder if those columns are added by fetch_images.py in the processing pipeline and are thus not required in the CSV. Is this the case?
  2. The generate_neighborhood_json.py script removes x, y, lat, lon from the column list. The README documents lat and lon but not x and y. I presume they are added by add_zcta_centroids.py. I'd like to verify if this understanding is correct.
  3. The generate_neighborhood_json.py script casts type of zipcode to string, but the readme has this field as Number. I wonder if this could be that Boston zip code can be like 02108 and thus we needed it to be str so that the leading 0 won't get lost. If this is the case, then we might want to change the data type from Number to Text for zipcode in the readme.
  4. Suggestion: the README.md at the root of the repo has a great section for Data. I'd like to suggest we add a sentence to point to this new readme text file that this should be referenced as the CSV schema. What do you think?

@rachelekm
Collaborator Author

Thanks for the review, @aaronxsu! Answering your questions below:

  1. The generate_neighborhood_json.py script seems to expect some image related columns, but they are not in the readme. I wonder if those columns are added by fetch_images.py in the processing pipeline and are thus not required in the CSV. Is this the case?

Yes, those extra image metadata columns are generated in fetch_images, here, and written to an intermediate output file, neighborhood_centroids_descriptions.csv. This file is then read and processed in the generate_neighborhood_json script. So those metadata columns are expected by generate_neighborhood_json, but they are not part of the initial CSV; they only exist after fetch_images has run.
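For illustration, the split between template columns and pipeline-added columns could be checked with a small header validation like the sketch below. This is not code from the repo: every column name here (including `image_url` and `image_credit`) and the function `check_submitted_header` are hypothetical placeholders, and the real lists would come from the README table and the scripts.

```python
# Hypothetical column names, for illustration only; the real template and
# the real image metadata columns are defined by the README and fetch_images.
TEMPLATE_COLUMNS = {"name", "zipcode", "lat", "lon"}
PIPELINE_ADDED_COLUMNS = {"image_url", "image_credit"}


def check_submitted_header(header):
    """Return (missing, unexpected) column names for a submitted CSV header.

    missing: template columns the submitted CSV does not contain.
    unexpected: columns that the pipeline is supposed to add later.
    """
    header_set = set(header)
    missing = sorted(TEMPLATE_COLUMNS - header_set)
    unexpected = sorted(header_set & PIPELINE_ADDED_COLUMNS)
    return missing, unexpected


complete = check_submitted_header(["name", "zipcode", "lat", "lon"])
incomplete = check_submitted_header(["name", "lat", "lon", "image_url"])
print(complete)
print(incomplete)
```

A check like this would only apply to the client-submitted CSV, not to the intermediate file that generate_neighborhood_json reads.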

  2. The generate_neighborhood_json.py script removes x, y, lat, lon from the column list. The README documents lat and lon but not x and y. I presume they are added by add_zcta_centroids.py. I'd like to verify if this understanding is correct.

Yup, similar to the above, x and y are added by the add_zcta_centroids script to an enriched, intermediate output file, neighborhood_centroids_descriptions.csv, here. The x and y values are just used to build a Neighborhood's geometry coordinates, so they're tossed once used in generate_neighborhood_json, along with lon and lat.
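A minimal sketch of that pattern, assuming a simplified row shape: the coordinates are built from x and y, and then x, y, lat, and lon are dropped from the remaining fields. The sample row, the `name` column, the output dictionary layout, and the helper `to_record` are assumptions for illustration and are not taken from generate_neighborhood_json.

```python
import csv
import io

# Hypothetical enriched row; values and the "name" column are made up.
ENRICHED_CSV = "name,zipcode,lat,lon,x,y\nExample,02108,42.36,-71.06,-71.06,42.36\n"

# Columns that are only needed to build the geometry.
COORDINATE_COLUMNS = ("x", "y", "lat", "lon")


def to_record(row):
    """Build a coordinate pair from x/y, then drop the coordinate columns."""
    coordinates = [float(row["x"]), float(row["y"])]
    properties = {
        key: value for key, value in row.items() if key not in COORDINATE_COLUMNS
    }
    return {"coordinates": coordinates, "properties": properties}


first_row = next(csv.DictReader(io.StringIO(ENRICHED_CSV)))
record = to_record(first_row)
print(record)
```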

  3. The generate_neighborhood_json.py script casts type of zipcode to string, but the readme has this field as Number. I wonder if this could be that Boston zip code can be like 02108 and thus we needed it to be str so that the leading 0 won't get lost. If this is the case, then we might want to change the data type from Number to Text for zipcode in the readme.

That's a good point. Number felt intuitive for the BHA side to interpret, but we do want to preserve those leading 0 values (and it gets cast to a string anyway). I updated the README value type for zipcode in 25b8e3f.
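The leading-zero concern can be shown with a short standalone sketch. It uses only the standard library csv module and a made-up two-row sample; the `neighborhood` column name and the sample rows are assumptions, and this is not how the repo's scripts necessarily read the file.

```python
import csv
import io

# Hypothetical two-row sample; the real template has more columns.
SAMPLE_CSV = "neighborhood,zipcode\nBeacon Hill,02108\nBack Bay,02116\n"


def read_rows(text):
    """Read CSV rows as dicts; csv.DictReader keeps every value as a string."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE_CSV)

# Kept as text, the leading zero survives.
as_text = [row["zipcode"] for row in rows]

# Parsed as a number and converted back, the leading zero is lost.
as_number = [str(int(row["zipcode"])) for row in rows]

print(as_text)    # ['02108', '02116']
print(as_number)  # ['2108', '2116']
```

The same loss can happen before the script ever runs if a spreadsheet tool stores the column as a number, which is one reason to document the field as Text.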

  4. Suggestion: the README.md at the root of the repo has a great section for Data. I'd like to suggest we add a sentence to point to this new readme text file that this should be referenced as the CSV schema. What do you think?

That makes sense! Just added that context in 25b8e3f, let me know what you think!

Member

@aaronxsu aaronxsu left a comment

Thanks for the replies to my questions and for the changes. They make sense to me.

This PR looks good and is good to go!

@rachelekm rachelekm force-pushed the feature/rkm/add-csv-instructions branch from 25b8e3f to 3acd3af Compare March 11, 2026 22:36
@rachelekm
Collaborator Author

Thanks for the review!

@rachelekm rachelekm merged commit 273991a into develop Mar 11, 2026
2 checks passed
@rachelekm rachelekm deleted the feature/rkm/add-csv-instructions branch March 11, 2026 22:41

Development

Successfully merging this pull request may close these issues.

Investigate discrepancies in recommendation logic
