`hsds_entity_resolution` helps community organizations deduplicate HSDS (Human Services Data Specification) data and orchestrate recurring quality checks that support long-running community data-sharing partnerships.

- Improve entity matching quality across partner-provided HSDS datasets
- Reduce duplicate records that block trusted cross-organization coordination
- Run repeatable validation and quality checks as data pipelines evolve
- Support sustainable, long-term community data sharing operations
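
To make "entity matching" concrete, here is a minimal, hypothetical sketch of candidate-duplicate detection across partner records. It is not this package's matching logic. The function names, the normalization rules, and the 0.9 threshold are assumptions, and the standard library's `difflib.SequenceMatcher` stands in for whatever scoring the package actually uses.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(cleaned.split())


def candidate_duplicates(
    records: dict[str, str], threshold: float = 0.9
) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, score) for every pair of records whose normalized
    names have a similarity ratio at or above `threshold`.

    Illustrative only: a hypothetical helper, not part of hsds_entity_resolution.
    """
    normalized = {record_id: normalize_name(name) for record_id, name in records.items()}
    pairs = []
    # Compare every unordered pair once; sorting keeps the output deterministic.
    for a, b in combinations(sorted(normalized), 2):
        score = SequenceMatcher(None, normalized[a], normalized[b]).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 3)))
    return pairs


# Hypothetical records keyed by partner-assigned ID.
records = {
    "org-1": "Food Bank of Springfield",
    "org-2": "Food Bank of Springfield, Inc.",
    "org-3": "Springfield Public Library",
}
print(candidate_duplicates(records))
```

Comparing every pair is quadratic in the number of records, so at realistic dataset sizes entity-resolution pipelines usually restrict comparisons to candidate blocks (for example, records that share a postal code) before scoring.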

Tooling:

- Dagster (`dagster`, `dg`): pipeline orchestration, definitions, and local development UI
- Pydantic v2: typed data models and validation for HSDS entities and pipeline I/O
- Ruff: Python formatting and linting for fast local feedback
- Pyright: static type checking for `src/` and `tests/`
- Codacy CLI (`.codacy/cli.sh`): static analysis and security scanning (Pylint, Semgrep, Lizard, Trivy)
- uv: dependency and virtual environment management

Reusable Dagster components live in `src/hsds_entity_resolution/dagster/components/`.

Core library code should live outside the Dagster adapter layer:

- `src/hsds_entity_resolution/core/`
- `src/hsds_entity_resolution/types/`
- `src/hsds_entity_resolution/config/`
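
The split between core code and the Dagster adapter layer can be illustrated with a small, hypothetical example. All names below (`OrganizationRecord`, `DedupConfig`, `group_exact_duplicates`) are assumptions, not the package's actual API. The project itself uses Pydantic v2 models; standard-library dataclasses are used here only to keep the sketch dependency-free. The point is that the core function imports nothing from Dagster, so it can be unit tested directly, while a component under `dagster/components/` would only load inputs, call it, and emit the result.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a model that would live in types/.
@dataclass(frozen=True)
class OrganizationRecord:
    id: str
    name: str
    source: str  # which partner supplied the record


# Hypothetical stand-in for settings that would live in config/.
@dataclass(frozen=True)
class DedupConfig:
    case_sensitive: bool = False


# Hypothetical core function: plain Python with no Dagster imports.
def group_exact_duplicates(
    records: list[OrganizationRecord], config: DedupConfig
) -> dict[str, list[str]]:
    """Group record IDs by (optionally case-folded) trimmed name and
    return only the groups that contain more than one record."""
    groups: dict[str, list[str]] = {}
    for record in records:
        key = record.name.strip()
        if not config.case_sensitive:
            key = key.casefold()
        groups.setdefault(key, []).append(record.id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


partner_records = [
    OrganizationRecord("a", "Food Bank", "partner-1"),
    OrganizationRecord("b", "food bank ", "partner-2"),
    OrganizationRecord("c", "Library", "partner-1"),
]
print(group_exact_duplicates(partner_records, DedupConfig()))
```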

The canonical public component entry point is `hsds_entity_resolution.dagster.components.EntityResolutionComponent`.

This module is exported through the Dagster registry entry-point group `dagster_dg_cli.registry_modules`.

Option 1: uv

Ensure uv is installed by following the official documentation, then run:

```bash
uv sync
```

Activate the virtual environment:

| OS | Command |
|---|---|
| macOS | `source .venv/bin/activate` |
| Windows | `.venv\Scripts\activate` |

Option 2: pip

```bash
python3 -m venv .venv
source .venv/bin/activate  # macOS; on Windows use .venv\Scripts\activate
pip install -e ".[dev]"
```

Start Dagster locally:

```bash
dg dev
```

Then open http://localhost:3000.

See `CONTRIBUTING.md` for pull request requirements, quality checks, and review expectations.

To use the component in another Dagster project:

- Publish or install this package (for example, `pip install hsds_entity_resolution`).
- Confirm discovery in the target environment:

  ```bash
  dg list components --package hsds_entity_resolution
  ```

- Use the component key in YAML:

  ```yaml
  type: hsds_entity_resolution.dagster.components.EntityResolutionComponent
  attributes: {}
  ```