211-Connect/hsds-entity-resolution

hsds_entity_resolution

hsds_entity_resolution helps community organizations deduplicate HSDS data and orchestrate continual checks that support long-running community data sharing partnerships.

Project goals

  • Improve entity matching quality across partner-provided HSDS datasets
  • Reduce duplicate records that block trusted cross-organization coordination
  • Run repeatable validation and quality checks as data pipelines evolve
  • Support sustainable, long-term community data sharing operations

Tooling

  • Dagster (dagster, dg): pipeline orchestration, definitions, and local development UI
  • Pydantic v2: typed data models and validation for HSDS entities and pipeline I/O
  • Ruff: Python formatting and linting for fast local feedback
  • Pyright: static type checking for src/ and tests/
  • Codacy CLI (.codacy/cli.sh): static analysis and security scanning (Pylint, Semgrep, Lizard, Trivy)
  • uv: dependency and virtual environment management

Component Package Layout

Reusable Dagster components live in:

  • src/hsds_entity_resolution/dagster/components/

Core library code should live outside the Dagster adapter layer:

  • src/hsds_entity_resolution/core/
  • src/hsds_entity_resolution/types/
  • src/hsds_entity_resolution/config/

The canonical public component entry point is:

  • hsds_entity_resolution.dagster.components.EntityResolutionComponent

This module is exported through the Dagster registry entry-point group:

  • dagster_dg_cli.registry_modules

Getting started

Install dependencies

Option 1: uv

Ensure uv is installed following the official documentation, then run:

uv sync

Activate the virtual environment:

OS            Command
macOS/Linux   source .venv/bin/activate
Windows       .venv\Scripts\activate

Option 2: pip

python3 -m venv .venv
source .venv/bin/activate  # macOS/Linux; on Windows run .venv\Scripts\activate
pip install -e ".[dev]"

Run the project

Start Dagster locally:

dg dev

Then open http://localhost:3000.

Contributing

See CONTRIBUTING.md for pull request requirements, quality checks, and review expectations.

Using This In Another Dagster Repo

  1. Publish or install this package (for example: pip install hsds_entity_resolution).
  2. Confirm discovery in the target environment:
dg list components --package hsds_entity_resolution
  3. Use the component key in YAML:
type: hsds_entity_resolution.dagster.components.EntityResolutionComponent
attributes: {}
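The type value in the YAML above is a dotted Python path, so a quick import check can catch a typo or a missing install before Dagster loads the definitions. The sketch below is an illustration only: `resolve_dotted_key` is a hypothetical helper, and it is not necessarily how dg resolves component keys internally.

```python
from importlib import import_module


def resolve_dotted_key(key: str) -> object:
    """Import the object named by a dotted path like 'package.module.ClassName'.

    Hypothetical helper for illustration, not part of dg or this package.
    """
    module_path, _, attr_name = key.rpartition(".")
    if not module_path:
        raise ValueError(f"expected a dotted path, got {key!r}")
    module = import_module(module_path)  # ModuleNotFoundError if not installed
    return getattr(module, attr_name)  # AttributeError if the name is absent


if __name__ == "__main__":
    key = "hsds_entity_resolution.dagster.components.EntityResolutionComponent"
    try:
        print(resolve_dotted_key(key))
    except (ImportError, AttributeError) as exc:
        print(f"could not resolve {key}: {exc}")
```

Run it inside the target repository's virtual environment, since that is the environment Dagster will import from.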
