Merged
26 changes: 13 additions & 13 deletions README.md
@@ -1,27 +1,27 @@
# DataHelm

-DataHelm is a data engineering framework focused on:
+DataHelm is a data engineering framework focused on the following:

-- source ingestion orchestration
+- source ingestion and orchestration
- dbt transformation workflows
- notebook-based dashboard execution
-- reusable provider connectors (SharePoint, GCS, S3, BigQuery)
-- optional local-LLM analytics query scaffolding
+- reusable provider connectors (SharePoint, GCS, S3, and BigQuery)
+- optional local LLM analytics query scaffolding

-![alt text](https://github.com/DevStrikerTech/datahelm/blob/master/docs/architecture.png?raw=true)
+![DataHelm Architecture](https://github.com/DevStrikerTech/datahelm/blob/master/docs/architecture.png?raw=true)

## Core Capabilities

- **Config-driven ingestion** using YAML in `config/api/`
-- **Dagster orchestration** for jobs, schedules, and sensors
-- **dbt project execution** through `analytics/dbt_runner.py` and dbt configs
+- **Dagster orchestration** for managing jobs, schedules, and sensors
+- **dbt project execution** through `analytics/dbt_runner.py` and dbt configuration files
- **Dashboard generation** with Dagstermill notebooks
- **Reusable handlers/connectors** for multiple external providers
- **Optional NL-to-SQL module** (`analytics/nl_query/`) for local Ollama-based analytics workflows
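
As a hedged illustration of the config-driven ingestion idea above — the names `CONNECTORS` and `run_ingestion` are hypothetical, and the real definitions live as YAML under `config/api/` — a registry-style connector dispatch might look like:

```python
# Hypothetical sketch of config-driven ingestion dispatch.
# Real DataHelm configs are YAML files under config/api/; a plain
# dict stands in for a parsed config here.
from typing import Callable, Dict

# Registry mapping provider names to connector callables (names invented).
CONNECTORS: Dict[str, Callable[[dict], str]] = {
    "s3": lambda cfg: f"ingested {cfg['path']} from S3",
    "gcs": lambda cfg: f"ingested {cfg['path']} from GCS",
}

def run_ingestion(config: dict) -> str:
    """Pick a connector based on the config's 'provider' key and run it."""
    provider = config["provider"]
    if provider not in CONNECTORS:
        raise ValueError(f"no connector registered for {provider!r}")
    return CONNECTORS[provider](config)

print(run_ingestion({"provider": "s3", "path": "raw/events.json"}))
# → ingested raw/events.json from S3
```

The point of the pattern is that adding a provider means registering one entry, not touching the dispatch logic.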

## High-Level Architecture

-The repository follows layered responsibilities:
+The repository follows a layered responsibility structure:

- `handlers/`: provider-specific source connectors and API handlers
- `ingestion/`: ingestion factory + native ingestion implementations
@@ -60,7 +60,7 @@ docs/
### Prerequisites

- Python 3.12+
-- PostgreSQL (reachable from local environment)
+- PostgreSQL (accessible from the local environment)
- Optional: Docker, local Ollama, dbt CLI

### Installation
@@ -74,7 +74,7 @@ pip install -e .

### Environment Variables

-Create a `.env` file in repository root with required values, for example:
+Create a `.env` file in the repository root with the required values, for example:

```env
DB_HOST=${DB_HOST}
```
@@ -91,7 +91,7 @@ CLASHOFCLANS_API_TOKEN=${CLASHOFCLANS_API_TOKEN}
```bash
python scripts/run_dagster_dev.py
```

-Useful option:
+Useful option for quick verification:

```bash
python scripts/run_dagster_dev.py --print-only
```
@@ -149,7 +149,7 @@ Run all tests:
```bash
.venv/bin/python -m pytest -q
```

-Current suite covers:
+The current test suite includes coverage for:

- ingestion and handler behavior
- analytics factory and runner logic
@@ -172,7 +172,7 @@ Workflows:

Container image is defined via `Dockerfile`.

-Default runtime command starts Dagster gRPC:
+Default runtime command starts the Dagster gRPC server:

```bash
python -m dagster api grpc -m dagster_op.repository
```
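
The `Dockerfile` itself is not shown in this diff; as a rough sketch only (the base image, port, and install steps are assumptions, not the repository's actual contents), an image wrapping that gRPC command could look like:

```dockerfile
# Hypothetical sketch — the repository's actual Dockerfile may differ.
FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install -e .
# Serve the Dagster code location over gRPC (port 4000 is an assumption).
EXPOSE 4000
CMD ["python", "-m", "dagster", "api", "grpc", "-m", "dagster_op.repository", "-h", "0.0.0.0", "-p", "4000"]
```

Running the gRPC server as the container's default command lets a separate Dagster webserver/daemon load `dagster_op.repository` as a remote code location.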