99 changes: 45 additions & 54 deletions README.md

DataHelm is a data engineering framework focused on the following:

- Source ingestion and orchestration
- dbt transformation workflows
- Notebook-based dashboard execution
- Reusable provider connectors (SharePoint, GCS, S3, and BigQuery)
- Optional local LLM analytics query scaffolding

![DataHelm Architecture](https://github.com/DevStrikerTech/datahelm/blob/master/docs/architecture.png?raw=true)

ingestion/
tests/
scripts/
docs/
```

## Local Setup

### Prerequisites

- Python 3.12+
- PostgreSQL (accessible from the local environment)
- Optional: Docker, local Ollama, dbt CLI

### Installation

Run the following commands to set up the local environment:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
```

### Environment Variables

Create a `.env` file in the repository root with the required values, for example:

```env
DB_HOST=${DB_HOST}
DB_PORT=${DB_PORT}
DB_USER=${DB_USER}
CLASHOFCLANS_API_TOKEN=${CLASHOFCLANS_API_TOKEN}
```
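The snippet below sketches one way to load these values into the process environment using only the standard library; `load_env_file` is an illustrative helper, not part of the framework (a real setup might use `python-dotenv` instead):

```python
import os


def load_env_file(path: str = ".env") -> dict:
    """Minimal .env parser: KEY=VALUE lines; blank lines and '#' comments ignored."""
    values = {}
    if not os.path.exists(path):
        return values
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    # Make the values visible to the rest of the process
    os.environ.update(values)
    return values
```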

### Run Dagster Locally

To start Dagster locally, run:

```bash
python scripts/run_dagster_dev.py
```

For a quick verification without executing jobs, run:

```bash
python scripts/run_dagster_dev.py --print-only
```

## Configuration Model

### Ingestion Config (`config/api/*.yaml`)

Defines source-level extraction, publish targets, schedules, and column mapping.
Example included: `CLASHOFCLANS_PLAYER_STATS`
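To make the shape of such a config concrete, a source entry could look roughly like this — the field names below are illustrative assumptions, not the actual schema:

```yaml
# Hypothetical sketch of a config/api/*.yaml entry
source: CLASHOFCLANS_PLAYER_STATS
extraction:
  endpoint: players
  schedule: "0 6 * * *"      # daily at 06:00
publish:
  target: postgres
  table: raw.clashofclans_player_stats
column_mapping:
  tag: player_tag
  trophies: trophies
```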

### dbt Config (`config/dbt/projects.yaml`)

Defines dbt units, selection/exclusion rules, vars, and schedules.

### Dashboard Config (`config/dashboard/projects.yaml`)

Defines notebook path, source table mapping, chart columns, and cadence.

### Analytics Semantic Config (`config/analytics/semantic_catalog.yaml`)

Defines dataset metadata for the isolated NL-to-SQL module.
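A semantic catalog entry might look like the following — the keys shown are assumptions for illustration; the real schema lives in `config/analytics/semantic_catalog.yaml`:

```yaml
# Hypothetical dataset metadata for NL-to-SQL grounding
datasets:
  - name: player_stats
    table: analytics.player_stats
    description: "Per-player trophy and clan metrics"
    columns:
      - name: player_tag
        description: "Unique player identifier"
      - name: trophies
        description: "Current trophy count"
```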

## Reusable Connectors

The repository includes reusable connector classes under `handlers/`:

- `handlers/sharepoint/sharepoint.py` – Microsoft Graph auth + site/file access helpers
- `handlers/gcs/gcs.py` – upload/download/list/delete/signed URL helpers
- `handlers/s3/s3.py` – upload/download/list/delete/presigned URL helpers
- `handlers/bigquery/bigquery.py` – query, row fetch, dataframe load, schema helpers

## Local LLM Analytics Module

`analytics/nl_query/` is an isolated module for natural-language-to-SQL generation using local Ollama:

- Semantic catalog loader
- SQL read-only safety guard
- Ollama client wrapper
- Orchestration service
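As an illustration of the read-only safety-guard idea, a minimal check might look like the sketch below; the function name, keyword list, and rules are assumptions, not the module's actual API:

```python
import re

# Write/DDL keywords that should never appear in generated analytics SQL
# (illustrative list -- the real guard may differ).
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Accept only a single SELECT/WITH statement with no write keywords."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # reject multi-statement payloads
        return False
    if not re.match(r"(?i)^\s*(select|with)\b", stripped):
        return False
    return not FORBIDDEN.search(stripped)
```

Rejecting anything that is not a single `SELECT`/`WITH` statement keeps an LLM-generated query from ever mutating the warehouse, even if the model is prompted adversarially.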

## Testing

Run all tests:

```bash
.venv/bin/python -m pytest -q
```

The current test suite includes coverage for:

- Ingestion and handler behavior
- Analytics factory and runner logic
- Connector modules (SharePoint, GCS, S3, BigQuery)
- Script behavior
- NL-query safety and service paths

## CI/CD and Branching

- `dev`: integration branch
- `master`: release/production branch

Workflows:

- **CI**: tests on development and PR flows
- **Docker Release**: image build/publish on `master`
- **Deploy Release**: workflow_run/manual deployment orchestration

## Containerization

Container image is defined via `Dockerfile`.

Default runtime command starts the Dagster gRPC server:

```bash
python -m dagster api grpc -m dagster_op.repository
```
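A containerfile along these lines would produce that behavior; the base image, paths, and port below are assumptions for illustration, not the repository's actual `Dockerfile`:

```dockerfile
# Hypothetical sketch -- see the real Dockerfile for the authoritative build
FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -e .
# Default runtime command: Dagster gRPC code server
CMD ["python", "-m", "dagster", "api", "grpc", "-m", "dagster_op.repository", "-h", "0.0.0.0", "-p", "4000"]
```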

Deployment flow is workflow-based:

- Production auto-path after successful Docker release
- Manual staging/production dispatch path

## Contributing and Governance

- Contribution guide: `CONTRIBUTING.md`
- Code of conduct: `CODE_OF_CONDUCT.md`
- Security reporting: `SECURITY.md`

## Detailed Technical Documentation

For complete, long-form project documentation (operations, architecture, and runbook-style details), see:

- `docs/document.md`
48 changes: 48 additions & 0 deletions scripts/lint_configs.py
```python
import os
import sys
import argparse
import yaml


def lint_directory(config_dir):
    # Fail fast if the target path is missing or not a directory.
    if not os.path.isdir(config_dir):
        print(f"🚨 Error: The path '{config_dir}' does not exist or is not a directory.")
        sys.exit(1)

    print(f"🔍 Linting YAML files in '{config_dir}/'...\n")

    error_count = 0
    file_count = 0

    for root, _, files in os.walk(config_dir):
        for file in files:
            if file.endswith((".yaml", ".yml")):
                file_count += 1
                filepath = os.path.join(root, file)

                # Tolerate unreadable files and report parse errors with positions.
                try:
                    with open(filepath, "r", encoding="utf-8") as f:
                        yaml.safe_load(f)
                except OSError as e:
                    error_count += 1
                    print(f"❌ IO Error in: {filepath}\n   Details: {e}\n")
                except yaml.YAMLError as exc:
                    error_count += 1
                    print(f"❌ Syntax Error in: {filepath}")
                    if hasattr(exc, "problem_mark"):
                        mark = exc.problem_mark
                        print(f"   Hint: Check line {mark.line + 1}, column {mark.column + 1}.\n")
                    else:
                        print(f"   Details: {exc}\n")

    if error_count == 0:
        print(f"✅ Success! Checked {file_count} files and found no errors.")
    else:
        print(f"🚨 Failed: Found {error_count} error(s).")
        sys.exit(1)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Lint YAML configuration files.")
    parser.add_argument("--path", type=str, default="config", help="Path to config directory")
    args = parser.parse_args()
    lint_directory(args.path)
```
17 changes: 17 additions & 0 deletions tests/test_lint_configs.py
```python
import subprocess
import sys


def test_lint_success():
    # The default 'config' directory should lint cleanly.
    result = subprocess.run(
        [sys.executable, "scripts/lint_configs.py", "--path", "config"],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0
    assert "Success" in result.stdout


def test_invalid_path():
    # A non-existent directory should fail fast with a clear error.
    result = subprocess.run(
        [sys.executable, "scripts/lint_configs.py", "--path", "does-not-exist"],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 1
    assert "Error: The path" in result.stdout


# These cover the fail-fast requirement; more detailed cases can be added later.
```