Changes from all commits (26 commits)
030b089 Consolidating field structures of objs (jreakin, Jan 14, 2025)
20f8666 changed dependency manager from Poetry to UV (jreakin, Jan 15, 2025)
d5fb1a9 Removed files now in .gitignore (jreakin, Jan 15, 2025)
3a4a249 Cleaning up TX Filer Relationships TEXAS-927 (jreakin, Jan 15, 2025)
56d08e2 Added TODO notes under texas_filers.py TEXAS-927 (jreakin, Jan 15, 2025)
1abce3b Attempt to fix LFS issues (jreakin, Jan 17, 2025)
26dcf83 Initial commit on jan25-polars (jreakin, Jan 18, 2025)
6738336 Recreating jan25-update-branch from 1abce3b (jreakin, Jan 18, 2025)
7062227 Made search query changes to texas_search.py (jreakin, Jan 18, 2025)
960a7ed Modified file formatting (jreakin, Jan 18, 2025)
e1384aa Moved webdriver tools to their own package (jreakin, Jan 21, 2025)
9f69a1c Track parquet files with Git LFS (jreakin, Jan 23, 2025)
4912a06 updated gitignore (jreakin, Feb 12, 2025)
a048755 Normalize state and file origin references (jreakin, Nov 9, 2025)
2b78c27 Update funcs, states modules and remove tracked parquet files (jreakin, Dec 4, 2025)
1fb1a7c Fix import paths across codebase to use app. prefix (jreakin, Dec 4, 2025)
8037b27 Fix consolidate_files schema mismatch and flaky test (jreakin, Dec 4, 2025)
ff9a994 Fix campaign name corruption bug in _get_field_value (jreakin, Dec 4, 2025)
fcb46ba Add CI workflow with CodeCov integration (jreakin, Dec 4, 2025)
26ea85c Fix CodeCov slug to campaignfinance-2023 (jreakin, Dec 4, 2025)
0fbdcc6 Add JUnit XML output and test results upload to Codecov (jreakin, Dec 4, 2025)
606875a Fix CI workflow: use uv sync and uv run for proper environment setup (jreakin, Dec 4, 2025)
e6fc28a Add pytest-xdist for parallel test execution (jreakin, Dec 4, 2025)
fd80b6d Run only app/tests/ and remove parallel overhead for small test suite (jreakin, Dec 4, 2025)
95b4304 Fix CI: install uv via pip for better compatibility (jreakin, Dec 4, 2025)
eaedb1d Fix CI: add PYTHONPATH for app module imports (jreakin, Dec 4, 2025)
Binary file removed .DS_Store
Binary file not shown.
3 changes: 1 addition & 2 deletions .gitattributes
@@ -1,2 +1 @@
*.zip filter=lfs diff=lfs merge=lfs -text
*.csv filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
41 changes: 41 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,41 @@
name: CI

on:
push:
pull_request:
Comment on lines +3 to +5
🧹 Nitpick | 🔵 Trivial

Consider explicit trigger configuration for YAML linting.

The push: and pull_request: events without explicit configuration are valid GitHub Actions syntax and will trigger on all push/PR events. However, YAMLlint prefers explicit values. This is a minor style note—the current syntax works fine, but if you want to satisfy stricter linting, consider using explicit empty dicts or null:

 on:
-  push:
-  pull_request:
+  push: {}
+  pull_request: {}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
on:
push:
pull_request:
on:
push: {}
pull_request: {}
🧰 Tools
🪛 YAMLlint (1.37.1)

[warning] 3-3: truthy value should be one of [false, true]

(truthy)

🤖 Prompt for AI Agents
.github/workflows/ci.yml lines 3-5: the workflow triggers use bare "push:" and
"pull_request:" which are valid but fail stricter YAML lint rules; change each
to an explicit empty mapping or null (e.g., "push: {}" and "pull_request: {}" or
"push: null" and "pull_request: null") so the YAML linter accepts the syntax
while preserving the same trigger behavior.


jobs:
test:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.12"

- name: Install uv and dependencies
run: |
pip install uv
uv sync
uv pip install pytest-cov pytest-xdist

- name: Run tests with coverage
run: uv run pytest app/tests/ --cov --cov-branch --cov-report=xml --junitxml=junit.xml -o junit_family=legacy
env:
PYTHONPATH: .

- name: Upload coverage reports to Codecov
uses: codecov/codecov-action@v5
with:
token: ${{ secrets.CODECOV_TOKEN }}
slug: Abstract-Data/campaignfinance-2023

- name: Upload test results to Codecov
if: ${{ !cancelled() }}
uses: codecov/test-results-action@v1
with:
token: ${{ secrets.CODECOV_TOKEN }}

🧹 Nitpick | 🔵 Trivial

Remove trailing blank line.

YAMLlint flags the trailing blank line after line 34. Apply this cleanup:

          slug: Abstract-Data/campaignfinance-2023
-
📝 Committable suggestion


Suggested change
slug: Abstract-Data/campaignfinance-2023
🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 35-35: too many blank lines (1 > 0)

(empty-lines)

🤖 Prompt for AI Agents
In .github/workflows/ci.yml around line 35, there is a trailing blank line
flagged by YAMLlint; remove the empty line at the end of the file so the file
ends on line 34 without an extra newline-only line, then save the file (ensure
the file still ends with a single newline character if your editor enforces it).

4 changes: 4 additions & 0 deletions .gitignore
@@ -4,8 +4,10 @@
*.zip
*.DS_Store
*.env
# *.parquet
🛠️ Refactor suggestion | 🟠 Major

Consolidate redundant commented and uncommented ignore patterns.

Lines 7 and 10 contain commented versions of patterns, while lines 178–179 contain their uncommented equivalents. This duplication creates maintenance burden. Consider removing the commented versions or consolidating at the end of the file with a single definitive block.

Also applies to: 10-10, 178-179

🤖 Prompt for AI Agents
.gitignore lines 7, 10 and lines 178–179: there are duplicate commented ignore
patterns (lines 7 and 10) that match the uncommented equivalents at 178–179;
remove the redundant commented entries (or move any intentionally kept comments
into a single definitive explanatory block at the end of the file) so each
ignore pattern appears only once (keep the active/uncommented patterns where
they currently are and delete the commented duplicates).

*.__pycache__/
/logs/
# /tmp/

# Byte-compiled / optimized / DLL files
__pycache__/
@@ -173,3 +175,5 @@ cython_debug/

*.csv
*.txt
Comment on lines 176 to 177
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Scripts executed against Abstract-Data/campaignfinance-2023 to verify: `wc -l .gitignore`, `head -10 .gitignore`, `sed -n '170,180p' .gitignore`, `rg "^\*\.csv$" .gitignore`, `rg "^\*\.txt$" .gitignore`

Remove duplicate CSV rule and clarify intent of broad *.txt pattern.

Line 176 duplicates the *.csv rule already present at line 2. Additionally, line 177's *.txt pattern is overly broad and could inadvertently ignore important text files (README, documentation, config templates, etc.).

Clarify the intent: Is *.txt meant to ignore temporary text outputs, or is this too broad for the repository?

🤖 Prompt for AI Agents
In .gitignore around lines 176-177, remove the duplicate "*.csv" entry (it's
already present at line 2) and replace or clarify the overly broad "*.txt" rule:
either delete it if not needed, scope it to a specific directory/pattern for
temporary outputs (e.g., temp/*.txt or *.out.txt) or add explicit negations for
important files (e.g., !README.txt, !docs/*.txt) and include a short comment
above the rule explaining the intent (temporary outputs vs generated artifacts).

*.parquet
/tmp
Binary file modified .idea/.DS_Store
Binary file not shown.
3 changes: 2 additions & 1 deletion .idea/campaignfinance.iml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

28 changes: 0 additions & 28 deletions .idea/csv-editor.xml


2 changes: 1 addition & 1 deletion .idea/misc.xml


6 changes: 6 additions & 0 deletions .idea/ruff.xml


18 changes: 18 additions & 0 deletions README.md
@@ -0,0 +1,18 @@
# Texas Campaign Finance Package

## Overview
This package is designed to provide a simple interface for accessing campaign finance data from the Texas Ethics Commission.
It also reduces duplication of fields and joins data from multiple files into a single table to reduce the size of the data.

## Examples
Across all files, there are over 317 columns. This package reduces the number of columns to [number of columns] by joining data from multiple files.
⚠️ Potential issue | 🟡 Minor

Placeholder text needs to be replaced.

The text [number of columns] appears to be a placeholder that should be replaced with the actual column count.

🤖 Prompt for AI Agents
In README.md around line 8, the placeholder "[number of columns]" must be
replaced with the actual column count; determine the correct number (e.g., 317
or the current accurate count) and replace the placeholder with that numeric
value, keeping the sentence grammatical and updating any surrounding punctuation
or wording as needed.


## Ability to Download TEC File Data Built-In
Using [Selenium](https://www.selenium.dev/), this package can download the latest campaign finance data from the Texas Ethics Commission website. The data is then processed and saved as CSV files.


## Dependencies
![Python](https://img.shields.io/badge/Python-FFD43B?style=for-the-badge&logo=python&logoColor=blue)
![Pydantic](https://img.shields.io/badge/Pydantic-E92063?style=for-the-badge&logo=Pydantic&logoColor=white)
![Pandas](https://img.shields.io/badge/Pandas-2C2D72?style=for-the-badge&logo=pandas&logoColor=white)
![Selenium](https://img.shields.io/badge/Selenium-43B02A?style=for-the-badge&logo=Selenium&logoColor=white)
Comment on lines +1 to +18
🧹 Nitpick | 🔵 Trivial

Minor markdown formatting improvements.

Add blank lines after headings and ensure file ends with a newline for proper markdown rendering and linting compliance.

 # Texas Campaign Finance Package
 
 ## Overview
+
 This package is designed to provide a simple interface for accessing campaign finance data from the Texas Ethics Commission.
 It also reduces duplication of fields and joins data from multiple files into a single table to reduce the size of the data.
 
 ## Examples
+
 Across all files, there are over 317 columns. This package reduces the number of columns to [number of columns] by joining data from multiple files.
 
 ## Ability to Download TEC File Data Built-In
+
 Using [Selenium](https://www.selenium.dev/), this package can download the latest campaign finance data from the Texas Ethics Commission website. The data is then processed and saved as CSV files.
 
 
 ## Dependencies
+
 ![Python](https://img.shields.io/badge/Python-FFD43B?style=for-the-badge&logo=python&logoColor=blue)
 ![Pydantic](https://img.shields.io/badge/Pydantic-E92063?style=for-the-badge&logo=Pydantic&logoColor=white)
 ![Pandas](https://img.shields.io/badge/Pandas-2C2D72?style=for-the-badge&logo=pandas&logoColor=white)
 ![Selenium](https://img.shields.io/badge/Selenium-43B02A?style=for-the-badge&logo=Selenium&logoColor=white)
+
🧰 Tools
🪛 LanguageTool

[style] ~15-~15: Using many exclamation marks might seem excessive (in this case: 3 exclamation marks for a text that’s 1026 characters long)
Context: ...r-the-badge&logo=python&logoColor=blue) Pydantic Pandas ![Selenium](https://img.shields.io/badge/...

(EN_EXCESSIVE_EXCLAMATION)

🪛 markdownlint-cli2 (0.18.1)

3-3: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


7-7: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


10-10: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


14-14: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


18-18: Files should end with a single newline character

(MD047, single-trailing-newline)

🤖 Prompt for AI Agents
In README.md around lines 1 to 18, headings are missing blank lines after them
and the file likely does not end with a newline; add a single blank line
immediately after each Markdown heading (e.g., after "# Texas Campaign Finance
Package", "## Overview", "## Examples", "## Ability to Download TEC File Data
Built-In", and "## Dependencies") and ensure the file ends with a trailing
newline character.

Binary file modified app/.DS_Store
Binary file not shown.
12 changes: 6 additions & 6 deletions app/abcs/__init__.py
@@ -1,7 +1,7 @@
# from abcs.abc_download import FileDownloader
from abcs.abc_state_config import CSVReaderConfig, StateConfig, CategoryConfig, CategoryTypes
from abcs.abc_category import StateCategoryClass
from abcs.abc_validation import StateFileValidation
from abcs.abc_validation_errors import ValidationErrorList
from abcs.abc_db_loader import DBLoaderClass
from abcs.abc_download import FileDownloaderABC, RecordGen
from app.abcs.abc_state_config import CSVReaderConfig, StateConfig, CategoryConfig, CategoryTypes
from app.abcs.abc_category import StateCategoryClass
from app.abcs.abc_validation import StateFileValidation
from app.abcs.abc_validation_errors import ValidationErrorList
from app.abcs.abc_db_loader import DBLoaderClass
from app.abcs.abc_download import FileDownloaderABC, RecordGen, progress
10 changes: 5 additions & 5 deletions app/abcs/abc_category.py
@@ -3,7 +3,7 @@

import sqlalchemy
import sqlmodel
import abcs.abc_validation as validation
import app.abcs.abc_validation as validation
from pathlib import Path
from typing import ClassVar, Dict, List
from dataclasses import dataclass, field
@@ -13,10 +13,10 @@
from collections import defaultdict
import datetime
# from abcs.abc_download import FileDownloader
from abcs.abc_db_loader import DBLoaderClass
from abcs.abc_state_config import StateConfig
from logger import Logger
import funcs
from app.abcs.abc_db_loader import DBLoaderClass
from app.abcs.abc_state_config import StateConfig
from app.logger import Logger
import app.funcs as funcs
import inject
from sqlmodel import SQLModel
from pydantic import BaseModel, Field, ConfigDict, model_validator, computed_field
2 changes: 1 addition & 1 deletion app/abcs/abc_db_loader.py
@@ -1,7 +1,7 @@
from __future__ import annotations
import abc
from sqlmodel import SQLModel, Session, create_engine, select
from logger import Logger
from app.logger import Logger
from typing import Iterator, List, Type
from dataclasses import dataclass
import itertools
77 changes: 71 additions & 6 deletions app/abcs/abc_download.py
@@ -2,29 +2,41 @@
from dataclasses import dataclass, field
from pathlib import Path
import abc
from abcs.abc_state_config import StateConfig, CategoryTypes
import sys
from typing import Optional, Generator, Dict, Annotated
from typing import Optional, Generator, Dict, Annotated, ClassVar
from icecream import ic
from pydantic import Field as PydanticField
import itertools
from datetime import datetime
import polars as pl

from app.abcs.abc_state_config import StateConfig, CategoryTypes
from web_scrape_utils import CreateWebDriver
from app.live_display import ProgressTracker

RecordGen = Annotated[Optional[Generator[Dict, None, None]], PydanticField(default=None)]
FilteredRecordGen = RecordGen


progress = ProgressTracker()
progress.start()
Comment on lines +20 to +21
⚠️ Potential issue | 🟠 Major

Side effect at module import time.

Starting ProgressTracker at module load (progress.start()) means the live display begins whenever this module is imported, even if no download is happening. This can cause unexpected console output and resource usage.

Consider lazy initialization or explicit start:

 progress = ProgressTracker()
-progress.start()

Then call progress.start() explicitly in the download() method when needed.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In app/abcs/abc_download.py around lines 20-21, creating and starting
ProgressTracker at module import time (progress = ProgressTracker();
progress.start()) causes an unwanted live display and resource use whenever the
module is imported; change to lazy initialization by removing progress.start()
from module scope, instantiate ProgressTracker (or call progress.start()) inside
the download() method when an actual download begins (or use a helper that
creates/starts the tracker on first use), and ensure the tracker is properly
stopped/cleaned up after the download completes or on errors.
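
The lazy pattern the bot suggests could look like the sketch below. `ProgressTracker` from `app.live_display` is project-specific, so it is stubbed out here; `LazyProgress` is a hypothetical helper name, not the project's API:

```python
class _StubTracker:
    """Stand-in for app.live_display.ProgressTracker (illustration only)."""
    def __init__(self) -> None:
        self.started = False

    def start(self) -> None:
        self.started = True


class LazyProgress:
    """Defer tracker creation and start() until first real use."""
    def __init__(self, factory) -> None:
        self._factory = factory
        self._tracker = None

    def get(self):
        # Created and started on first access, not at module import time.
        if self._tracker is None:
            self._tracker = self._factory()
            self._tracker.start()
        return self._tracker


progress = LazyProgress(_StubTracker)
# Importing this module no longer starts a live display;
# download() would call progress.get() only when work actually begins.
```

With this shape, the module-level `progress.start()` call disappears and the tracker's lifetime is tied to an actual download run.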


@dataclass
class FileDownloaderABC(abc.ABC):
config: StateConfig
driver: ClassVar[Optional[CreateWebDriver]] = None
folder: Path = field(init=False)
data: RecordGen | CategoryTypes = None

def __post_init__(self):

self.check_if_folder_exists()
self.folder = self.config.TEMP_FOLDER
FileDownloaderABC.driver = CreateWebDriver(download_folder=self.folder)

@classmethod
def not_headless(cls):
if cls.driver:
cls.driver.headless = False
return cls

def check_if_folder_exists(self) -> Path:
_temp_folder_name = self.config.TEMP_FOLDER.stem.title()
@@ -46,12 +58,65 @@ def check_if_folder_exists(self) -> Path:
ic("User selected 'n'. Exiting...")
sys.exit()

@classmethod
def extract_zipfile(cls, zip_ref, tmp):
zip_file_info = zip_ref.infolist()
_extract_task = progress.add_task("T4", "Extract Zip", "In Progress")
for file in zip_file_info:
try:
cls._process_csv(zip_ref, file, tmp)
except Exception as e:
ic(f"Zip File Extraction Error on {file.filename.upper()}: {e}")
Comment on lines +66 to +69
⚠️ Potential issue | 🟡 Minor

Broad exception catch may hide critical errors.

Catching bare Exception and only logging with ic() can hide serious issues like disk full, permission errors, or corrupted archives. Consider catching specific exceptions or at least logging the full traceback.

             try:
                 cls._process_csv(zip_ref, file, tmp)
-            except Exception as e:
-                ic(f"Zip File Extraction Error on {file.filename.upper()}: {e}")
+            except (OSError, IOError, pl.exceptions.ComputeError) as e:
+                ic(f"Zip File Extraction Error on {file.filename.upper()}: {e}")
+                raise
📝 Committable suggestion


Suggested change
try:
cls._process_csv(zip_ref, file, tmp)
except Exception as e:
ic(f"Zip File Extraction Error on {file.filename.upper()}: {e}")
try:
cls._process_csv(zip_ref, file, tmp)
except (OSError, IOError, pl.exceptions.ComputeError) as e:
ic(f"Zip File Extraction Error on {file.filename.upper()}: {e}")
raise
🧰 Tools
🪛 Ruff (0.14.7)

68-68: Do not catch blind exception: Exception

(BLE001)

progress.update_task(_extract_task, "Complete")

@classmethod
def _process_csv(cls, zip_ref, file, tmp):
file_name = Path(file.filename)
if file_name.suffix not in ('.csv', '.txt'):
ic(f"File {file_name.stem} is not a CSV/TXT file. Skipping...")
return

_csv_task = progress.add_task("T5", f"Extract CSV {file_name.stem}", "Started")
zip_ref.extract(file, tmp)

if file_name.suffix == '.txt':
return
Comment on lines +79 to +83
⚠️ Potential issue | 🟡 Minor

Progress task not completed for .txt files.

When the file is a .txt, the function returns early on line 83 without calling progress.update_task(_csv_task, "Complete"). This leaves the task in "Started" status indefinitely.

         if file_name.suffix == '.txt':
+            progress.update_task(_csv_task, "Complete")
             return
📝 Committable suggestion


Suggested change
_csv_task = progress.add_task("T5", f"Extract CSV {file_name.stem}", "Started")
zip_ref.extract(file, tmp)
if file_name.suffix == '.txt':
return
_csv_task = progress.add_task("T5", f"Extract CSV {file_name.stem}", "Started")
zip_ref.extract(file, tmp)
if file_name.suffix == '.txt':
progress.update_task(_csv_task, "Complete")
return
🤖 Prompt for AI Agents
In app/abcs/abc_download.py around lines 79 to 83, the progress task _csv_task
is left in "Started" when a .txt file causes an early return; before returning
for file_name.suffix == '.txt' call progress.update_task(_csv_task, "Complete")
(or an equivalent completion/cleanup call) so the task status is set to
"Complete" before the function returns.


rename = f"{file_name.stem}_{datetime.now():%Y%m%d}dl"
🧹 Nitpick | 🔵 Trivial

datetime.now() called without timezone.

Using naive datetime for file naming can cause inconsistencies across different environments or if the system timezone changes.

+from datetime import datetime, timezone
...
-        rename = f"{file_name.stem}_{datetime.now():%Y%m%d}dl"
+        rename = f"{file_name.stem}_{datetime.now(timezone.utc):%Y%m%d}dl"
📝 Committable suggestion


Suggested change
rename = f"{file_name.stem}_{datetime.now():%Y%m%d}dl"
rename = f"{file_name.stem}_{datetime.now(timezone.utc):%Y%m%d}dl"
🧰 Tools
🪛 Ruff (0.14.7)

85-85: datetime.datetime.now() called without a tz argument

(DTZ005)

🤖 Prompt for AI Agents
In app/abcs/abc_download.py around line 85, the filename uses a naive datetime
(datetime.now()) which lacks timezone information; replace it with a
timezone-aware timestamp (for example datetime.now(tz=timezone.utc) or
datetime.now().astimezone()) when formatting the rename string and ensure the
appropriate timezone import (from datetime import datetime, timezone) is added
at the top of the file so the generated filename is consistently based on an
explicit timezone (e.g., UTC).
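
A quick illustration of the naive vs. timezone-aware formatting (the `contribs` stem is hypothetical, chosen only for the example):

```python
from datetime import datetime, timezone

stem = "contribs"  # hypothetical file stem for illustration
naive = f"{stem}_{datetime.now():%Y%m%d}dl"              # depends on host timezone
aware = f"{stem}_{datetime.now(timezone.utc):%Y%m%d}dl"  # stable across hosts
```

Both produce the same shape of name, but only the aware form yields the same date on every machine near a day boundary.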

pl_file = pl.scan_csv(tmp / file_name, low_memory=False, infer_schema=False)
pl_file = (
pl_file
.with_columns(
pl.lit(file_name.stem)
.alias('file_origin')
))

pl_file = (
pl_file
.with_columns([
pl.col(col)
.cast(pl.String)
for col in pl_file.collect_schema().names()
]))

pl_file.collect().write_parquet(tmp / f"{rename}.parquet", compression='lz4')
progress.update_task(_csv_task, "Complete")
# Clean up original CSV file
(tmp / file_name).unlink()

@classmethod
@abc.abstractmethod
def download(cls, overwrite: bool, read_from_temp: bool) -> FileDownloaderABC:
...

@classmethod
@abc.abstractmethod
def download(self, overwrite: bool, read_from_temp: bool) -> FileDownloaderABC:
def consolidate_files(cls):
...

@classmethod
@abc.abstractmethod
def read(self):
def read(cls):
...
Comment on lines +107 to 120
🧹 Nitpick | 🔵 Trivial

Consider keyword-only arguments for boolean parameters.

The download method has two boolean positional parameters which can be confusing at call sites (e.g., download(True, False)).

     @classmethod
     @abc.abstractmethod
-    def download(cls, overwrite: bool, read_from_temp: bool) -> FileDownloaderABC:
+    def download(cls, *, overwrite: bool, read_from_temp: bool) -> FileDownloaderABC:
         ...
🧰 Tools
🪛 Ruff (0.14.7)

109-109: Boolean-typed positional argument in function definition

(FBT001)


109-109: Boolean-typed positional argument in function definition

(FBT001)


114-114: Missing return type annotation for classmethod consolidate_files

(ANN206)


119-119: Missing return type annotation for classmethod read

(ANN206)

🤖 Prompt for AI Agents
In app/abcs/abc_download.py around lines 107 to 120, the download method
currently accepts two boolean positional arguments which are ambiguous at call
sites; change the abstractmethod signature to require keyword-only booleans
(e.g., introduce a positional-only marker by adding a lone * so overwrite and
read_from_temp must be passed by name), update all concrete implementations to
match the new signature, and update all call sites to use explicit keywords
(overwrite=..., read_from_temp=...) to avoid confusion; ensure type hints and
return type remain unchanged and run tests to catch any mismatches.
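
A minimal sketch of the keyword-only pattern; the function body here is a placeholder, not the project's `FileDownloaderABC.download` implementation:

```python
def download(*, overwrite: bool, read_from_temp: bool) -> dict:
    """Placeholder: the bare * forces callers to name both flags."""
    return {"overwrite": overwrite, "read_from_temp": read_from_temp}


result = download(overwrite=True, read_from_temp=False)  # unambiguous call site

try:
    download(True, False)  # positional booleans are now rejected
except TypeError:
    positional_rejected = True
```

The TypeError at the second call site is the point: `download(True, False)` can no longer silently swap its flags.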


def sort_categories(self) -> CategoryTypes:
8 changes: 6 additions & 2 deletions app/abcs/abc_state_config.py
@@ -12,7 +12,7 @@
from enum import StrEnum
from rich.progress import track

import funcs
import app.funcs as funcs


def check_for_empty_gen(func):
@@ -151,7 +151,11 @@ def TEMP_FOLDER(self) -> Path:

@property
def FIELD_DATA(self) -> dict:
return funcs.read_toml(Path(__file__).parents[1] / 'states'/ (_state := self.STATE_NAME.lower()) / f"{_state}_fields.toml")
return (
funcs
.read_toml(
Path(__file__)
.parents[1] / 'states'/ (_state := self.STATE_NAME.lower()) / f"{_state}_fields.toml"))

@staticmethod
@lru_cache
6 changes: 3 additions & 3 deletions app/abcs/abc_validation.py
@@ -6,9 +6,9 @@
import csv
from tqdm import tqdm
from pydantic import ValidationError
from abcs.abc_validation_errors import ValidationErrorList
from funcs.validator_functions import create_record_id
from logger import Logger
from app.abcs.abc_validation_errors import ValidationErrorList
from app.funcs.validator_functions import create_record_id
from app.logger import Logger
from icecream import ic

ValidatorType = Type[SQLModel]
2 changes: 1 addition & 1 deletion app/abcs/abc_validation_errors.py
@@ -4,7 +4,7 @@
import pandas as pd
from icecream import ic

ic.configureOutput(prefix='abc_validation_errors|')
ic.configureOutput(prefix='campaignfinance|')


class RecordValidationError(BaseModel):
8 changes: 4 additions & 4 deletions app/funcs/__init__.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
from funcs.csv_reader import FileReader
from funcs.toml_reader import read_toml
from funcs.file_exporters import write_records_to_csv_validation
from funcs.depreciated import deprecated
from .csv_reader import FileReader
from .toml_reader import read_toml
from .file_exporters import write_records_to_csv_validation
from .depreciated import deprecated
Comment on lines +1 to +4
🧹 Nitpick | 🔵 Trivial

Relative imports look good, but check the module naming.

The conversion to relative imports is correct. However, note that the module is named depreciated (Line 4), which appears to be a misspelling of deprecated. Consider renaming the module to deprecated.py to match standard terminology.

🤖 Prompt for AI Agents
In app/funcs/__init__.py lines 1-4, the imported module name "depreciated" is
misspelled; rename the file app/funcs/depreciated.py to app/funcs/deprecated.py
and update this import to from .deprecated import deprecated; also search and
update any other references/imports in the repo to use the new filename and
import path, run tests and linters to ensure no broken imports remain.
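
The repo's `depreciated` module body isn't shown in this diff; a typical `deprecated` decorator of the kind it likely exports looks roughly like this sketch (`old_reader` is a made-up function for demonstration):

```python
import functools
import warnings


def deprecated(func):
    """Warn callers that the wrapped function is deprecated."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{func.__name__} is deprecated",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller, not the wrapper
        )
        return func(*args, **kwargs)
    return wrapper


@deprecated
def old_reader(path: str) -> str:
    return f"read:{path}"
```

Renaming the module to `deprecated.py` would then make the import read `from .deprecated import deprecated`, matching standard terminology.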
