Re-branching after screwing up last PR. #4
base: main
Changes from all commits
030b089
20f8666
d5fb1a9
3a4a249
56d08e2
1abce3b
26dcf83
6738336
7062227
960a7ed
e1384aa
9f69a1c
4912a06
a048755
2b78c27
1fb1a7c
8037b27
ff9a994
fcb46ba
26ea85c
0fbdcc6
606875a
e6fc28a
fd80b6d
95b4304
eaedb1d
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,2 +1 @@ | ||
| *.zip filter=lfs diff=lfs merge=lfs -text | ||
| *.csv filter=lfs diff=lfs merge=lfs -text | ||
| *.parquet filter=lfs diff=lfs merge=lfs -text |
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,41 @@ | ||||||
| name: CI | ||||||
|
|
||||||
| on: | ||||||
| push: | ||||||
| pull_request: | ||||||
|
|
||||||
| jobs: | ||||||
| test: | ||||||
| runs-on: ubuntu-latest | ||||||
|
|
||||||
| steps: | ||||||
| - uses: actions/checkout@v4 | ||||||
|
|
||||||
| - name: Set up Python | ||||||
| uses: actions/setup-python@v5 | ||||||
| with: | ||||||
| python-version: "3.12" | ||||||
|
|
||||||
| - name: Install uv and dependencies | ||||||
| run: | | ||||||
| pip install uv | ||||||
| uv sync | ||||||
| uv pip install pytest-cov pytest-xdist | ||||||
|
|
||||||
| - name: Run tests with coverage | ||||||
| run: uv run pytest app/tests/ --cov --cov-branch --cov-report=xml --junitxml=junit.xml -o junit_family=legacy | ||||||
| env: | ||||||
| PYTHONPATH: . | ||||||
|
|
||||||
| - name: Upload coverage reports to Codecov | ||||||
| uses: codecov/codecov-action@v5 | ||||||
| with: | ||||||
| token: ${{ secrets.CODECOV_TOKEN }} | ||||||
| slug: Abstract-Data/campaignfinance-2023 | ||||||
|
|
||||||
| - name: Upload test results to Codecov | ||||||
| if: ${{ !cancelled() }} | ||||||
| uses: codecov/test-results-action@v1 | ||||||
| with: | ||||||
| token: ${{ secrets.CODECOV_TOKEN }} | ||||||
|
|
||||||
|
Review comment: 🧹 Nitpick | 🔵 Trivial
Remove trailing blank line. YAMLlint flags the trailing blank line after line 34. Apply this cleanup by deleting the empty line that follows `slug: Abstract-Data/campaignfinance-2023`.
🧰 Tools
🪛 YAMLlint (1.37.1) [error] 35-35: too many blank lines (1 > 0) (empty-lines)
||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,8 +4,10 @@ | |
| *.zip | ||
| *.DS_Store | ||
| *.env | ||
| # *.parquet | ||
|
Review comment: 🛠️ Refactor suggestion | 🟠 Major
Consolidate redundant commented and uncommented ignore patterns. Lines 7 and 10 contain commented-out versions of patterns (`# *.parquet`, `# /tmp/`), while lines 178–179 contain their uncommented equivalents (`*.parquet`, `/tmp`). This duplication creates a maintenance burden. Consider removing the commented versions or consolidating at the end of the file with a single definitive block. Also applies to: 10-10, 178-179
||
| *.__pycache__/ | ||
| /logs/ | ||
| # /tmp/ | ||
|
|
||
| # Byte-compiled / optimized / DLL files | ||
| __pycache__/ | ||
|
|
@@ -173,3 +175,5 @@ cython_debug/ | |
|
|
||
| *.csv | ||
| *.txt | ||
|
Comment on lines 176 to 177
Review comment: 🧩 Analysis chain
🏁 Scripts executed against repository Abstract-Data/campaignfinance-2023:
- wc -l .gitignore (length of output: 92)
- head -10 .gitignore | cat -n (length of output: 244)
- sed -n '170,180p' .gitignore | cat -n (length of output: 543)
- rg "^\*\.csv$" .gitignore, to check whether *.csv appears multiple times (length of output: 89)
- rg "^\*\.txt$" .gitignore, to check whether *.txt already exists elsewhere (length of output: 83)
Remove the duplicate CSV rule and clarify the intent of the broad `*.txt` rule. Line 176 duplicates the existing `*.csv` rule. Clarify the intent of the `*.txt` pattern on line 177, which ignores every text file in the repository.
||
| *.parquet | ||
| /tmp | ||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,18 @@ | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Texas Campaign Finance Package | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ## Overview | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| This package is designed to provide a simple interface for accessing campaign finance data from the Texas Ethics Commission. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| It also reduces duplication of fields and joins data from multiple files into a single table to reduce the size of the data. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ## Examples | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Across all files, there are over 317 columns. This package reduces the number of columns to [number of columns] by joining data from multiple files. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Review comment: Placeholder text needs to be replaced. The text `[number of columns]` is an unfilled placeholder; replace it with the actual column count.
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ## Ability to Download TEC File Data Built-In | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Using [Selenium](https://www.selenium.dev/), this package can download the latest campaign finance data from the Texas Ethics Commission website. The data is then processed and saved as CSV files. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ## Dependencies | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|  | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|  | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|  | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|  | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines +1 to +18
Review comment: 🧹 Nitpick | 🔵 Trivial
Minor markdown formatting improvements. Add blank lines after headings and ensure the file ends with a newline for proper markdown rendering and linting compliance. Suggested change:
# Texas Campaign Finance Package
## Overview
+
This package is designed to provide a simple interface for accessing campaign finance data from the Texas Ethics Commission.
It also reduces duplication of fields and joins data from multiple files into a single table to reduce the size of the data.
## Examples
+
Across all files, there are over 317 columns. This package reduces the number of columns to [number of columns] by joining data from multiple files.
## Ability to Download TEC File Data Built-In
+
Using [Selenium](https://www.selenium.dev/), this package can download the latest campaign finance data from the Texas Ethics Commission website. The data is then processed and saved as CSV files.
## Dependencies
+




+
🧰 Tools
🪛 LanguageTool: [style] ~15-~15: Using many exclamation marks might seem excessive (in this case: 3 exclamation marks for a text that’s 1026 characters long) (EN_EXCESSIVE_EXCLAMATION)
🪛 markdownlint-cli2 (0.18.1):
- 3-3: Headings should be surrounded by blank lines (MD022, blanks-around-headings)
- 7-7: Headings should be surrounded by blank lines (MD022, blanks-around-headings)
- 10-10: Headings should be surrounded by blank lines (MD022, blanks-around-headings)
- 14-14: Headings should be surrounded by blank lines (MD022, blanks-around-headings)
- 18-18: Files should end with a single newline character (MD047, single-trailing-newline)
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,7 +1,7 @@ | ||
| # from abcs.abc_download import FileDownloader | ||
| from abcs.abc_state_config import CSVReaderConfig, StateConfig, CategoryConfig, CategoryTypes | ||
| from abcs.abc_category import StateCategoryClass | ||
| from abcs.abc_validation import StateFileValidation | ||
| from abcs.abc_validation_errors import ValidationErrorList | ||
| from abcs.abc_db_loader import DBLoaderClass | ||
| from abcs.abc_download import FileDownloaderABC, RecordGen | ||
| from app.abcs.abc_state_config import CSVReaderConfig, StateConfig, CategoryConfig, CategoryTypes | ||
| from app.abcs.abc_category import StateCategoryClass | ||
| from app.abcs.abc_validation import StateFileValidation | ||
| from app.abcs.abc_validation_errors import ValidationErrorList | ||
| from app.abcs.abc_db_loader import DBLoaderClass | ||
| from app.abcs.abc_download import FileDownloaderABC, RecordGen, progress |
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -2,29 +2,41 @@ | |||||||||||||||||||||||
| from dataclasses import dataclass, field | ||||||||||||||||||||||||
| from pathlib import Path | ||||||||||||||||||||||||
| import abc | ||||||||||||||||||||||||
| from abcs.abc_state_config import StateConfig, CategoryTypes | ||||||||||||||||||||||||
| import sys | ||||||||||||||||||||||||
| from typing import Optional, Generator, Dict, Annotated | ||||||||||||||||||||||||
| from typing import Optional, Generator, Dict, Annotated, ClassVar | ||||||||||||||||||||||||
| from icecream import ic | ||||||||||||||||||||||||
| from pydantic import Field as PydanticField | ||||||||||||||||||||||||
| import itertools | ||||||||||||||||||||||||
| from datetime import datetime | ||||||||||||||||||||||||
| import polars as pl | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| from app.abcs.abc_state_config import StateConfig, CategoryTypes | ||||||||||||||||||||||||
| from web_scrape_utils import CreateWebDriver | ||||||||||||||||||||||||
| from app.live_display import ProgressTracker | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| RecordGen = Annotated[Optional[Generator[Dict, None, None]], PydanticField(default=None)] | ||||||||||||||||||||||||
| FilteredRecordGen = RecordGen | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| progress = ProgressTracker() | ||||||||||||||||||||||||
| progress.start() | ||||||||||||||||||||||||
|
Comment on lines +20 to +21
Review comment: Side effect at module import time. Calling `progress.start()` at module level starts the tracker as soon as the module is imported, even by code that never downloads anything. Consider lazy initialization or an explicit start:
 progress = ProgressTracker()
-progress.start()
Then call `progress.start()` explicitly at the point where tracking should begin.
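One way to act on the lazy-initialization suggestion is sketched below. It is only an illustration: the `ProgressTracker` class here is a hypothetical stand-in for `app.live_display.ProgressTracker` (only its `start()` call is taken from the diff), and `get_progress` is an assumed helper name, not something in the PR.

```python
from functools import lru_cache


class ProgressTracker:
    """Hypothetical stand-in for app.live_display.ProgressTracker."""

    def __init__(self) -> None:
        self.started = False

    def start(self) -> None:
        # The real tracker would begin its display work here.
        self.started = True


@lru_cache(maxsize=1)
def get_progress() -> ProgressTracker:
    """Create and start the shared tracker on first use, not at import time."""
    tracker = ProgressTracker()
    tracker.start()
    return tracker
```

Importing the module now has no side effect; the first `get_progress()` call creates and starts the tracker, and later calls return the same cached instance.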
||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| @dataclass | ||||||||||||||||||||||||
| class FileDownloaderABC(abc.ABC): | ||||||||||||||||||||||||
| config: StateConfig | ||||||||||||||||||||||||
| driver: ClassVar[Optional[CreateWebDriver]] = None | ||||||||||||||||||||||||
| folder: Path = field(init=False) | ||||||||||||||||||||||||
| data: RecordGen | CategoryTypes = None | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| def __post_init__(self): | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| self.check_if_folder_exists() | ||||||||||||||||||||||||
| self.folder = self.config.TEMP_FOLDER | ||||||||||||||||||||||||
| FileDownloaderABC.driver = CreateWebDriver(download_folder=self.folder) | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| @classmethod | ||||||||||||||||||||||||
| def not_headless(cls): | ||||||||||||||||||||||||
| if cls.driver: | ||||||||||||||||||||||||
| cls.driver.headless = False | ||||||||||||||||||||||||
| return cls | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| def check_if_folder_exists(self) -> Path: | ||||||||||||||||||||||||
| _temp_folder_name = self.config.TEMP_FOLDER.stem.title() | ||||||||||||||||||||||||
|
|
@@ -46,12 +58,65 @@ def check_if_folder_exists(self) -> Path: | |||||||||||||||||||||||
| ic("User selected 'n'. Exiting...") | ||||||||||||||||||||||||
| sys.exit() | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| @classmethod | ||||||||||||||||||||||||
| def extract_zipfile(cls, zip_ref, tmp): | ||||||||||||||||||||||||
| zip_file_info = zip_ref.infolist() | ||||||||||||||||||||||||
| _extract_task = progress.add_task("T4", "Extract Zip", "In Progress") | ||||||||||||||||||||||||
| for file in zip_file_info: | ||||||||||||||||||||||||
| try: | ||||||||||||||||||||||||
| cls._process_csv(zip_ref, file, tmp) | ||||||||||||||||||||||||
| except Exception as e: | ||||||||||||||||||||||||
| ic(f"Zip File Extraction Error on {file.filename.upper()}: {e}") | ||||||||||||||||||||||||
|
Comment on lines +66 to +69
Review comment: Broad exception catch may hide critical errors. Catching a bare `Exception` can mask unexpected failures; catch the specific exceptions expected here and re-raise. Suggested change:
 try:
     cls._process_csv(zip_ref, file, tmp)
- except Exception as e:
-     ic(f"Zip File Extraction Error on {file.filename.upper()}: {e}")
+ except (OSError, IOError, pl.exceptions.ComputeError) as e:
+     ic(f"Zip File Extraction Error on {file.filename.upper()}: {e}")
+     raise
🧰 Tools
🪛 Ruff (0.14.7) 68-68: Do not catch blind exception: `Exception` (BLE001)
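The log-and-re-raise pattern from this suggestion can be sketched with the standard library alone. This is a simplified illustration, not the PR's code: `extract_members` and `demo` are hypothetical names, and the caught exception types (`OSError`, `zipfile.BadZipFile`) are an assumption about which failures are expected during extraction.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def extract_members(zip_ref: zipfile.ZipFile, target: Path) -> list:
    """Extract every member, logging and re-raising expected I/O failures."""
    extracted = []
    for info in zip_ref.infolist():
        try:
            zip_ref.extract(info, target)
        except (OSError, zipfile.BadZipFile) as exc:
            # Expected failure types: log with context, then re-raise
            # so the caller still sees the error.
            print(f"Zip extraction error on {info.filename.upper()}: {exc}")
            raise
        # Any other exception type (for example a bug raising TypeError)
        # is not caught here and propagates unchanged.
        extracted.append(info.filename)
    return extracted


def demo() -> list:
    """Build a small in-memory archive and extract it to a temp folder."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("a.csv", "x,y\n1,2\n")
    buffer.seek(0)
    with tempfile.TemporaryDirectory() as tmp, zipfile.ZipFile(buffer) as archive:
        return extract_members(archive, Path(tmp))
```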
||||||||||||||||||||||||
| progress.update_task(_extract_task, "Complete") | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| @classmethod | ||||||||||||||||||||||||
| def _process_csv(cls, zip_ref, file, tmp): | ||||||||||||||||||||||||
| file_name = Path(file.filename) | ||||||||||||||||||||||||
| if file_name.suffix not in ('.csv', '.txt'): | ||||||||||||||||||||||||
| ic(f"File {file_name.stem} is not a CSV/TXT file. Skipping...") | ||||||||||||||||||||||||
| return | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| _csv_task = progress.add_task("T5", f"Extract CSV {file_name.stem}", "Started") | ||||||||||||||||||||||||
| zip_ref.extract(file, tmp) | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| if file_name.suffix == '.txt': | ||||||||||||||||||||||||
| return | ||||||||||||||||||||||||
|
Comment on lines +79 to +83
Review comment: Progress task not completed for `.txt` files. When the file is a `.txt` file, the method returns early and the `_csv_task` progress task is never marked complete. Suggested change:
 if file_name.suffix == '.txt':
+    progress.update_task(_csv_task, "Complete")
     return
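An alternative to adding an `update_task` call before every early return is a small context manager that closes the task on every exit path. The sketch below is an assumption-heavy illustration: `FakeTracker` is a hypothetical stand-in that only mirrors the `add_task(id, name, status)` and `update_task(task, status)` calls visible in the diff, and the `"Failed"` status is invented for the example.

```python
from contextlib import contextmanager


class FakeTracker:
    """Hypothetical stand-in mirroring the add_task/update_task calls in the diff."""

    def __init__(self) -> None:
        self.status = {}

    def add_task(self, task_id: str, name: str, status: str) -> str:
        self.status[task_id] = status
        return task_id

    def update_task(self, task_id: str, status: str) -> None:
        self.status[task_id] = status


@contextmanager
def tracked_task(tracker, task_id: str, name: str):
    """Register a task and mark it finished however the block exits."""
    task = tracker.add_task(task_id, name, "Started")
    try:
        yield task
    except Exception:
        tracker.update_task(task, "Failed")  # assumed status label
        raise
    else:
        tracker.update_task(task, "Complete")


def process(tracker, suffix: str) -> str:
    with tracked_task(tracker, "T5", "Extract CSV"):
        if suffix == ".txt":
            return "skipped"  # early return still completes the task
        return "converted"
```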
||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| rename = f"{file_name.stem}_{datetime.now():%Y%m%d}dl" | ||||||||||||||||||||||||
|
Review comment: 🧹 Nitpick | 🔵 Trivial
`datetime.now()` is called without a timezone. Using a naive datetime for file naming can cause inconsistencies across different environments or if the system timezone changes. Suggested change:
+from datetime import datetime, timezone
 ...
- rename = f"{file_name.stem}_{datetime.now():%Y%m%d}dl"
+ rename = f"{file_name.stem}_{datetime.now(timezone.utc):%Y%m%d}dl"
🧰 Tools
🪛 Ruff (0.14.7) 85-85: `datetime.datetime.now()` called without a `tz` argument (DTZ005)
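A minimal sketch of the timezone-aware naming follows. The helper name `dated_stem` and its optional `now` parameter are assumptions added so the behavior can be checked with a fixed timestamp; the format string is the one from the diff.

```python
from datetime import datetime, timezone
from typing import Optional


def dated_stem(stem: str, now: Optional[datetime] = None) -> str:
    """Return '<stem>_<YYYYMMDD>dl' using a UTC date stamp by default."""
    if now is None:
        now = datetime.now(timezone.utc)
    return f"{stem}_{now:%Y%m%d}dl"


# A fixed timestamp makes the result reproducible.
example = dated_stem("contribs", datetime(2024, 1, 2, tzinfo=timezone.utc))
```

With the fixed timestamp above, `example` is `"contribs_20240102dl"`.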
||||||||||||||||||||||||
| pl_file = pl.scan_csv(tmp / file_name, low_memory=False, infer_schema=False) | ||||||||||||||||||||||||
| pl_file = ( | ||||||||||||||||||||||||
| pl_file | ||||||||||||||||||||||||
| .with_columns( | ||||||||||||||||||||||||
| pl.lit(file_name.stem) | ||||||||||||||||||||||||
| .alias('file_origin') | ||||||||||||||||||||||||
| )) | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| pl_file = ( | ||||||||||||||||||||||||
| pl_file | ||||||||||||||||||||||||
| .with_columns([ | ||||||||||||||||||||||||
| pl.col(col) | ||||||||||||||||||||||||
| .cast(pl.String) | ||||||||||||||||||||||||
| for col in pl_file.collect_schema().names() | ||||||||||||||||||||||||
| ])) | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| pl_file.collect().write_parquet(tmp / f"{rename}.parquet", compression='lz4') | ||||||||||||||||||||||||
| progress.update_task(_csv_task, "Complete") | ||||||||||||||||||||||||
| # Clean up original CSV file | ||||||||||||||||||||||||
| (tmp / file_name).unlink() | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| @classmethod | ||||||||||||||||||||||||
| @abc.abstractmethod | ||||||||||||||||||||||||
| def download(cls, overwrite: bool, read_from_temp: bool) -> FileDownloaderABC: | ||||||||||||||||||||||||
| ... | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| @classmethod | ||||||||||||||||||||||||
| @abc.abstractmethod | ||||||||||||||||||||||||
| def download(self, overwrite: bool, read_from_temp: bool) -> FileDownloaderABC: | ||||||||||||||||||||||||
| def consolidate_files(cls): | ||||||||||||||||||||||||
| ... | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| @classmethod | ||||||||||||||||||||||||
| @abc.abstractmethod | ||||||||||||||||||||||||
| def read(self): | ||||||||||||||||||||||||
| def read(cls): | ||||||||||||||||||||||||
| ... | ||||||||||||||||||||||||
|
Comment on lines +107 to 120
Review comment: 🧹 Nitpick | 🔵 Trivial
Consider keyword-only arguments for boolean parameters. The `download` method takes two positional booleans (`overwrite`, `read_from_temp`), which are easy to swap at the call site. Suggested change:
 @classmethod
 @abc.abstractmethod
- def download(cls, overwrite: bool, read_from_temp: bool) -> FileDownloaderABC:
+ def download(cls, *, overwrite: bool, read_from_temp: bool) -> FileDownloaderABC:
     ...
🧰 Tools
🪛 Ruff (0.14.7):
- 109-109: Boolean-typed positional argument in function definition (FBT001)
- 109-109: Boolean-typed positional argument in function definition (FBT001)
- 114-114: Missing return type annotation for classmethod (ANN206)
- 119-119: Missing return type annotation for classmethod (ANN206)
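The effect of the bare `*` can be seen in a small standalone sketch. `Downloader` and `DemoDownloader` are hypothetical classes used only for illustration; they are not the PR's `FileDownloaderABC`.

```python
import abc


class Downloader(abc.ABC):
    @classmethod
    @abc.abstractmethod
    def download(cls, *, overwrite: bool, read_from_temp: bool) -> "type[Downloader]":
        ...


class DemoDownloader(Downloader):
    last_call = None

    @classmethod
    def download(cls, *, overwrite: bool, read_from_temp: bool) -> "type[Downloader]":
        # Record the flags so the call can be inspected afterwards.
        cls.last_call = (overwrite, read_from_temp)
        return cls


def positional_call_rejected() -> bool:
    """Return True if passing the booleans positionally raises TypeError."""
    try:
        DemoDownloader.download(True, False)
    except TypeError:
        return True
    return False
```

Callers must now write `DemoDownloader.download(overwrite=True, read_from_temp=False)`, so the two flags cannot be silently swapped.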
||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| def sort_categories(self) -> CategoryTypes: | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,4 @@ | ||
| from funcs.csv_reader import FileReader | ||
| from funcs.toml_reader import read_toml | ||
| from funcs.file_exporters import write_records_to_csv_validation | ||
| from funcs.depreciated import deprecated | ||
| from .csv_reader import FileReader | ||
| from .toml_reader import read_toml | ||
| from .file_exporters import write_records_to_csv_validation | ||
| from .depreciated import deprecated | ||
|
Comment on lines +1 to +4
Review comment: 🧹 Nitpick | 🔵 Trivial
Relative imports look good, but check the module naming. The conversion to relative imports is correct. However, note that the module is named `depreciated` while the name it exports is `deprecated`; the module name appears to be a misspelling.
||
Review comment: 🧹 Nitpick | 🔵 Trivial
Consider explicit trigger configuration for YAML linting.
The `push:` and `pull_request:` events without explicit configuration are valid GitHub Actions syntax and will trigger on all push/PR events. However, YAMLlint prefers explicit values. This is a minor style note: the current syntax works fine, but if you want to satisfy stricter linting, consider using explicit empty dicts or null.
🧰 Tools
🪛 YAMLlint (1.37.1)
[warning] 3-3: truthy value should be one of [false, true]
(truthy)