Fix onboarding flow #35

dphuang2 · 2025-08-07T20:16:00Z

No description provided.

- Introduced a new module `get_pep440_version.py` to generate PEP 440 compliant version strings based on git information, caching results to minimize repeated calls.
- Updated `EvalMetadata` in `models.py` to use the new versioning function for the version field, replacing the previous method of using commit hashes.
- Removed dependency on `versioneer` in tests, streamlining version retrieval for evaluations.
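A minimal sketch of how a cached, git-derived PEP 440 version could be computed. This is not the code from `get_pep440_version.py`: the helper name `describe_to_pep440`, the `v1.2.3` tag format, the `TAG+DISTANCE.gSHA[.dirty]` layout, and the `0.0.0+unknown` fallback are all assumptions made for illustration.

```python
import re
import subprocess
from functools import lru_cache

# Matches `git describe --tags --long --dirty` output such as "v1.2.3-4-gabc1234-dirty".
# Assumption: tags look like "v1.2.3" or "1.2.3".
_DESCRIBE = re.compile(
    r"v?(?P<tag>\d+(?:\.\d+)*)-(?P<distance>\d+)-g(?P<sha>[0-9a-f]+)(?P<dirty>-dirty)?"
)


def describe_to_pep440(describe: str) -> str:
    """Convert `git describe` output into a PEP 440 version string (hypothetical helper)."""
    match = _DESCRIBE.fullmatch(describe)
    if match is None:
        # Unexpected format or no reachable tag: fall back to a placeholder.
        return "0.0.0+unknown"
    tag, distance, sha = match["tag"], int(match["distance"]), match["sha"]
    if distance == 0 and not match["dirty"]:
        return tag  # exactly on a clean tag
    # Everything after "+" is a PEP 440 local version label.
    local = f"{distance}.g{sha}"
    if match["dirty"]:
        local += ".dirty"
    return f"{tag}+{local}"


@lru_cache(maxsize=1)
def get_pep440_version() -> str:
    """Ask git once per process; later calls return the cached string."""
    try:
        result = subprocess.run(
            ["git", "describe", "--tags", "--long", "--dirty"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, this is not a repository, or there are no tags.
        return "0.0.0+unknown"
    return describe_to_pep440(result.stdout.strip())
```

Caching with `lru_cache` keeps the `git` subprocess to a single call per process, which matters if the version is stamped onto every evaluation row.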

- Added logging functionality using `default_logger` to track processed messages in `default_single_turn_rollout_processor`.
- Updated the return structure to include the modified row with messages instead of creating a new `EvaluationRow` instance.
- Ensured dataset is returned as a list after processing all rows concurrently.

- Included `pytest>=6.0.0` in the main dependencies section to ensure compatibility with testing requirements.
- Removed `pytest>=6.0.0` from the dev dependencies to streamline the development environment.

…rotocol directories
- Introduced `find_eval_protocol_dir` and `find_eval_protocol_datasets_dir` functions to streamline the discovery and creation of the `.eval_protocol` directory and its `datasets` subdirectory.
- Updated `LocalFSDatasetLoggerAdapter` to utilize these new utility functions, simplifying the initialization process for logging directories.
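The discover-or-create behaviour described above might look roughly like the following. The function names come from the commit message, but the `start` parameter and the walk-up-the-parents search strategy are assumptions; the real helpers may look elsewhere (for example, a fixed location).

```python
from pathlib import Path
from typing import Optional


def find_eval_protocol_dir(start: Optional[Path] = None) -> Path:
    """Return an existing `.eval_protocol` directory, creating one if none is found.

    Assumption: search `start` (default: the current directory) and then each
    parent; if nothing is found, create the directory under `start`.
    """
    current = (start or Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        path = candidate / ".eval_protocol"
        if path.is_dir():
            return path
    path = current / ".eval_protocol"
    path.mkdir(parents=True, exist_ok=True)
    return path


def find_eval_protocol_datasets_dir(start: Optional[Path] = None) -> Path:
    """Return the `datasets` subdirectory, creating it when missing."""
    datasets = find_eval_protocol_dir(start) / "datasets"
    datasets.mkdir(parents=True, exist_ok=True)
    return datasets
```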

- Introduced a new optional field `pid` in the `EvaluationRow` model to store the process ID of the evaluation creator. This addition aids the evaluation watcher in detecting stopped evaluations.

…r component
- Extended the `status` enum in `eval-protocol.ts` to include a new 'stopped' state, enhancing the evaluation status tracking.
- Updated the `StatusIndicator` component to handle the new 'stopped' status, providing appropriate visual feedback with updated colors and text.

- Changed the revision number from 3 to 2.
- Added `pytest` to the main dependencies section.
- Removed `pytest` from the dev dependencies while retaining its version specification.

- Introduced a new optional field `pid` in the `EvaluationRowSchema` to store the process ID of the evaluation creator. This enhancement supports the evaluation watcher in detecting stopped evaluations, improving overall tracking and management of evaluation processes.

- Updated the `load_jsonl` function to include error handling for JSON parsing, logging the line number of any errors encountered.
- Modified the `status` field in `EvalMetadata` to be optional, allowing for more flexible evaluation states.
- Improved the `LocalFSDatasetLoggerAdapter` to check for existing rows across multiple JSONL files before appending new entries, ensuring no duplicates are logged.
- Increased the `word_count` parameter in the `generate_id` function to 5 for more diverse ID generation.
- Introduced a new `eval_watcher.py` script to monitor evaluation processes, updating their status if the associated process has terminated.

- Replaced print statements with structured logging using the `get_logger` utility for improved log management and consistency.
- Enhanced error handling and status updates within the evaluation watcher, ensuring better tracking of evaluation processes and clearer output during execution.

- Introduced a new module `logging_utils.py` to provide centralized logging configuration and utilities.
- Implemented functions for setting up loggers, logging evaluation events, performance metrics, and errors with context.
- Enhanced logging consistency across the package by utilizing structured logging practices.

- Updated the `read` method to ensure that no duplicate row IDs are logged when reading from JSONL files in the datasets directory. This improvement enhances data integrity and consistency in the evaluation logging process.

- Introduced a new module `singleton_lock.py` that implements file-based singleton lock management to ensure only one instance of a process can run at a time.
- Added functions for acquiring, releasing, and checking the status of locks, along with mechanisms for handling stale locks.
- Implemented tests in `test_singleton_lock.py` and `test_singleton_lock_multiprocessing.py` to validate the lock behavior under various scenarios, including concurrent access and cleanup of stale locks.
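For readers unfamiliar with the pattern, here is a rough sketch of a file-based singleton lock with stale-lock cleanup. It is not the contents of `singleton_lock.py`: the function names, the PID-in-a-file layout, and the POSIX-only liveness check via `os.kill(pid, 0)` are assumptions.

```python
import os
from pathlib import Path


def _holder_is_alive(lock_path: Path) -> bool:
    """Check whether the PID recorded in the lock file still exists (POSIX only)."""
    try:
        pid = int(lock_path.read_text().strip())
    except FileNotFoundError:
        return False  # lock vanished between our checks
    except ValueError:
        # Empty or unreadable content: the holder may still be writing its PID,
        # so stay conservative and treat the lock as held.
        return True
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def acquire_lock(lock_path: Path) -> bool:
    """Try to become the single running instance; return True on success."""
    for _ in range(2):  # second pass runs only after removing a stale lock
        try:
            # O_EXCL makes creation atomic: exactly one process can win.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            if _holder_is_alive(lock_path):
                return False
            try:
                lock_path.unlink()  # stale lock left by a dead process
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        return True
    return False


def release_lock(lock_path: Path) -> None:
    """Remove the lock, but only if this process owns it."""
    try:
        if lock_path.read_text().strip() == str(os.getpid()):
            lock_path.unlink()
    except FileNotFoundError:
        pass
```

Recording the PID in the lock file is what makes stale-lock detection possible: a lock whose owner no longer exists can be removed safely, whereas a bare marker file would block forever after a crash.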

- Added regex-based extraction of "row_id" to provide more context in error messages when JSON parsing fails. This improvement aids in debugging by including the problematic row ID in the raised ValueError.
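The error-handling behaviour described here could be sketched as follows. The function name `load_jsonl` and the `ValueError` come from the commit messages; the regex, the message wording, and the choice to skip blank lines are assumptions.

```python
import json
import re
from typing import Any, Dict, List

# Best-effort match for `"row_id": "..."` in a line that is not valid JSON.
ROW_ID_PATTERN = re.compile(r'"row_id"\s*:\s*"([^"]*)"')


def load_jsonl(path: str) -> List[Dict[str, Any]]:
    """Load a JSONL file, raising ValueError with context on the first bad line."""
    rows: List[Dict[str, Any]] = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # assumption: blank lines are ignored
            try:
                rows.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # The line cannot be parsed, so fall back to a regex to
                # recover the row ID for the error message.
                match = ROW_ID_PATTERN.search(line)
                row_id = match.group(1) if match else "unknown"
                raise ValueError(
                    f"{path}: invalid JSON on line {line_number} "
                    f"(row_id={row_id}): {exc.msg}"
                ) from exc
    return rows
```

A regex is used for the row ID because the JSON parser has already failed on that line, so the structured value is not available.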

- Moved the call to `ensure_singleton_watcher()` into the `evaluation_test` function to ensure the evaluation watcher is running before processing begins. This change enhances the reliability of the evaluation process by ensuring the watcher is active during execution.

- Added an ignore rule for `tests/test_eval_watcher.py` in the coverage command to streamline coverage reporting and focus on relevant tests.

- Implemented a signal handler to automatically reap zombie child processes, preventing accumulation and potential resource leaks.
- Enhanced process management by setting up the signal handler for SIGCHLD if available, ensuring better stability during evaluation execution.

…etLoggerAdapter
- Updated `is_process_running` to include a timeout parameter, allowing for more flexible process monitoring.
- Implemented file locking mechanisms in `LocalFSDatasetLoggerAdapter` to prevent race conditions during logging operations, ensuring data integrity when multiple processes access log files.
- Added methods for acquiring and releasing file locks, improving the robustness of the logging process.
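One way to serialize appends across processes is an advisory lock on a sidecar `.lock` file. The sketch below assumes a POSIX system (`fcntl.flock`) and uses hypothetical helper names; it is not the adapter's actual implementation.

```python
import fcntl
import json
from contextlib import contextmanager
from typing import Any, Dict, Iterator


@contextmanager
def file_lock(lock_path: str) -> Iterator[None]:
    """Hold an exclusive advisory lock on `lock_path` for the duration of the block."""
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def append_row(jsonl_path: str, row: Dict[str, Any]) -> None:
    """Append one JSON row while holding the sidecar lock (hypothetical helper)."""
    with file_lock(jsonl_path + ".lock"):
        with open(jsonl_path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(row) + "\n")
```

Locking a separate `.lock` file rather than the data file keeps the lock independent of how the `.jsonl` file is opened or rewritten. It also means any file watcher has to ignore `.lock` files.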

…l files only, preventing unnecessary updates for .lock files.

…casted, preventing unnecessary updates for .lock files.

dphuang2 · 2025-08-07T23:52:46Z

Fixes #30

dphuang2 added 30 commits August 6, 2025 23:43

don't just log and continue anymore—user should know

2cf2406

Add pytest as a dependency in pyproject.toml

5e2a497

- Included `pytest>=6.0.0` in the main dependencies section to ensure compatibility with testing requirements.
- Removed `pytest>=6.0.0` from the dev dependencies to streamline the development environment.

Remove unused imports in utils.py to clean up the codebase.

6e22d35

Add PID field to EvaluationRow model

429bd8b

- Introduced a new optional field `pid` in the `EvaluationRow` model to store the process ID of the evaluation creator. This addition aids the evaluation watcher in detecting stopped evaluations.

Update uv.lock to modify pytest dependency and revision number

94b0e46

- Changed the revision number from 3 to 2.
- Added `pytest` to the main dependencies section.
- Removed `pytest` from the dev dependencies while retaining its version specification.

Ensure evaluation watcher is running at the start of evaluation tests

1fe3338

Merge branch 'main' into fix-onboarding-flow

d5090e6

works

ca126dd

Enhance JSON line error handling in load_jsonl function

b980964

- Added regex-based extraction of "row_id" to provide more context in error messages when JSON parsing fails. This improvement aids in debugging by including the problematic row ID in the raised ValueError.

works!

598c12a

Update CI workflow to ignore test_eval_watcher.py in coverage reports

dab44da

- Added an ignore rule for `tests/test_eval_watcher.py` in the coverage command to streamline coverage reporting and focus on relevant tests.

move

4ff7912

build

0fdd594

Add script alias for eval_protocol CLI in pyproject.toml

e37b078

Fix import path for braintrust adapters in eval_protocol module

698b04d

Update broadcast_file_update method to restrict broadcasting to .json…

a7b76d4

…l files only, preventing unnecessary updates for .lock files.

Fix broadcast_file_update logic to ensure only .jsonl files are broad…

a0a487f

…casted, preventing unnecessary updates for .lock files.

remove a bunch of stuff

f84360f

remove ignore test that doesn't exist

7cd374a

dphuang2 merged commit 3700834 into main Aug 7, 2025
1 check passed

dphuang2 deleted the fix-onboarding-flow branch August 7, 2025 23:53
