Conversation

Collaborator

@dphuang2 dphuang2 commented Aug 7, 2025

No description provided.

dphuang2 added 30 commits August 6, 2025 23:43
- Introduced a new module `get_pep440_version.py` to generate PEP 440 compliant version strings based on git information, caching results to minimize repeated calls.
- Updated `EvalMetadata` in `models.py` to use the new versioning function for the version field, replacing the previous method of using commit hashes.
- Removed dependency on `versioneer` in tests, streamlining version retrieval for evaluations.
- Added logging functionality using `default_logger` to track processed messages in `default_single_turn_rollout_processor`.
- Updated the return structure to include the modified row with messages instead of creating a new `EvaluationRow` instance.
- Ensured dataset is returned as a list after processing all rows concurrently.
- Included `pytest>=6.0.0` in the main dependencies section to ensure compatibility with testing requirements.
- Removed `pytest>=6.0.0` from the dev dependencies to streamline the development environment.
…rotocol directories

- Introduced `find_eval_protocol_dir` and `find_eval_protocol_datasets_dir` functions to streamline the discovery and creation of the `.eval_protocol` directory and its `datasets` subdirectory.
- Updated `LocalFSDatasetLoggerAdapter` to utilize these new utility functions, simplifying the initialization process for logging directories.
- Introduced a new optional field `pid` in the `EvaluationRow` model to store the process ID of the evaluation creator. This addition aids the evaluation watcher in detecting stopped evaluations.
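A minimal sketch of how a stored `pid` could be used to detect a stopped evaluation, assuming a POSIX system. The helper names `pid_is_alive` and `resolve_status`, and the plain-dict row shape, are hypothetical illustrations and not the code in this PR.

```python
import os


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID appears to exist (POSIX only)."""
    if pid <= 0:
        return False
    try:
        # Signal 0 delivers nothing; the kernel only performs its existence
        # and permission checks.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def resolve_status(row: dict) -> str:
    """Report 'stopped' for a row marked 'running' whose creator process is gone.

    `row` is a hypothetical dict with 'pid' and 'status' keys.
    """
    pid = row.get("pid")
    if row.get("status") == "running" and pid is not None and not pid_is_alive(pid):
        return "stopped"
    return row.get("status")
```

Because `pid` is optional, a row without one keeps its recorded status; the check can only say something when a PID was stored. PIDs can also be reused by the operating system, so this is a heuristic and not a guarantee.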
…r component

- Extended the `status` enum in `eval-protocol.ts` to include a new 'stopped' state, enhancing the evaluation status tracking.
- Updated the `StatusIndicator` component to handle the new 'stopped' status, providing appropriate visual feedback with updated colors and text.
- Changed the revision number from 3 to 2.
- Added `pytest` to the main dependencies section.
- Removed `pytest` from the dev dependencies while retaining its version specification.
- Introduced a new optional field `pid` in the `EvaluationRowSchema` to store the process ID of the evaluation creator. This enhancement supports the evaluation watcher in detecting stopped evaluations, improving overall tracking and management of evaluation processes.
- Updated the `load_jsonl` function to include error handling for JSON parsing, logging the line number of any errors encountered.
- Modified the `status` field in `EvalMetadata` to be optional, allowing for more flexible evaluation states.
- Improved the `LocalFSDatasetLoggerAdapter` to check for existing rows across multiple JSONL files before appending new entries, ensuring no duplicates are logged.
- Increased the `word_count` parameter in the `generate_id` function to 5 for more diverse ID generation.
- Introduced a new `eval_watcher.py` script to monitor evaluation processes, updating their status if the associated process has terminated.
- Replaced print statements with structured logging using the `get_logger` utility for improved log management and consistency.
- Enhanced error handling and status updates within the evaluation watcher, ensuring better tracking of evaluation processes and clearer output during execution.
- Introduced a new module `logging_utils.py` to provide centralized logging configuration and utilities.
- Implemented functions for setting up loggers, logging evaluation events, performance metrics, and errors with context.
- Enhanced logging consistency across the package by utilizing structured logging practices.
- Updated the `read` method to ensure that no duplicate row IDs are logged when reading from JSONL files in the datasets directory. This improvement enhances data integrity and consistency in the evaluation logging process.
- Introduced a new module `singleton_lock.py` that implements file-based singleton lock management to ensure only one instance of a process can run at a time.
- Added functions for acquiring, releasing, and checking the status of locks, along with mechanisms for handling stale locks.
- Implemented tests in `test_singleton_lock.py` and `test_singleton_lock_multiprocessing.py` to validate the lock behavior under various scenarios, including concurrent access and cleanup of stale locks.
- Added regex-based extraction of "row_id" to provide more context in error messages when JSON parsing fails. This improvement aids in debugging by including the problematic row ID in the raised ValueError.
- Moved the call to `ensure_singleton_watcher()` into the `evaluation_test` function to ensure the evaluation watcher is running before processing begins. This change enhances the reliability of the evaluation process by ensuring the watcher is active during execution.
- Added an ignore rule for `tests/test_eval_watcher.py` in the coverage command to streamline coverage reporting and focus on relevant tests.
- Implemented a signal handler to automatically reap zombie child processes, preventing accumulation and potential resource leaks.
- Enhanced process management by setting up the signal handler for SIGCHLD if available, ensuring better stability during evaluation execution.
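One common way to reap zombie children with a `SIGCHLD` handler looks roughly like the sketch below. The function names `reap_children` and `install_child_reaper` are hypothetical, and the actual handler in this PR may differ.

```python
import os
import signal


def reap_children(signum=None, frame=None) -> int:
    """Collect every child that has already exited; return how many were reaped."""
    reaped = 0
    while True:
        try:
            # WNOHANG makes waitpid return immediately instead of blocking.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none have exited yet
        reaped += 1
    return reaped


def install_child_reaper() -> bool:
    """Register the handler only when the platform defines SIGCHLD."""
    if not hasattr(signal, "SIGCHLD"):
        return False  # e.g. Windows
    # signal.signal must be called from the main thread.
    signal.signal(signal.SIGCHLD, reap_children)
    return True
```

The loop matters because several children can exit before the handler runs once, and signals of the same type are not queued. One trade-off of reaping with `waitpid(-1, ...)` is that exit codes of children started elsewhere in the process (for example through `subprocess`) may be consumed by the handler before their owner reads them.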
…etLoggerAdapter

- Updated `is_process_running` to include a timeout parameter, allowing for more flexible process monitoring.
- Implemented file locking mechanisms in `LocalFSDatasetLoggerAdapter` to prevent race conditions during logging operations, ensuring data integrity when multiple processes access log files.
- Added methods for acquiring and releasing file locks, improving the robustness of the logging process.
…l files only, preventing unnecessary updates for .lock files.
…casted, preventing unnecessary updates for .lock files.
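The file-locking and duplicate-check commits above can be illustrated with a short sketch, assuming a POSIX system where `fcntl.flock` is available. The names `locked` and `append_row_once` are hypothetical, the `.lock` suffix and `row_id` key are taken from the commit messages, and the real `LocalFSDatasetLoggerAdapter` checks across multiple JSONL files whereas this example handles a single file.

```python
import fcntl
import json
import os
from contextlib import contextmanager


@contextmanager
def locked(path: str):
    """Hold an exclusive advisory lock on `path + ".lock"` for the block's duration."""
    lock_path = path + ".lock"
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def append_row_once(path: str, row: dict) -> bool:
    """Append `row` as one JSON line unless its row_id is already present."""
    with locked(path):
        seen = set()
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                for line in f:
                    if line.strip():
                        seen.add(json.loads(line).get("row_id"))
        if row.get("row_id") in seen:
            return False  # duplicate, nothing written
        with open(path, "a", encoding="utf-8") as f:
            f.write(json.dumps(row) + "\n")
        return True
```

Doing the read-then-append inside one lock is what closes the race: without it, two processes could both see the row as missing and both append it. Keeping the lock in a separate `.lock` file also explains why a file watcher would want to ignore those files and react to `.jsonl` changes only.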
Collaborator Author

dphuang2 commented Aug 7, 2025

Fixes #30

@dphuang2 dphuang2 merged commit 3700834 into main Aug 7, 2025
1 check passed
@dphuang2 dphuang2 deleted the fix-onboarding-flow branch August 7, 2025 23:53
2 participants