Commit 3700834
Fix onboarding flow (#35)
* Don't just log and continue anymore; the user should know
* Add PEP 440 versioning support
- Introduced a new module `get_pep440_version.py` to generate PEP 440 compliant version strings based on git information, caching results to minimize repeated calls.
- Updated `EvalMetadata` in `models.py` to use the new versioning function for the version field, replacing the previous method of using commit hashes.
- Removed dependency on `versioneer` in tests, streamlining version retrieval for evaluations.
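
A minimal sketch of how a PEP 440 string might be derived from git information and cached. This is not the code from `get_pep440_version.py`: the function names, the use of `git describe`, and the local-version layout (`+<distance>.g<sha>`) are assumptions made for illustration.

```python
import re
import subprocess
from functools import lru_cache

# Matches output of `git describe --tags --long --dirty`, e.g. "v1.2.3-4-gabc1234-dirty".
_DESCRIBE_RE = re.compile(
    r"^v?(?P<tag>\d+(?:\.\d+)*)-(?P<distance>\d+)-g(?P<sha>[0-9a-f]+)(?P<dirty>-dirty)?$"
)


def describe_to_pep440(describe: str) -> str:
    """Convert a `git describe` string into a PEP 440 version (hypothetical helper)."""
    match = _DESCRIBE_RE.match(describe.strip())
    if match is None:
        raise ValueError(f"unrecognized git describe output: {describe!r}")
    local = []
    if int(match["distance"]) > 0:
        # Commits after the tag go into the local version segment.
        local += [match["distance"], f"g{match['sha']}"]
    if match["dirty"]:
        local.append("dirty")
    return match["tag"] + ("+" + ".".join(local) if local else "")


@lru_cache(maxsize=1)
def get_pep440_version() -> str:
    """Run git once per process; later calls reuse the cached result."""
    out = subprocess.run(
        ["git", "describe", "--tags", "--long", "--dirty"],
        capture_output=True, text=True, check=True,
    ).stdout
    return describe_to_pep440(out)
```

Keeping the string conversion separate from the `git` call lets the formatting be tested without a repository.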
* Enhance default_single_turn_rollout_processor to log messages
- Added logging functionality using `default_logger` to track processed messages in `default_single_turn_rollout_processor`.
- Updated the return structure to include the modified row with messages instead of creating a new `EvaluationRow` instance.
- Ensured dataset is returned as a list after processing all rows concurrently.
* Add pytest as a dependency in pyproject.toml
- Included `pytest>=6.0.0` in the main dependencies section to ensure compatibility with testing requirements.
- Removed `pytest>=6.0.0` from the dev dependencies to streamline the development environment.
* Remove unused imports in utils.py to clean up the codebase.
* Add directory utility functions for finding and creating evaluation protocol directories
- Introduced `find_eval_protocol_dir` and `find_eval_protocol_datasets_dir` functions to streamline the discovery and creation of the `.eval_protocol` and its `datasets` subdirectory.
- Updated `LocalFSDatasetLoggerAdapter` to utilize these new utility functions, simplifying the initialization process for logging directories.
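
The discovery-or-create behaviour could look roughly like the sketch below. The two function names come from the commit message; the search strategy (walk up from a start directory, create in the start directory if nothing is found) and the `start` parameter are assumptions, not the actual implementation.

```python
from pathlib import Path
from typing import Optional

EVAL_PROTOCOL_DIR = ".eval_protocol"  # directory name taken from the commit message


def find_eval_protocol_dir(start: Optional[Path] = None) -> Path:
    """Return the nearest `.eval_protocol` directory, creating one if none exists."""
    current = (start or Path.cwd()).resolve()
    # Assumed search order: the start directory first, then each ancestor.
    for candidate in (current, *current.parents):
        existing = candidate / EVAL_PROTOCOL_DIR
        if existing.is_dir():
            return existing
    created = current / EVAL_PROTOCOL_DIR
    created.mkdir(parents=True, exist_ok=True)
    return created


def find_eval_protocol_datasets_dir(start: Optional[Path] = None) -> Path:
    """Return the `datasets` subdirectory, creating it if needed."""
    datasets = find_eval_protocol_dir(start) / "datasets"
    datasets.mkdir(parents=True, exist_ok=True)
    return datasets
```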
* Add PID field to EvaluationRow model
- Introduced a new optional field `pid` in the `EvaluationRow` model to store the process ID of the evaluation creator. This addition aids the evaluation watcher in detecting stopped evaluations.
* Add 'stopped' status to evaluation protocol and update StatusIndicator component
- Extended the `status` enum in `eval-protocol.ts` to include a new 'stopped' state, enhancing the evaluation status tracking.
- Updated the `StatusIndicator` component to handle the new 'stopped' status, providing appropriate visual feedback with updated colors and text.
* Update uv.lock to modify pytest dependency and revision number
- Changed the revision number from 3 to 2.
- Added `pytest` to the main dependencies section.
- Removed `pytest` from the dev dependencies while retaining its version specification.
* Ensure evaluation watcher is running at the start of evaluation tests
* Add optional PID field to EvaluationRowSchema
- Introduced a new optional field `pid` in the `EvaluationRowSchema` to store the process ID of the evaluation creator. This enhancement supports the evaluation watcher in detecting stopped evaluations, improving overall tracking and management of evaluation processes.
* Enhance evaluation logging and error handling
- Updated the `load_jsonl` function to include error handling for JSON parsing, logging the line number of any errors encountered.
- Modified the `status` field in `EvalMetadata` to be optional, allowing for more flexible evaluation states.
- Improved the `LocalFSDatasetLoggerAdapter` to check for existing rows across multiple JSONL files before appending new entries, ensuring no duplicates are logged.
- Increased the `word_count` parameter in the `generate_id` function to 5 for more diverse ID generation.
- Introduced a new `eval_watcher.py` script to monitor evaluation processes, updating their status if the associated process has terminated.
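
As an illustration of the watcher idea, the sketch below marks rows as `stopped` when the process that created them is gone. Representing rows as plain dictionaries, the `running` status value, and the `mark_stopped` name are assumptions; only the `pid` field and the `stopped` status appear in the commit message. It relies on POSIX signal semantics.

```python
import os
from typing import List


def is_process_running(pid: int) -> bool:
    """Probe a PID with signal 0, which checks existence without delivering a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def mark_stopped(rows: List[dict]) -> List[dict]:
    """Flip `running` rows to `stopped` when their creator process has exited."""
    for row in rows:
        pid = row.get("pid")
        if row.get("status") == "running" and pid is not None and not is_process_running(pid):
            row["status"] = "stopped"
    return rows
```

Rows without a `pid` are left alone, since the field is optional and nothing can be inferred about them.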
* Refactor eval_watcher.py to use structured logging
- Replaced print statements with structured logging using the `get_logger` utility for improved log management and consistency.
- Enhanced error handling and status updates within the evaluation watcher, ensuring better tracking of evaluation processes and clearer output during execution.
* Add logging utilities for eval_protocol package
- Introduced a new module `logging_utils.py` to provide centralized logging configuration and utilities.
- Implemented functions for setting up loggers, logging evaluation events, performance metrics, and errors with context.
- Enhanced logging consistency across the package by utilizing structured logging practices.
* Enhance LocalFSDatasetLoggerAdapter to prevent duplicate row IDs
- Updated the `read` method to ensure that no duplicate row IDs are logged when reading from JSONL files in the datasets directory. This improvement enhances data integrity and consistency in the evaluation logging process.
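
One way to avoid duplicate row IDs when reading several JSONL files is to keep the first row seen for each ID. The sketch below assumes a top-level `row_id` key and a "first occurrence wins" policy; `read_unique_rows` is a hypothetical name, not the adapter's `read` method.

```python
import json
from pathlib import Path
from typing import Dict, List


def read_unique_rows(datasets_dir: Path) -> List[dict]:
    """Read every *.jsonl file in a directory, keeping one row per row_id."""
    seen: Dict[str, dict] = {}
    for path in sorted(datasets_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                if not line.strip():
                    continue  # tolerate blank lines
                row = json.loads(line)
                # setdefault keeps the first row seen for a given ID.
                seen.setdefault(row["row_id"], row)
    return list(seen.values())
```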
* Add singleton lock functionality for process management
- Introduced a new module `singleton_lock.py` that implements file-based singleton lock management to ensure only one instance of a process can run at a time.
- Added functions for acquiring, releasing, and checking the status of locks, along with mechanisms for handling stale locks.
- Implemented tests in `test_singleton_lock.py` and `test_singleton_lock_multiprocessing.py` to validate the lock behavior under various scenarios, including concurrent access and cleanup of stale locks.
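
A file-based singleton lock with stale-lock handling might be sketched as follows. The function names and the "PID written into the lock file" layout are assumptions for illustration and may differ from `singleton_lock.py`. The sketch assumes POSIX and Python 3.8+.

```python
import os
from pathlib import Path


def _pid_alive(pid: int) -> bool:
    """Signal 0 checks whether a process exists without affecting it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True
    return True


def acquire_lock(lock_path: Path) -> bool:
    """Try to become the single holder; return False if a live process holds the lock."""
    for _ in range(2):  # second pass runs only after removing a stale lock
        try:
            # O_EXCL makes creation atomic: exactly one process can succeed.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            try:
                holder = int(lock_path.read_text().strip())
            except (ValueError, FileNotFoundError):
                return False  # unreadable or mid-write: conservatively treat as held
            if _pid_alive(holder):
                return False
            lock_path.unlink(missing_ok=True)  # holder is dead, so the lock is stale
            continue
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        return True
    return False


def release_lock(lock_path: Path) -> None:
    """Remove the lock file, but only if this process owns it."""
    try:
        if int(lock_path.read_text().strip()) == os.getpid():
            lock_path.unlink()
    except (FileNotFoundError, ValueError):
        pass
```

Two processes that detect the same stale lock at the same moment can still race on cleanup, so a production version would need an extra guard around the removal step.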
* works
* Enhance JSON line error handling in load_jsonl function
- Added regex-based extraction of "row_id" to provide more context in error messages when JSON parsing fails. This improvement aids in debugging by including the problematic row ID in the raised ValueError.
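
The error-handling behaviour described for `load_jsonl` can be sketched like this. The regular expression, the message format, and the `unknown` fallback are assumptions; the commit message only states that the line number and the `row_id` are reported and that a `ValueError` is raised.

```python
import json
import re
from typing import List

# Pulls a string-valued "row_id" out of a line even when the line is not valid JSON.
_ROW_ID_RE = re.compile(r'"row_id"\s*:\s*"([^"]*)"')


def load_jsonl(path: str) -> List[dict]:
    """Load a JSONL file, raising ValueError with line number and row_id on bad input."""
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                rows.append(json.loads(line))
            except json.JSONDecodeError as exc:
                match = _ROW_ID_RE.search(line)
                row_id = match.group(1) if match else "unknown"
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON (row_id={row_id}): {exc.msg}"
                ) from exc
    return rows
```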
* works!
* Refactor evaluation_test.py to ensure singleton watcher is initialized
- Moved the call to `ensure_singleton_watcher()` into the `evaluation_test` function to ensure the evaluation watcher is running before processing begins. This change enhances the reliability of the evaluation process by ensuring the watcher is active during execution.
* Update CI workflow to ignore test_eval_watcher.py in coverage reports
- Added an ignore rule for `tests/test_eval_watcher.py` in the coverage command to streamline coverage reporting and focus on relevant tests.
* Add signal handler to manage zombie processes in eval_watcher.py
- Implemented a signal handler to automatically reap zombie child processes, preventing accumulation and potential resource leaks.
- Enhanced process management by setting up the signal handler for SIGCHLD if available, ensuring better stability during evaluation execution.
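
A SIGCHLD-based reaper along these lines is one common way to keep zombies from accumulating. The function names are hypothetical and the sketch is not taken from `eval_watcher.py`; it uses only the standard `os.waitpid` and `signal.signal` calls.

```python
import os
import signal
from typing import List


def reap_children() -> List[int]:
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


def install_zombie_reaper() -> bool:
    """Install the handler where SIGCHLD exists; return False on other platforms."""
    if not hasattr(signal, "SIGCHLD"):
        return False
    signal.signal(signal.SIGCHLD, lambda signum, frame: reap_children())
    return True
```

Reaping with `waitpid(-1, ...)` also consumes the exit status of children started through `subprocess`, so code that needs those return codes should not share a process with this handler.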
* move
* Enhance singleton lock functionality and file locking in LocalFSDatasetLoggerAdapter
- Updated `is_process_running` to include a timeout parameter, allowing for more flexible process monitoring.
- Implemented file locking mechanisms in `LocalFSDatasetLoggerAdapter` to prevent race conditions during logging operations, ensuring data integrity when multiple processes access log files.
- Added methods for acquiring and releasing file locks, improving the robustness of the logging process.
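
File locking around log writes could be sketched with an advisory `fcntl.flock` on a sidecar `.lock` file. The helper names and the sidecar naming are assumptions, and `fcntl` is POSIX-only; the adapter's real locking methods may differ.

```python
import fcntl
import json
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def file_lock(lock_path: Path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    with open(lock_path, "a") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def append_row(jsonl_path: Path, row: dict) -> None:
    """Append one JSON line while holding the lock, so writers do not interleave."""
    with file_lock(jsonl_path.with_suffix(".lock")):
        with jsonl_path.open("a", encoding="utf-8") as out:
            out.write(json.dumps(row) + "\n")
```

Locking a separate `.lock` file rather than the data file itself keeps the lock independent of how the JSONL file is opened or replaced.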
* build
* Add script alias for eval_protocol CLI in pyproject.toml
* Fix import path for braintrust adapters in eval_protocol module
* Update broadcast_file_update method to restrict broadcasting to .jsonl files only, preventing unnecessary updates for .lock files.
* Fix broadcast_file_update logic to ensure only .jsonl files are broadcasted, preventing unnecessary updates for .lock files.
* remove a bunch of stuff
* remove ignore test that doesn't exist

1 parent 7b252d3 commit 3700834
File tree
21 files changed
+482
-117
lines changed
- eval_protocol
- dataset_logger
- human_id
- pytest
- utils
- vite-app
- dist
- assets
- src
- components
- types
Lines changed: 42 additions & 58 deletions