update cleanlab-tlm package to support binary evals #130
Conversation
src/cleanlab_tlm/utils/rag.py
    query_identifier: Optional[str] = None,
    context_identifier: Optional[str] = None,
    response_identifier: Optional[str] = None,
    mode: Optional[str] = "numeric",
Continuous
missing unit tests
Co-authored-by: Jonas Mueller <[email protected]>
Added here
Co-authored-by: Aditya Thyagarajan <[email protected]>
Co-authored-by: Jonas Mueller <[email protected]>
What's going on with the formatting check in the CI?
elisno left a comment:
Here's some initial feedback.
    trustworthy_rag,  # noqa: F401
    trustworthy_rag_api_key,  # noqa: F401
Why did you remove these? These fixtures are not defined in conftest.py, so they need to be imported.
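For reference, a minimal sketch of why the imports matter (the module path below is a placeholder, not the repo's actual layout): pytest only auto-discovers fixtures defined in conftest.py, so fixtures defined elsewhere must be imported by name into the test module, and the noqa: F401 marker suppresses the unused-import lint since pytest resolves them implicitly.

    # Sketch: re-importing externally defined fixtures into a test module.
    # "tests.fixtures" is a placeholder path; only conftest.py fixtures are
    # auto-discovered by pytest, everything else must be imported explicitly.
    from tests.fixtures import (  # noqa: F401  (used implicitly by pytest)
        trustworthy_rag,
        trustworthy_rag_api_key,
    )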
Why is this file being updated at all?
This was from hatch's automatic format fix; I'll look into these.
    # Compile and validate the eval
    self.mode = self._compile_mode(mode, criteria, name)

    def _compile_mode(self, mode: Optional[str], criteria: str, name: str) -> str:
This _compile_mode method needs to be written more carefully to avoid unintentionally breaking all our tests: many of the UserWarnings will be raised as errors during automated testing.
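To illustrate the concern, here is a minimal warning-free sketch, assuming the valid mode names are "numeric" and "binary" (the exact names and default are assumptions based on the diff and PR description). Test suites configured with pytest's filterwarnings = error treat every UserWarning as a failure, so validating with a hard ValueError keeps test and production behavior aligned:

    from typing import Optional

    _VALID_MODES = ("numeric", "binary")  # assumed mode names for this sketch


    class Eval:  # stand-in for the class under review
        def _compile_mode(self, mode: Optional[str], criteria: str, name: str) -> str:
            """Validate and normalize the eval's mode (illustrative sketch only).

            `criteria` is accepted only for parity with the signature in the diff.
            """
            if mode is None:
                return "numeric"  # assumed default, per the signature above
            if mode not in _VALID_MODES:
                # Raise rather than warn: under a warnings-as-errors pytest
                # config, a UserWarning here would fail every test that
                # merely constructs an Eval.
                raise ValueError(
                    f"Invalid mode {mode!r} for eval {name!r}; expected one of {_VALID_MODES}."
                )
            return mode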
Add separate test cases for these
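A hedged sketch of what those cases might look like, assuming the class in the diff is cleanlab_tlm.utils.rag.Eval, that mode becomes a constructor keyword per this PR, and that invalid modes raise ValueError as sketched above:

    import pytest

    from cleanlab_tlm.utils.rag import Eval


    @pytest.mark.parametrize("mode", ["numeric", "binary"])
    def test_eval_accepts_valid_modes(mode: str) -> None:
        ev = Eval(name="demo", criteria="Is the response grounded in the context?", mode=mode)
        assert ev.mode == mode


    def test_eval_defaults_to_numeric_mode() -> None:
        ev = Eval(name="demo", criteria="Is the response grounded in the context?")
        assert ev.mode == "numeric"


    def test_eval_rejects_unknown_mode() -> None:
        with pytest.raises(ValueError):
            Eval(name="demo", criteria="Is the response grounded?", mode="percentile")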
This PR introduces a mode option for TLM RAG evals (binary/continuous).
Merge after the backend PR.
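For readers landing on this PR, a hedged usage sketch of the feature. Parameter names follow the diff above; the exact public API may differ once the backend PR lands, and a CLEANLAB_TLM_API_KEY environment variable is assumed to be set:

    from cleanlab_tlm.utils.rag import Eval, TrustworthyRAG

    # New in this PR: mode="binary" for yes/no evals; "numeric" (continuous
    # scores) remains the default per the signature in the diff above.
    binary_eval = Eval(
        name="context_sufficiency",
        criteria="Does the context contain enough information to answer the query?",
        mode="binary",
    )

    rag_evaluator = TrustworthyRAG(evals=[binary_eval])
    result = rag_evaluator.score(
        query="What is the capital of France?",
        context="France is a country in Europe. Its capital is Paris.",
        response="The capital of France is Paris.",
    )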