-
Notifications
You must be signed in to change notification settings - Fork 80
Add SweRank support #323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add SweRank support #323
Conversation
dd9239f to
08ff53f
Compare
|
|
||
| # Set max document length based on rerank type | ||
| if self._rerank_type == "code": | ||
| max_doc_length = 1024 # Longer for code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this numbers come from?
We normally start with the user provided context size / window size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is from the original SweRank code but yes I do think it might make more sense to just use the user provided size here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this new file needed? Looks like a copy paste of https://github.com/castorini/rank_llm/blob/main/src/rank_llm/rerank/listwise/rank_listwise_os_llm.py for the most part.
Please either modify that to handle code reranking (preferred) or if you have to create a new class inherit from RankListwiseOSLLM to avoid some of the copy pastes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, working on that right now. PR is still a work in progress, I'll mark it ready as review once the code is working and in a reasonable state haha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the difference between this and rank_zephyr_template? why introduce a new one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah didn't notice those were the same, will remove!
src/rank_llm/scripts/run_rank_llm.py
Outdated
| choices=["text", "code"], | ||
| help="Type of reranking: 'text' for standard passages, 'code' for GitHub issues (SweRank models only)", | ||
| ) | ||
| parser.add_argument( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this needed? we are deprecating the prompt types, using prompt templates should be enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed
| min_doc_length = 300 | ||
| else: | ||
| max_doc_length = 300 # Standard for text | ||
| min_doc_length = 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does min doc length matter?
src/rank_llm/rerank/reranker.py
Outdated
|
|
||
| keys_and_defaults = [ | ||
| ("context_size", 4096), | ||
| ("prompt_template_path", None), # Will be set based on rerank_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the other way around, you need to set the prompt_template_path, rerank_type should not be needed
|
@Raptors65 thank you for working on this. My understanding is that you only need Please take a look at #313 as an example. Happy to chat more if things are not clear or if you think I am missing some context. |
|
@sahel-sh thanks for the example PR, that's super helpful! I agree that most of the code here right now isn't necessary, still definitely a WIP; I'll get this cleaned up by end of day. Thanks for the comments! |
08ff53f to
3d57870
Compare
Pull Request Checklist
Reference Issue
Please provide the reference to issue this PR is addressing (# followed by the issue number). If there is no associated issue, write "N/A".
ref: N/A
Adds SweRank support
still a work in progress (a lot of this code is messy cursor-generated code), will put up a non-draft PR once that's cleaned up
Checklist Items
Before submitting your pull request, please review these items:
PR Type
What kind of change does this PR introduce?