Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Conversation

@sqs
Copy link
Member

@sqs sqs commented Jun 20, 2024

The purpose of this change is to remove a special-case for Sourcegraph.com.

If a user's query has a repo filter like repo:github.com/foo, we rewrite it to repo:^github.com/foo (note the ^). This makes the DB query faster because it only needs to search at the beginning of the string.

This optimization only happens on Sourcegraph.com because technically someone could have a repo at (say) https://github.com/owner/foo.github.com or https://myothercodehost.example.com/foo.github.com/bar, and they could query repo:github.com intending to match one of those. We wanted to avoid that behavior on customer instances. Those are rare edge cases, and the user could work around this rewriting with repo:.*github.com if truly needed. We now make this rewriting only occur if there is a trailing slash, which mitigates the (anyway likely zero) impact.

Making this rewriting behavior consistent removes a dotcom edge case and could improve repository search performance for common queries with a tiny hypothetical impact on the behavior.

Note that GitHub Pages repositories are not affected here because those are named foo.github.io not foo.github.com (https://docs.github.com/en/pages/getting-started-with-github-pages/creating-a-github-pages-site#creating-a-repository-for-your-site).

Test plan

CI

@cla-bot cla-bot bot added the cla-signed label Jun 20, 2024
The purpose of this change is to remove a special-case for Sourcegraph.com.

If a user's query has a repo filter like `repo:github.com/foo`, we rewrite it to `repo:^github.com/foo` (note the `^`). This makes the DB query faster because it only needs to search at the beginning of the string.

This optimization only happens on Sourcegraph.com because technically someone could have a repo at (say) `https://github.com/owner/foo.github.com` or `https://myothercodehost.example.com/foo.github.com/bar`, and they could query `repo:github.com` intending to match one of those. We wanted to avoid that behavior on customer instances. Those are rare edge cases, and the user could work around this rewriting with `repo:.*github.com` if truly needed. We now make this rewriting only occur if there is a trailing slash, which mitigates the (anyway likely zero) impact.

Making this rewriting behavior consistent removes a dotcom edge case and could improve repository search performance for common queries with a tiny hypothetical impact on the behavior.

Note that GitHub Pages repositories are not affected here because those are named `foo.github.io` not `foo.github.com` (https://docs.github.com/en/pages/getting-started-with-github-pages/creating-a-github-pages-site#creating-a-repository-for-your-site).
@sqs sqs force-pushed the sqs/rm-repo-filter-rewrite-dotcom-case branch from ed276b1 to d0e4376 Compare June 20, 2024 06:22
@sqs sqs requested a review from a team June 24, 2024 04:24
Copy link
Member

@keegancsmith keegancsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I'm not sure about this to be honest. This is an optimization that is not generic and is only useful for instances with millions of entries in the repos table. To be honest the whole of that function should be gated behind dotcom specific?

@jtibshirani
Copy link
Contributor

This is an optimization that is not generic

@keegancsmith could you explain more? What could go wrong if we applied this to Enterprise instances? Overall I like the idea of removing "dot com" specific functionality as much as possible.

@keegancsmith
Copy link
Member

What could go wrong if we applied this to Enterprise instances?

See the descriptions third paragraph.

I also am fully on board for removing dotcom specific behaviour. After some more thought maybe the tradeoff on removing dotcom specific behaviour is worth the corner case?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants