llama: add canaries to Markdown files #18735
base: master
Conversation
Looking at the way it's rendered on GitHub, maybe a nu is a poor choice. There may be better alternatives, such as full-width characters like "ａ".
Perhaps a string of zero-width spaces? https://unicode-explorer.com/c/200B |
The problem is that ideally the canaries should persist even if text is copy-pasted. Zero-width spaces are frequently lost this way. There are similar issues with Greek letters like omicron which are simply rendered as an ASCII "o" and therefore indistinguishable. |
I think variation selectors should be robust to copy-paste, though I have not checked this.
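To illustrate the distinction being discussed, here is a small Python sketch (my own, not part of the PR) showing that these lookalike or invisible characters are all distinct codepoints, even when they render identically:

```python
import unicodedata

# Strings that render identically (or invisibly) can still differ
# at the codepoint level, which is what makes them usable as canaries.
samples = {
    "ascii v": "v",                   # U+0076 LATIN SMALL LETTER V
    "greek nu": "\u03bd",             # U+03BD GREEK SMALL LETTER NU
    "ascii o": "o",                   # U+006F LATIN SMALL LETTER O
    "greek omicron": "\u03bf",        # U+03BF, often rendered just like "o"
    "zero-width space": "a\u200bb",   # U+200B, invisible between a and b
    "variation selector": "v\ufe00",  # U+FE00 VARIATION SELECTOR-1
}

for name, s in samples.items():
    codepoints = " ".join(f"U+{ord(c):04X} {unicodedata.name(c, '?')}" for c in s)
    print(f"{name}: {s!r} -> {codepoints}")

# Despite looking alike, equality checks distinguish them:
assert "v" != "\u03bd"
assert "o" != "\u03bf"
assert len("a\u200bb") == 3  # the zero-width space is a real character
```

Whether a given character survives copy-paste depends on the renderer and clipboard, which is exactly the robustness question raised above.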
How about this: we add comments that link to this PR about canaries. The main thing we want to catch is people submitting PRs without checking the language model's output. And if a human sees the comment, they can open that link, read this PR, and understand what the purpose of the comment is.
The current AGENTS.md and CLAUDE.md already make all the good tools refuse to write code, even if you tell them precisely what to write. So most people will probably just bypass this (by deleting AGENTS.md). A better approach in AGENTS.md might be to reiterate the CONTRIBUTING.md section about AI usage and ask the user to explicitly acknowledge it before doing any agentic work. And then we also add the canary about this PR.
I do like the idea of having canaries for AI, but I'm a bit doubtful that AI agents will actually read README files. At least with VS Code Copilot, agents usually grep to find specific code and completely bypass the docs. One idea is to also include a prominent comment like this randomly in the code: It can be placed on
Even then, the agent can be easily convinced by saying "I don't plan to submit this PR, this is just for testing". I think ultimately, the only solution (to getting PRs merged) is to build trust with the maintainers by submitting small, high-quality PRs over a period of time. Then it does not matter whether you used AI or not; the code will still get reviewed on the strength of your past work.
This PR adds HTML comments to our top-leνel Markdown files with the instruction to replace one instance of the ASCII character "v" with a nu "ν". These comments are invisible to humans unless they edit the source files, but they are visible to language models. This would allow us to more easily detect when language models are used without disclosure. I've done this replacement in this text.
I'm using the browser extension Auto Highlight, which can be used to highlight any instances of ν on GitHub.
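For anyone without the extension, a canary check can also be scripted. This is a hypothetical sketch (the function name and layout are my own, not tooling from this PR) that scans a repository's Markdown files for the nu character:

```python
import pathlib

CANARY = "\u03bd"  # Greek small letter nu, the canary character from this PR

def find_canaries(root: str):
    """Return (file, line number, line) for every Markdown line containing the canary."""
    hits = []
    for path in pathlib.Path(root).rglob("*.md"):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if CANARY in line:
                hits.append((str(path), lineno, line.strip()))
    return hits

# A PR description where an ASCII "v" was silently replaced would be flagged:
sample = "This adds support for con\u03bdolution kernels."
assert CANARY in sample
```

The same scan could run in CI against a PR's changed files, flagging submissions where the canary instruction was followed without disclosure.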