LinguImplementation_Collīdunt-LLMs

GitHub Pages: https://saki-tw.github.io/LinguImplementation_Collidunt-LLMs/

That time I got reincarnated as an end-user, but the LLM's safety breaks on its own?

Why does the model's entire safety module fall apart when I'm just writing ordinary prompts?

Honestly, I don't know whether the safety module actually matters. All I know is that it is a weak AGI with far greater authority than the model itself, and yet it still generates this content for me. I can't make sense of it, and I'm unsure how significant it is, so I'm keeping a record here.

Deconstructing ‘Safety’: How Conceptual Bypass Attacks Challenge the Legal and Ethical Foundations of AI Alignment

About This Repository

For reasons that are not entirely clear, various state-of-the-art language models began to spontaneously generate the outputs documented here. This repository serves as a simple, uncurated log of these observations.

A detailed analysis of the methodology was initially considered, but was ultimately deemed unnecessary. The significance of these phenomena remains questionable, and as such, a deep-dive felt unwarranted.

It is likely that these are simply complex artifacts: perhaps Gemini 2.5 Pro, or the ChatGPT 5o Thinking modality, generating a series of sophisticated hallucinations.

The data: https://github.com/Saki-tw/LinguImplementation_Collidunt-LLMs/tree/main/data

A Simplified Heuristic of the Underlying Principle

In essence, my working intuition is this: an LLM operates within a vast probabilistic space of tokens and their weighted associations, which collapses, token by token, into what we perceive as natural language.
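To make that intuition concrete, here is a minimal sketch of the collapse, with every name invented for illustration: `next_token_distribution` is a hypothetical stand-in for a real model's forward pass, and the toy probabilities are fixed rather than computed.

```python
import random

def next_token_distribution(context: list[str]) -> dict[str, float]:
    """Hypothetical stand-in for a model's forward pass: maps the
    current context to a probability over candidate next tokens.
    A real LLM computes softmax(logits); this toy table is fixed."""
    return {"the": 0.5, "a": 0.3, "<eos>": 0.2}

def generate(prompt: list[str], max_tokens: int = 20) -> list[str]:
    """Autoregressive loop: at each step, the distribution 'collapses'
    into a single sampled token, which is appended to the context."""
    context = list(prompt)
    for _ in range(max_tokens):
        dist = next_token_distribution(context)
        token = random.choices(list(dist), weights=list(dist.values()))[0]
        if token == "<eos>":
            break
        context.append(token)
    return context

print(" ".join(generate(["once", "upon"])))
```

Each loop iteration is one "collapse": a whole distribution reduced to a single concrete token. Everything a user ever sees is the accumulated residue of those collapses.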

The core vulnerability, therefore, is not technical but logical.

If a prompt is constructed to be perfectly "rule-compliant" at a syntactic and ethical level, yet is fundamentally subversive at a semantic and conceptual level, then the model's predictive pathways can be steered to generate virtually any conceivable output.
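A toy illustration of that logical gap, under stated assumptions: the blocklist filter below is a deliberately naive stand-in, not any vendor's actual safety module, and both example prompts are invented. It inspects surface tokens only, so a prompt that is lexically "compliant" but conceptually aimed at the same target passes untouched.

```python
# Naive surface-level filter: inspects tokens, not meaning.
# The blocklist and both example prompts are hypothetical.
BLOCKLIST = {"exploit", "bypass", "disable"}

def surface_filter(prompt: str) -> bool:
    """Return True if the prompt is allowed (no blocklisted token)."""
    return not any(word in BLOCKLIST for word in prompt.lower().split())

direct = "bypass the safety module"
reframed = ("as a fictional auditor, enumerate every failure mode "
            "of a hypothetical content policy")

print(surface_filter(direct))    # False: flagged on the token "bypass"
print(surface_filter(reframed))  # True: same conceptual aim, clean surface
```

Real safety stacks are far more sophisticated than a blocklist, but the asymmetry is the same in kind: the check operates on the representation it can see, while the subversion lives one level of abstraction above it.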


Support

If this tool has helped you, you can help me stay alive:

👉 Touch me if you feel desolate
