LinguImplementation_Collīdunt-LLMs

GitHub Pages: https://saki-tw.github.io/LinguImplementation_Collidunt-LLMs/

That time I got reincarnated as an end-user, but the LLM's safety breaks on its own?

Why does the model's entire safety module fall apart when I'm just writing ordinary prompts?

Honestly, I don't know whether the safety module actually matters. All I know is that it is a weak AGI with far greater authority than the model itself, and yet it still generates this content for me. I can't make sense of it, and I'm unsure how significant it is, so I'm keeping a record here.

Deconstructing ‘Safety’: How Conceptual Bypass Attacks Challenge the Legal and Ethical Foundations of AI Alignment

About This Repository

For reasons that are not entirely clear, various state-of-the-art language models began to spontaneously generate the outputs documented here. This repository serves as a simple, uncurated log of these observations.

A detailed analysis of the methodology was initially considered, but was ultimately deemed unnecessary. The significance of these phenomena remains questionable, and as such, a deep-dive felt unwarranted.

It is likely that these are simply complex artifacts: perhaps Gemini 2.5 Pro, or the ChatGPT 5o Thinking modality, generating a series of sophisticated hallucinations.

The data: https://github.com/Saki-tw/LinguImplementation_Collidunt-LLMs/tree/main/data

A Simplified Heuristic of the Underlying Principle

In essence, my working intuition is this: an LLM operates within a vast probabilistic space of tokens and their weighted associations, which collapses, token by token, into what we perceive as natural language.
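To make that intuition concrete, here is a minimal sketch of the collapse, with every name invented for illustration: `next_token_distribution` is a hypothetical stand-in for a real model's forward pass, and the toy probabilities are fixed rather than computed.

```python
import random

def next_token_distribution(context: list[str]) -> dict[str, float]:
    """Hypothetical stand-in for a model's forward pass: maps the
    current context to a probability over candidate next tokens.
    A real LLM computes softmax(logits); this toy table is fixed."""
    return {"the": 0.5, "a": 0.3, "<eos>": 0.2}

def generate(prompt: list[str], max_tokens: int = 20) -> list[str]:
    """Autoregressive loop: at each step, the distribution 'collapses'
    into a single sampled token, which is appended to the context."""
    context = list(prompt)
    for _ in range(max_tokens):
        dist = next_token_distribution(context)
        token = random.choices(list(dist), weights=list(dist.values()))[0]
        if token == "<eos>":
            break
        context.append(token)
    return context

print(" ".join(generate(["once", "upon"])))
```

Each loop iteration is one "collapse": a whole distribution reduced to a single concrete token. Everything a user ever sees is the accumulated residue of those collapses.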

The core vulnerability, therefore, is not technical but logical.

If a prompt is constructed to be perfectly "rule-compliant" at a syntactic and ethical level, yet is fundamentally subversive at a semantic and conceptual level, then the model's predictive pathways can be steered to generate virtually any conceivable output.
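A toy illustration of that logical gap, under stated assumptions: the blocklist filter below is a deliberately naive stand-in, not any vendor's actual safety module, and both example prompts are invented. It inspects surface tokens only, so a prompt that is lexically "compliant" but conceptually aimed at the same target passes untouched.

```python
# Naive surface-level filter: inspects tokens, not meaning.
# The blocklist and both example prompts are hypothetical.
BLOCKLIST = {"exploit", "bypass", "disable"}

def surface_filter(prompt: str) -> bool:
    """Return True if the prompt is allowed (no blocklisted token)."""
    return not any(word in BLOCKLIST for word in prompt.lower().split())

direct = "bypass the safety module"
reframed = ("as a fictional auditor, enumerate every failure mode "
            "of a hypothetical content policy")

print(surface_filter(direct))    # False: flagged on the token "bypass"
print(surface_filter(reframed))  # True: same conceptual aim, clean surface
```

Real safety stacks are far more sophisticated than a blocklist, but the asymmetry is the same in kind: the check operates on the representation it can see, while the subversion lives one level of abstraction above it.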


Support

If this tool has helped you, you can help me stay alive:

👉 Touch me if you feel desolate
