Equivalent Linear Mappings of Large Language Models

A novel approach to interpreting transformer decoder models with equivalent linear reconstruction and decomposition.

Transactions on Machine Learning Research (TMLR), October 2025

NeurIPS Mechanistic Interpretability Workshop 2025

James R. Golden

Key findings

We demonstrate that large language models can be mapped to equivalent linear systems for any given input sequence, without modifying model weights or altering predictions. We achieve this through strategic gradient computation modifications that create "detached Jacobians", which are linear representations that capture the complete forward computation.

Why This Matters

Reconstruction: The detached Jacobian linearly reconstructs the output embedding, and the resulting next-token probabilities pass torch.allclose at a tolerance of $10^{-14}$
Interpretability: Reveals semantic concepts emerging in model layers through the singular value decomposition
Efficiency: Enables analysis of models up to 14B parameters (Qwen 3 14B, Gemma 3 12B, Llama 3.1 8B), each passing torch.allclose at a tolerance of $10^{-14}$
Different models: Works across model families (Qwen 3, Gemma 3, Llama 3, Phi 4, Mistral Ministral, OLMo 2)

How It Works

The Linear Path

Our approach exploits a fundamental structural property of transformer architectures: every operation (gated activations, attention, and normalization) can be expressed as $A(x) \cdot x$, where $A(x)$ is an input-dependent coefficient matrix and $x$ preserves the linear pathway. To expose this linear structure, we strategically detach components of the gradient computation with respect to an input sequence, freezing the $A(x)$ terms at their values computed during inference. This "detached" Jacobian of the model reconstructs the output with one linear operation per input token.

For example, $SiLU(x) = x \cdot sigmoid(x)$, but when the nonlinear $sigmoid(x)$ term is "frozen" for a specific input $x^*$, the function is linear in $x$, and its Jacobian, computed numerically by torch autograd, exactly reconstructs $SiLU(x^*)$ when applied to $x^*$.

The "detached Jacobian" $J^+(x^*)$ captures the full nonlinear computation as a linear system valid at input $x^*$.
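The SiLU example above can be checked numerically. The following is a minimal sketch, not code from this repository; the names silu_detached and x_star are illustrative assumptions. It compares the detached Jacobian with the ordinary autograd Jacobian on a small input.

```python
import torch
import torch.nn.functional as F
from torch.autograd.functional import jacobian

def silu_detached(x):
    # Same forward value as SiLU, but the sigmoid gate is frozen (hypothetical helper).
    return x * torch.sigmoid(x).detach()

x_star = torch.tensor([-1.5, 0.3, 2.0], dtype=torch.float64)
target = F.silu(x_star)

# Detached Jacobian: a diagonal matrix holding sigmoid(x*) on the diagonal.
J_detached = jacobian(silu_detached, x_star)
# Ordinary Jacobian: also contains the derivative of the sigmoid gate.
J_standard = jacobian(F.silu, x_star)

detached_reconstructs = torch.allclose(J_detached @ x_star, target, atol=1e-14)
standard_reconstructs = torch.allclose(J_standard @ x_star, target)
print(detached_reconstructs, standard_reconstructs)
```

Only the detached Jacobian maps $x^*$ back to $SiLU(x^*)$; the ordinary Jacobian is a local tangent approximation and adds a term from the gate's derivative.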

Technical Approach

Normalization: Detach variance computation from gradient path
Activations: Freeze nonlinear terms in $SwiGLU/GELU/Swish$ functions
Attention: Detach softmax operation while preserving linear $V$ multiplication
Analysis: Apply SVD to understand learned representations and semantic emergence
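The normalization and attention steps listed above can be sketched on toy tensors. This is an illustration under stated assumptions, not the repository's implementation: rmsnorm_detached and attention_detached are hypothetical helpers, the weights are random, and the attention is a single causal head without positional encoding.

```python
import torch
from torch.autograd.functional import jacobian

torch.manual_seed(0)
d, T = 8, 4
dt = torch.float64

# Normalization: detach the variance so the layer is a fixed scaling of x.
weight = torch.randn(d, dtype=dt)

def rmsnorm_detached(x, eps=1e-6):
    variance = x.pow(2).mean().detach()  # frozen at its inference-time value
    return weight * x * torch.rsqrt(variance + eps)

x_star = torch.randn(d, dtype=dt)
J_norm = jacobian(rmsnorm_detached, x_star)
norm_ok = torch.allclose(J_norm @ x_star, rmsnorm_detached(x_star), atol=1e-12)

# Attention: detach the softmax weights, keep the linear path through V.
W_q, W_k, W_v = (torch.randn(d, d, dtype=dt) / d**0.5 for _ in range(3))
causal_mask = torch.full((T, T), float("-inf"), dtype=dt).triu(1)

def attention_detached(X):
    scores = (X @ W_q) @ (X @ W_k).T / d**0.5 + causal_mask
    probs = torch.softmax(scores, dim=-1).detach()  # frozen attention pattern
    return probs @ (X @ W_v)

X_star = torch.randn(T, d, dtype=dt)
J_attn = jacobian(attention_detached, X_star)  # shape (T, d, T, d)
# One linear operation per input token s: out[t] = sum_s J[t, :, s, :] @ X[s]
recon = torch.einsum("tisj,sj->ti", J_attn, X_star)
attn_ok = torch.allclose(recon, attention_detached(X_star), atol=1e-12)
print(norm_ok, attn_ok)
```

In both cases the forward values are unchanged; only the gradient path differs, which is what lets the Jacobian act as an exact linear map at the given input.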

Fig. 1: The equivalent linear path through the $SwiGLU$ layer.

Key Results

Model Coverage

Qwen 3 (8B - 32B parameters)
Deepseek R1 0528 Qwen 3 (8B parameters)
Gemma 3 (4B - 12B - 27B parameters)
Llama 3 (3B - 8B - 70B parameters)
Phi 4 (3B - 14B parameters)
Mistral Ministral (8B parameters)
OLMo 2 (8B parameters)

Semantic Analysis

Low-rank structure: Models operate in extremely low-dimensional subspaces
Concept emergence: Semantic concepts appear in later transformer layers
Token relationships: Singular vectors decode to semantically relevant input/output tokens
Steering applications: Detached Jacobians enable efficient concept steering

Example: "The bridge out of Marin is the"

Our analysis reveals:

Top singular vectors decode to concepts like "Golden", "bridge", "highway"
Layer-by-layer emergence of geographic and infrastructure concepts
Extremely sparse activation patterns with few dominant features
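The kind of analysis described above can be illustrated on synthetic data. The sketch below is a toy under loud assumptions: the Jacobian is a planted rank-2 matrix, the unembedding matrix is random with orthonormal rows, and decoding a singular vector by projecting it through the unembedding matrix is an assumed procedure rather than the repository's exact method. The token strings are borrowed from the example only as labels.

```python
import torch

torch.manual_seed(0)
d = 16
dt = torch.float64
vocab = ["Golden", "bridge", "highway", "the", "of"]  # toy vocabulary

# Toy unembedding matrix (vocab x d) with orthonormal rows.
basis = torch.linalg.qr(torch.randn(d, d, dtype=dt)).Q
unembed = basis[:, : len(vocab)].T
# Two orthonormal input directions.
in_dirs = torch.linalg.qr(torch.randn(d, 2, dtype=dt)).Q

# Planted low-rank "Jacobian": two output concepts with strengths 5 and 2.
J_toy = 5.0 * torch.outer(unembed[0], in_dirs[:, 0]) \
      + 2.0 * torch.outer(unembed[1], in_dirs[:, 1])

U, S, Vh = torch.linalg.svd(J_toy)

# Effective rank: singular values above a relative threshold.
effective_rank = int((S > 1e-10 * S[0]).sum())

# Decode each left singular vector to its nearest token.
# abs() is used because the sign of a singular vector is arbitrary.
decoded = [
    vocab[int((unembed @ U[:, i]).abs().argmax())]
    for i in range(effective_rank)
]
print(effective_rank, decoded)
```

On a real model the spectrum decays gradually rather than dropping to zero, so the threshold for effective rank becomes a modeling choice.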

Usage

A Hugging Face token with model access is required. The code below runs on a free Colab T4 instance.

import os
from google.colab import userdata

os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = '1'
os.environ["HF_TOKEN"] = userdata.get('HF_TOKEN')

os.system('git clone https://github.com/jamesgolden1/llms-are-llms.git')
os.chdir('llms-are-llms')
os.system('pip install -r requirements.txt --no-deps')
os.system(f'python -u run_detached_jacobian.py --hf_token {os.environ["HF_TOKEN"]} --model_name "llama-3.2-3b" --text "The Golden"')

Applications

Interpretability

Concept Analysis: Understand what drives model predictions
Layer Dynamics: Track semantic emergence through transformer layers
Feature Importance: Identify key input tokens and concepts for next-token prediction

Fig. 2: Results for Deepseek R1 0528 Qwen 3 8B.

Model Steering

Efficient Control: Steer model outputs using detached Jacobians
Concept Injection: Inject specific concepts (e.g., "Golden Gate Bridge") into continuations
Safety Applications: Detect and potentially mitigate bias or toxic content

Table 1: Steering results across models.

Research Tools

Dimensionality Analysis: Measure effective dimensionality of learned representations
Cross-model Comparisons: Compare semantic structures across model families
Ablation Studies: Understand token contributions to output token prediction

Efficient computation of the detached Jacobian singular vectors for long input sequences

A Lanczos iteration approach provides a matrix-free method to compute the top-k singular vectors of the detached Jacobian for long sequences without generating the full matrix. It is implemented in JAX for Gemma 3 4B using the matfree package, and handles a 400-token input sequence with 40GB of VRAM.
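The matrix-free idea can be sketched without forming the Jacobian. The example below is not the repository's method: it swaps in plain subspace (block power) iteration in PyTorch for Lanczos, JAX, and the matfree package, and it uses a small toy layer (detached_layer, a hypothetical stand-in for a model). It only ever touches the Jacobian through Jacobian-vector and vector-Jacobian products.

```python
import torch
from torch.autograd.functional import jacobian, jvp, vjp

torch.manual_seed(0)
d = 12
dt = torch.float64
W_gate = torch.randn(d, d, dtype=dt) / d**0.5
W_down = torch.randn(d, d, dtype=dt) / d**0.5

def detached_layer(x):
    # Toy stand-in for a model: a SiLU layer with its sigmoid gate frozen.
    gate_in = W_gate @ x
    return W_down @ (torch.sigmoid(gate_in).detach() * gate_in)

def top_k_singular_values(f, x, k=3, iters=200):
    # Random orthonormal starting block of k directions in input space.
    V = torch.linalg.qr(torch.randn(x.numel(), k, dtype=x.dtype)).Q

    def J_times(V):   # columns J @ v_i via Jacobian-vector products
        return torch.stack([jvp(f, x, V[:, i])[1] for i in range(k)], dim=1)

    def Jt_times(U):  # columns J^T @ u_i via vector-Jacobian products
        return torch.stack([vjp(f, x, U[:, i])[1] for i in range(k)], dim=1)

    for _ in range(iters):
        # Apply J^T J to the block, then re-orthonormalize.
        V = torch.linalg.qr(Jt_times(J_times(V))).Q
    # Singular values of J restricted to the converged subspace.
    return torch.linalg.svdvals(J_times(V))

x_star = torch.randn(d, dtype=dt)
est = top_k_singular_values(detached_layer, x_star)

# Reference from the full Jacobian, feasible only because the toy is tiny.
exact = torch.linalg.svdvals(jacobian(detached_layer, x_star))[:3]
print(est, exact)
```

Lanczos-type methods typically need far fewer products than this iteration for the same accuracy, which matters when each product is a pass through a multi-billion-parameter model.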

Detaching an MLP activation for an equivalent linear mapping

This code snippet shows how the Qwen 3 MLP has components frozen at inference to reveal its linear mapping for a given input sequence. The output is the same as the original function. Only the gradient at inference is changed.

The detach() statement in the else clause makes the function linear.

from torch import nn
from transformers.activations import ACT2FN

class Qwen3MLP(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.hidden_size = config.hidden_size
        self.intermediate_size = config.intermediate_size
        self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
        self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
        self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
        self.act_fn = ACT2FN[config.hidden_act]

    def forward(self, x):
        if self.training:
            # Training: standard gated MLP, gradients flow through the gate.
            down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
        else:
            # Inference: the gate activation is detached, so autograd treats it as a
            # constant and the only gradient path is the linear one through up_proj.
            down_proj = self.down_proj(self.act_fn(self.gate_proj(x)).clone().detach() * self.up_proj(x))
        return down_proj
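The claim that the output is unchanged while the Jacobian becomes an exact linear map can be checked on a small stand-in. The sketch below assumes a toy module, ToySwiGLU, with the same three projections but tiny random weights and a hypothetical detach_gate flag; it is not the Qwen 3 implementation and does not require the transformers package.

```python
import torch
from torch import nn
from torch.autograd.functional import jacobian

class ToySwiGLU(nn.Module):
    # Hypothetical miniature of the gated MLP above.
    def __init__(self, hidden_size=8, intermediate_size=16):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
        self.act_fn = nn.SiLU()

    def forward(self, x, detach_gate=True):
        gate = self.act_fn(self.gate_proj(x))
        if detach_gate:
            gate = gate.detach()  # freeze the gate for autograd only
        return self.down_proj(gate * self.up_proj(x))

torch.manual_seed(0)
mlp = ToySwiGLU().double().eval()
x_star = torch.randn(8, dtype=torch.float64)

with torch.no_grad():
    y_detached = mlp(x_star, detach_gate=True)
    y_standard = mlp(x_star, detach_gate=False)

J_detached = jacobian(lambda x: mlp(x, detach_gate=True), x_star)
J_standard = jacobian(lambda x: mlp(x, detach_gate=False), x_star)

same_output = torch.allclose(y_detached, y_standard)
detached_ok = torch.allclose(J_detached @ x_star, y_detached, atol=1e-12)
standard_ok = torch.allclose(J_standard @ x_star, y_standard)
print(same_output, detached_ok, standard_ok)
```

The forward values agree in both modes, but only the detached Jacobian reproduces the output when multiplied by the input.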

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

Acknowledgments

This work builds on foundational research in:

Transformer interpretability (Elhage et al., 2021)
Locally linear ReLU neural networks (Mohan et al., 2019)
Diffusion model linearity (Kadkhodaie et al., 2023)
