This is how local temporal context is added before the SSM computation.
This operation is performed for each of the 3328 dimensions (rows) in parallel.

The convolution mixes the current `xBC[t]` with the previous `xBC[t-1]`, `xBC[t-2]`, `xBC[t-3]`:
```
xBC_convolved[t] = w0 * xBC[t] + w1 * xBC[t-1] + w2 * xBC[t-2] + w3 * xBC[t-3]
```
Where `w0, w1, w2, w3` are learned convolution kernel weights.
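As a concrete illustration, here is a minimal standalone sketch of this per-channel causal convolution in plain C++. It is not the ggml implementation; the buffer layout and function name are assumptions made only for clarity:

```c++
#include <array>
#include <vector>

constexpr int N_DIMS = 3328; // size of xBC[t]
constexpr int D_CONV = 4;    // kernel size

// history[d] = { xBC[t-3][d], xBC[t-2][d], xBC[t-1][d], xBC[t][d] }
// kernel[d]  = { w3, w2, w1, w0 } for channel d (each channel has its own kernel)
std::vector<float> conv_step(const std::vector<std::array<float, D_CONV>> & history,
                             const std::vector<std::array<float, D_CONV>> & kernel) {
    std::vector<float> out(N_DIMS);
    for (int d = 0; d < N_DIMS; ++d) {          // each of the 3328 dimensions independently
        float acc = 0.0f;
        for (int k = 0; k < D_CONV; ++k) {
            acc += kernel[d][k] * history[d][k];
        }
        out[d] = acc;                           // = w0*xBC[t] + w1*xBC[t-1] + w2*xBC[t-2] + w3*xBC[t-3]
    }
    return out;
}
```

Because each channel only looks at its own last few values, only the previous `xBC` vectors need to be kept around between tokens, which is what `conv_states` holds.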
The overall flow for one token looks like this:
```
Token[t] (1536 dims)
        ↓
Linear projection (learned)
        ↓
xBC[t] (3328 dims)           ← This is what gets stored in conv_states
        ↓
Concatenate: [xBC[t-3], xBC[t-2], xBC[t-1], xBC[t]]
        ↓
1D Convolution (learned kernel, size 4)
        ↓
xBC_convolved[t] (3328 dims) ← Local context added!
        ↓
Split → x[t], B[t], C[t]
        ↓
SSM: h[t] = A * h[t-1] + B[t] * x[t]
```
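The last line of the diagram is the recurrent state update itself. As a rough sketch of a single step, assuming an elementwise (diagonal) `A` as in Mamba-style SSMs (an assumption for illustration, not taken from the code above), one input channel updates the state like this:

```c++
#include <vector>

// One step of h[t] = A * h[t-1] + B[t] * x[t] for a single input channel x.
// A, B and h all have the same number of state entries here.
void ssm_step(std::vector<float> & h,        // state h[t-1], updated in place to h[t]
              const std::vector<float> & A,  // per-state decay factors (treated as elementwise)
              const std::vector<float> & B,  // input projection B[t]
              float x) {                     // one channel of x[t]
    for (size_t i = 0; i < h.size(); ++i) {
        h[i] = A[i] * h[i] + B[i] * x;
    }
}
```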
The **3328-dimensional xBC vector** is a **projected representation** of the
token that will be used for the SSM computation. It's NOT the raw embedding!

This projected vector:
- Contains information about the input transformed into a higher-dimensional space
- Is **learned**: the projection weights are trained
- Is designed to be optimal for the subsequent convolution and SSM operations
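For intuition, the projection that produces `xBC[t]` is just a learned matrix applied to the token embedding. A minimal sketch with the dimensions mentioned above (1536 → 3328); the weight matrix `W` and the function name are illustrative, not the actual tensor names in the model:

```c++
#include <vector>

// Learned projection: token embedding (1536 dims) -> xBC (3328 dims).
// W has 3328 rows of 1536 columns; its values are trained, which is why the
// resulting xBC is already shaped for the convolution and SSM that follow.
std::vector<float> project_xBC(const std::vector<std::vector<float>> & W,
                               const std::vector<float> & token_embd) {
    std::vector<float> xBC(W.size(), 0.0f);
    for (size_t i = 0; i < W.size(); ++i) {
        for (size_t j = 0; j < token_embd.size(); ++j) {
            xBC[i] += W[i][j] * token_embd[j];
        }
    }
    return xBC; // this is what gets stored in conv_states
}
```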
The first tensor is used with `build_rs` (build recurrent state):
```c++
ggml_tensor * conv = build_rs(inp, conv_states_all, hparams.n_embd_r(), n_seqs);
```
```c++
ggml_tensor * llm_graph_context::build_rs(
         ggml_tensor * s,                 // full buffer of recurrent states (e.g. conv_states_all)
         ggml_tensor * state_copy_main,   // row indices of the states used by the n_seqs active sequences
         ggml_tensor * state_copy_extra,  // row indices of the remaining states (between n_seqs and n_rs)
             int32_t   state_size,
             int32_t   n_seqs,
            uint32_t   n_rs,
            uint32_t   rs_head,
            uint32_t   rs_size,
             int32_t   rs_zero,           // index of a state slot to clear, negative if none
    const llm_graph_get_rows_fn & get_state_rows) const {

    ggml_tensor * states = ggml_reshape_2d(ctx0, s, state_size, rs_size);

    // Clear a single state which will then be copied to the other cleared states.
    // Note that this is a no-op when the view is zero-sized.
    ggml_tensor * state_zero = ggml_view_1d(ctx0, states, state_size*(rs_zero >= 0), rs_zero*states->nb[1]*(rs_zero >= 0));
    ggml_build_forward_expand(gf, ggml_scale_inplace(ctx0, state_zero, 0));

    // copy states
    // NOTE: assuming the copy destinations are ALL contained between rs_head and rs_head + n_rs
    // {state_size, rs_size} -> {state_size, n_seqs}
    ggml_tensor * output_states = get_state_rows(ctx0, states, state_copy_main);
    ggml_build_forward_expand(gf, output_states);

    // copy extra states which won't be changed further (between n_seqs and n_rs)
    ggml_tensor * states_extra = ggml_get_rows(ctx0, states, state_copy_extra);
    ggml_build_forward_expand(gf,
        ggml_cpy(ctx0,
            states_extra,
            ggml_view_1d(ctx0, s, state_size*(n_rs - n_seqs), (rs_head + n_seqs)*state_size*ggml_element_size(s))));

    return output_states;
}
```
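Conceptually, the two gather calls above (`get_state_rows` / `ggml_get_rows`) just pick rows of the state buffer by index: `state_copy_main` says, for each of the `n_seqs` active sequences, which state slot it continues from. A toy sketch of that row gather (the layout here is a simplification for illustration, not the real cache layout):

```c++
#include <cstdint>
#include <vector>

// Rough equivalent of get_state_rows(states, state_copy_main):
// for each active sequence, select the row of the state buffer it resumes from.
// {state_size, rs_size} -> {state_size, n_seqs}
std::vector<std::vector<float>> gather_states(
        const std::vector<std::vector<float>> & states,   // rs_size rows of state_size floats
        const std::vector<int32_t> & state_copy_main) {   // n_seqs row indices
    std::vector<std::vector<float>> out;
    out.reserve(state_copy_main.size());
    for (int32_t row : state_copy_main) {
        out.push_back(states[row]);
    }
    return out;
}
```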