Commit 33b6591

docs: remove outdated content about granite model conv_states
1 parent b33001f commit 33b6591

1 file changed: +1 -91 lines changed

notes/granite-model.md

Lines changed: 1 addition & 91 deletions
@@ -551,94 +551,4 @@ This is how local temporal context is added before the SSM computation.
This operation is performed for each of the 3328 dimensions (rows) in parallel.

The convolution mixes the current `xBC[t]` with the previous `xBC[t-1]`,
`xBC[t-2]`, `xBC[t-3]`:
```
xBC_convolved[t] = w0 * xBC[t] + w1 * xBC[t-1] + w2 * xBC[t-2] + w3 * xBC[t-3]
```
Where `w0, w1, w2, w3` are learned convolution kernel weights.
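For intuition, here is a minimal stand-alone sketch (plain C++, not the ggml implementation) of this per-dimension causal convolution, assuming a kernel size of 4 and a small rolling history that plays the role of the cached `conv_states`; the kernel weights are made up for illustration:
```c++
#include <array>
#include <cstdio>
#include <initializer_list>

// Per-dimension causal convolution with kernel size 4 (sketch, not ggml).
// `history` holds xBC[t-3], xBC[t-2], xBC[t-1] for one of the 3328 dimensions;
// this rolling window is what conv_states caches between tokens.
static float conv_step(std::array<float, 3> & history, float xBC_t, const std::array<float, 4> & w) {
    const float out = w[0] * xBC_t
                    + w[1] * history[2]   // xBC[t-1]
                    + w[2] * history[1]   // xBC[t-2]
                    + w[3] * history[0];  // xBC[t-3]
    history = {history[1], history[2], xBC_t};  // shift the window forward
    return out;
}

int main() {
    std::array<float, 3> history = {0.0f, 0.0f, 0.0f};         // start of a sequence
    const std::array<float, 4> w = {0.4f, 0.3f, 0.2f, 0.1f};   // made-up kernel weights
    for (float xBC_t : {1.0f, 2.0f, 3.0f, 4.0f}) {
        printf("xBC_convolved[t] = %f\n", conv_step(history, xBC_t, w));
    }
}
```
The point of the cache is that, with a kernel of size 4, only the last three `xBC` values per dimension need to be kept between tokens, which keeps the convolution cheap during incremental decoding.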
```
Token[t] (1536 dims)

Linear projection (learned)

xBC[t] (3328 dims) ← This is what gets stored in conv_states

Concatenate: [xBC[t-3], xBC[t-2], xBC[t-1], xBC[t]]

1D Convolution (learned kernel, size 4)

xBC_convolved[t] (3328 dims) ← Local context added!

Split → x[t], B[t], C[t]

SSM: h[t] = A * h[t-1] + B[t] * x[t]
```
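The last stage of the diagram is the recurrence itself. As a rough, heavily simplified sketch (one channel, a tiny state size, placeholder values for `A`, `B[t]`, `C[t]`, and no discretization step), it can be pictured like this:
```c++
#include <array>
#include <cstdio>
#include <initializer_list>

// One channel of the recurrence h[t] = A * h[t-1] + B[t] * x[t], y[t] = C[t] · h[t].
// A, B[t], C[t] are placeholders here; in the model A is learned and B[t], C[t]
// are produced from the token itself (they come from the split xBC vector).
int main() {
    constexpr int d_state = 4;
    std::array<float, d_state> h = {};  // recurrent state, carried across tokens

    const std::array<float, d_state> A = {0.9f, 0.8f, 0.7f, 0.6f};
    const std::array<float, d_state> B = {0.1f, 0.2f, 0.3f, 0.4f};
    const std::array<float, d_state> C = {1.0f, 0.5f, 0.25f, 0.125f};

    for (float x : {1.0f, 0.5f, -0.25f}) {  // x[t] for a few tokens (one channel)
        float y = 0.0f;
        for (int i = 0; i < d_state; ++i) {
            h[i] = A[i] * h[i] + B[i] * x;  // state update
            y   += C[i] * h[i];             // readout
        }
        printf("y[t] = %f\n", y);
    }
}
```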
```c++
ggml_tensor * conv = build_rs(inp, conv_states_all, hparams.n_embd_r(), n_seqs);
```
The **3328-dimensional xBC vector** is a **projected representation** of the
token that will be used for the SSM computation. It is NOT the raw embedding!

This projected vector:
- carries the input transformed into a higher-dimensional space
- is **learned** - the projection weights are trained
- is designed to be optimal for the subsequent convolution and SSM operations
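For a sense of where the 3328 columns go after the convolution, here is a hypothetical split into `x[t]`, `B[t]`, `C[t]`. The sizes below are assumptions chosen only because they sum to 3328 (`d_inner + 2 * n_groups * d_state`); the actual values come from the model's hyperparameters:
```c++
#include <cstdio>

// Hypothetical split of the 3328-wide xBC_convolved[t] vector into x, B, C.
// The sizes are assumptions for illustration (they satisfy
// d_inner + 2 * n_groups * d_state = 3328), not values read from the model.
int main() {
    const int d_inner  = 3072;  // assumed width of x[t]
    const int n_groups = 1;     // assumed number of SSM groups
    const int d_state  = 128;   // assumed state size per group

    const int x_cols = d_inner;             // x[t]
    const int B_cols = n_groups * d_state;  // B[t]
    const int C_cols = n_groups * d_state;  // C[t]

    printf("x: %d  B: %d  C: %d  total: %d\n", x_cols, B_cols, C_cols, x_cols + B_cols + C_cols);
}
```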

The first tensor is used with `build_rs` (build recurrent state):
```c++
ggml_tensor * conv = build_rs(inp, conv_states_all, hparams.n_embd_r(), n_seqs);
```
```c++
ggml_tensor * llm_graph_context::build_rs(
        ggml_tensor * s,
        ggml_tensor * state_copy_main,
        ggml_tensor * state_copy_extra,
            int32_t   state_size,
            int32_t   n_seqs,
           uint32_t   n_rs,
           uint32_t   rs_head,
           uint32_t   rs_size,
            int32_t   rs_zero,
    const llm_graph_get_rows_fn & get_state_rows) const {

    ggml_tensor * states = ggml_reshape_2d(ctx0, s, state_size, rs_size);

    // Clear a single state which will then be copied to the other cleared states.
    // Note that this is a no-op when the view is zero-sized.
    ggml_tensor * state_zero = ggml_view_1d(ctx0, states, state_size*(rs_zero >= 0), rs_zero*states->nb[1]*(rs_zero >= 0));
    ggml_build_forward_expand(gf, ggml_scale_inplace(ctx0, state_zero, 0));

    // copy states
    // NOTE: assuming the copy destinations are ALL contained between rs_head and rs_head + n_rs
    // {state_size, rs_size} -> {state_size, n_seqs}
    ggml_tensor * output_states = get_state_rows(ctx0, states, state_copy_main);
    ggml_build_forward_expand(gf, output_states);

    // copy extra states which won't be changed further (between n_seqs and n_rs)
    ggml_tensor * states_extra = ggml_get_rows(ctx0, states, state_copy_extra);
    ggml_build_forward_expand(gf,
        ggml_cpy(ctx0,
            states_extra,
            ggml_view_1d(ctx0, s, state_size*(n_rs - n_seqs), (rs_head + n_seqs)*state_size*ggml_element_size(s))));

    return output_states;
}
```
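To make the row-selection step above more concrete, here is a stand-alone sketch (plain C++, not ggml) of the idea behind `get_state_rows`: the recurrent-state buffer is viewed as `rs_size` cached states of `state_size` elements each, and the indices in `state_copy_main` pick which cached state each sequence in the batch starts from, producing a `{state_size, n_seqs}` block:
```c++
#include <cstdio>
#include <vector>

// Stand-alone sketch (not ggml) of the row gather in build_rs: the buffer holds
// rs_size cached states of state_size elements each, and state_copy_main lists
// which cached state each sequence in the batch starts from.
static std::vector<float> gather_states(const std::vector<float> & states,  // rs_size * state_size elements
                                        const std::vector<int>   & state_copy_main,
                                        int state_size) {
    std::vector<float> out;
    out.reserve(state_copy_main.size() * state_size);
    for (int row : state_copy_main) {
        const float * src = states.data() + (size_t) row * state_size;
        out.insert(out.end(), src, src + state_size);  // copy one cached state
    }
    return out;  // conceptually {state_size, n_seqs}
}

int main() {
    const int state_size = 3;  // toy value; the real size comes from hparams.n_embd_r()
    const int rs_size    = 4;  // number of cached recurrent-state slots
    std::vector<float> states(rs_size * state_size);
    for (size_t i = 0; i < states.size(); ++i) states[i] = (float) i;

    const std::vector<int> state_copy_main = {2, 0};  // two sequences reading slots 2 and 0
    for (float v : gather_states(states, state_copy_main, state_size)) printf("%g ", v);
    printf("\n");
}
```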
_wip_
