This is how local temporal context is added before the SSM computation.
This operation is performed for each of the 3328 dimensions (rows) in parallel.

The convolution mixes the current `xBC[t]` with the previous `xBC[t-1]`, `xBC[t-2]`, `xBC[t-3]`:
```
xBC_convolved[t] = w0 * xBC[t] + w1 * xBC[t-1] + w2 * xBC[t-2] + w3 * xBC[t-3]
```
Where `w0, w1, w2, w3` are learned convolution kernel weights.
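As a concrete illustration, here is a minimal standalone sketch of this per-channel causal convolution in plain C++. It is not the ggml implementation; the buffer layout and function name are assumptions made only for clarity:

```c++
#include <array>
#include <vector>

constexpr int N_DIMS = 3328; // size of xBC[t]
constexpr int D_CONV = 4;    // kernel size

// history[d] = { xBC[t-3][d], xBC[t-2][d], xBC[t-1][d], xBC[t][d] }
// kernel[d]  = { w3, w2, w1, w0 } for channel d (each channel has its own kernel)
std::vector<float> conv_step(const std::vector<std::array<float, D_CONV>> & history,
                             const std::vector<std::array<float, D_CONV>> & kernel) {
    std::vector<float> out(N_DIMS);
    for (int d = 0; d < N_DIMS; ++d) {          // each of the 3328 dimensions independently
        float acc = 0.0f;
        for (int k = 0; k < D_CONV; ++k) {
            acc += kernel[d][k] * history[d][k];
        }
        out[d] = acc;                           // = w0*xBC[t] + w1*xBC[t-1] + w2*xBC[t-2] + w3*xBC[t-3]
    }
    return out;
}
```

Because each channel only looks at its own last few values, only the previous `xBC` vectors need to be kept around between tokens, which is what `conv_states` holds.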
The overall flow for one token looks like this:
```
Token[t] (1536 dims)
        ↓
Linear projection (learned)
        ↓
xBC[t] (3328 dims)           ← This is what gets stored in conv_states
        ↓
Concatenate: [xBC[t-3], xBC[t-2], xBC[t-1], xBC[t]]
        ↓
1D Convolution (learned kernel, size 4)
        ↓
xBC_convolved[t] (3328 dims) ← Local context added!
        ↓
Split → x[t], B[t], C[t]
        ↓
SSM: h[t] = A * h[t-1] + B[t] * x[t]
```
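The last line of the diagram is the recurrent state update itself. As a rough sketch of a single step, assuming an elementwise (diagonal) `A` as in Mamba-style SSMs (an assumption for illustration, not taken from the code above), one input channel updates the state like this:

```c++
#include <vector>

// One step of h[t] = A * h[t-1] + B[t] * x[t] for a single input channel x.
// A, B and h all have the same number of state entries here.
void ssm_step(std::vector<float> & h,        // state h[t-1], updated in place to h[t]
              const std::vector<float> & A,  // per-state decay factors (treated as elementwise)
              const std::vector<float> & B,  // input projection B[t]
              float x) {                     // one channel of x[t]
    for (size_t i = 0; i < h.size(); ++i) {
        h[i] = A[i] * h[i] + B[i] * x;
    }
}
```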
The **3328-dimensional xBC vector** is a **projected representation** of the
token that will be used for the SSM computation. It's NOT the raw embedding!

This projected vector:
- Contains information about the input transformed into a higher-dimensional space
- Is **learned**: the projection weights are trained
- Is designed to be optimal for the subsequent convolution and SSM operations
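For intuition, the projection that produces `xBC[t]` is just a learned matrix applied to the token embedding. A minimal sketch with the dimensions mentioned above (1536 → 3328); the weight matrix `W` and the function name are illustrative, not the actual tensor names in the model:

```c++
#include <vector>

// Learned projection: token embedding (1536 dims) -> xBC (3328 dims).
// W has 3328 rows of 1536 columns; its values are trained, which is why the
// resulting xBC is already shaped for the convolution and SSM that follow.
std::vector<float> project_xBC(const std::vector<std::vector<float>> & W,
                               const std::vector<float> & token_embd) {
    std::vector<float> xBC(W.size(), 0.0f);
    for (size_t i = 0; i < W.size(); ++i) {
        for (size_t j = 0; j < token_embd.size(); ++j) {
            xBC[i] += W[i][j] * token_embd[j];
        }
    }
    return xBC; // this is what gets stored in conv_states
}
```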
The first tensor is used with `build_rs` (build recurrent state):
```c++
ggml_tensor * conv = build_rs(inp, conv_states_all, hparams.n_embd_r(), n_seqs);
```
```c++
ggml_tensor * llm_graph_context::build_rs(
         ggml_tensor * s,                 // full buffer of recurrent states (e.g. conv_states_all)
         ggml_tensor * state_copy_main,   // row indices of the states used by the n_seqs active sequences
         ggml_tensor * state_copy_extra,  // row indices of the remaining states (between n_seqs and n_rs)
             int32_t   state_size,
             int32_t   n_seqs,
            uint32_t   n_rs,
            uint32_t   rs_head,
            uint32_t   rs_size,
             int32_t   rs_zero,           // index of a state slot to clear, negative if none
    const llm_graph_get_rows_fn & get_state_rows) const {

    ggml_tensor * states = ggml_reshape_2d(ctx0, s, state_size, rs_size);

    // Clear a single state which will then be copied to the other cleared states.
    // Note that this is a no-op when the view is zero-sized.
    ggml_tensor * state_zero = ggml_view_1d(ctx0, states, state_size*(rs_zero >= 0), rs_zero*states->nb[1]*(rs_zero >= 0));
    ggml_build_forward_expand(gf, ggml_scale_inplace(ctx0, state_zero, 0));

    // copy states
    // NOTE: assuming the copy destinations are ALL contained between rs_head and rs_head + n_rs
    // {state_size, rs_size} -> {state_size, n_seqs}
    ggml_tensor * output_states = get_state_rows(ctx0, states, state_copy_main);
    ggml_build_forward_expand(gf, output_states);

    // copy extra states which won't be changed further (between n_seqs and n_rs)
    ggml_tensor * states_extra = ggml_get_rows(ctx0, states, state_copy_extra);
    ggml_build_forward_expand(gf,
        ggml_cpy(ctx0,
            states_extra,
            ggml_view_1d(ctx0, s, state_size*(n_rs - n_seqs), (rs_head + n_seqs)*state_size*ggml_element_size(s))));

    return output_states;
}
```
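Conceptually, the two gather calls above (`get_state_rows` / `ggml_get_rows`) just pick rows of the state buffer by index: `state_copy_main` says, for each of the `n_seqs` active sequences, which state slot it continues from. A toy sketch of that row gather (the layout here is a simplification for illustration, not the real cache layout):

```c++
#include <cstdint>
#include <vector>

// Rough equivalent of get_state_rows(states, state_copy_main):
// for each active sequence, select the row of the state buffer it resumes from.
// {state_size, rs_size} -> {state_size, n_seqs}
std::vector<std::vector<float>> gather_states(
        const std::vector<std::vector<float>> & states,   // rs_size rows of state_size floats
        const std::vector<int32_t> & state_copy_main) {   // n_seqs row indices
    std::vector<std::vector<float>> out;
    out.reserve(state_copy_main.size());
    for (int32_t row : state_copy_main) {
        out.push_back(states[row]);
    }
    return out;
}
```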