Commit ad8dfdb

docs: update mamba architecture notes
1 parent d2555c8 commit ad8dfdb

1 file changed

+24
-9
lines changed

notes/architectures/mamba.md

Lines changed: 24 additions & 9 deletions
@@ -367,19 +367,34 @@ Lets say we have input tokens embeddings for the sequence "Dan loves ice cream":
 Imagine this as a smooth graph which would be the continuous space. But we only
 have access to discrete points (the tokens). We can use Zero Order Hold (ZOH)
 for this. Zero order here means that the value is held constant between samples.
+The increment where we take samples is called the time step (delta, Δ). It
+is computed from the current token by a learned projection, so it varies per token.

+So we have the token embeddings as points, which are the input to the
+Mamba layer. These are discrete points/values, but Mamba's state is a continuous
+system: like a system that needs a continuous signal, the Mamba system
+operates on an analog signal. So the input would need to be converted to such a signal.
+But in practice we don't transform the input tokens into a continuous signal.
+Instead we transform the parameters A and B of the state space model into discrete
+values and perform the selective scan using them. This is called discretization.

-So we will first discretize the parameters A, and B of the state space model,
-which means that we will convert them from continuous values to discrete values.
+To clarify this a bit more:
+* We have discrete token embeddings (already discrete).
+* We have a continuous SSM formulation: dh/dt = Ah(t) + Bx(t)
+* We need a way to apply this continuous SSM to discrete token embeddings.

-I think there are multiple methods/ways to do this but the paper mentions
-the zero-order hold transform method which is a method for converting a
-discrete time signal to continuous time signal (the inner space).
+So we discretize A and B.
+
+
+So we will first discretize the parameter A:
+```console
+A_d = exp(A * Δ)
+```
+
+
+and then B of the state space model,
+which means that we will convert them from continuous values to discrete values.

-So we have the tokens embeddings as points which is the input we have in to the
-mamba2 layer. This is in discrete points/values, but mambas state is a
-continuous, similar to a system that needs a continuous signal, the mamba system
-operates on an analog signal. So it needs to be converted to such a signal.

 So instead of using the continuous functions shown above, we will
 transform A and B into discrete values and the equations become:
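
The discretization step described in the diff above can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the notes' or the paper's implementation: it assumes a diagonal A (so the matrix exponential reduces to an elementwise `exp`), a single scalar input channel, and the ZOH formulas `A_d = exp(Δ * A)` and `B_d = (exp(Δ * A) - 1) / A * B`. The function names `zoh_discretize` and `ssm_scan` and all numeric values are hypothetical, chosen only for illustration.

```python
import math


def zoh_discretize(a_diag, b, delta):
    """Zero-order-hold discretization, assuming a diagonal A with non-zero entries."""
    # A_d = exp(delta * A), elementwise because A is assumed diagonal.
    a_d = [math.exp(delta * a) for a in a_diag]
    # B_d = (exp(delta * A) - 1) / A * B; expm1 keeps precision for small delta * a.
    b_d = [math.expm1(delta * a) / a * bi for a, bi in zip(a_diag, b)]
    return a_d, b_d


def ssm_scan(a_diag, b, c, xs, deltas):
    """Sequential scan: h_t = A_d * h_{t-1} + B_d * x_t, then y_t = C . h_t."""
    h = [0.0] * len(a_diag)
    ys = []
    for x, delta in zip(xs, deltas):
        # The time step may differ per token, so A and B are re-discretized each step.
        a_d, b_d = zoh_discretize(a_diag, b, delta)
        h = [ad * hi + bd * x for ad, hi, bd in zip(a_d, h, b_d)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


# Hypothetical parameters: a 2-dimensional state and 4 scalar "tokens".
a_diag = [-1.0, -2.0]  # negative entries so the state decays over time
b = [1.0, 1.0]
c = [1.0, 0.5]
xs = [1.0, 0.0, 0.0, 2.0]
deltas = [0.1, 0.1, 0.5, 0.1]  # one time step per token
ys = ssm_scan(a_diag, b, c, xs, deltas)
```

The per-token `deltas` list stands in for the input-dependent time step: a larger Δ makes the state forget more of the past before the current token is mixed in.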
