Marketplace is a machine learning experiment aimed at training a model efficiently on a GPU without using backpropagation. The approach breaks a model's layers into smaller groups and runs them with various combinations of parameters; the best-performing combination is selected and then mutated into new parameter variants (a toy sketch follows the article list below). To learn more about the concept, please refer to these articles:
- Marketplace: my first attempt at training without backprop on GPU efficiently
- Marketplace V2 is all you need: A training algorithm on par with backprop that needs only forward pass
- Continual learning with the Marketplace algorithm: model learns new data through inference, not training
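Before diving into the real code, here is a toy, self-contained sketch of the core loop. It is illustrative only, not the Marketplace API: each "group" keeps a few candidate parameter values, every combination is scored with a forward pass, the winner is kept, and the next round of candidates is mutated from it.

import itertools
import random

def toy_step(groups: list[list[float]], score) -> list[list[float]]:
    # score every combination of one candidate per group and keep the best one
    best = min(itertools.product(*groups), key=score)
    # mutate the winner: each group's next candidates are jittered copies of its winning value
    return [[value + random.gauss(0.0, 0.1) for _ in group] for value, group in zip(best, groups)]

# example: drive three scalar "groups" toward summing to 1.0 using only forward evaluations
groups = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
for _ in range(200):
    groups = toy_step(groups, score=lambda params: (sum(params) - 1.0) ** 2)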
For example, the beautiful_mnist model included in tinygrad's examples folder can be broken down into three groups of layers:
from marketplace.training import Spec
from marketplace.nn import Model
from tinygrad import Tensor
# assumption: these layer wrappers come from marketplace.nn (they take vendor_count
# as their first argument), not from the plain tinygrad.nn module
from marketplace.nn import Conv2d
from marketplace.nn import InstanceNorm
from marketplace.nn import Linear
vendor_count = 8  # example value: number of parameter variants kept per group

marketplace = [
    Spec(
        model=Model(
            Conv2d(vendor_count, 1, 32, 5),
            Tensor.relu,
            Conv2d(vendor_count, 32, 32, 5),
            Tensor.relu,
            InstanceNorm(vendor_count, 32),
            Tensor.max_pool2d,
        )
    ),
    Spec(
        model=Model(
            Conv2d(vendor_count, 32, 64, 3),
            Tensor.relu,
            Conv2d(vendor_count, 64, 64, 3),
            Tensor.relu,
            InstanceNorm(vendor_count, 64),
            Tensor.max_pool2d,
            lambda x: x.flatten(1),
        ),
    ),
    Spec(
        model=Model([Linear(vendor_count, 576, 10)]),
    ),
]
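Each Spec above holds vendor_count parameter variants for its group of layers, and a "path" picks one variant per group, so the space the forward pass can explore grows multiplicatively with the number of groups. This is my reading of the design; the exact sampling strategy is described in the articles above.

# with vendor_count variants per group, fully enumerating the combinations gives
num_paths = vendor_count ** len(marketplace)
print(num_paths)  # 8 ** 3 = 512 possible paths with the example values above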
With that, we can run the model on the GPU with different combinations of parameters. The following code is a simple example of running a forward pass:
from tinygrad import Tensor
from tinygrad import TinyJit
from tinygrad.nn.datasets import mnist
from marketplace.training import forward
from marketplace.optimizers import Optimizer

# load MNIST and pick a batch size so the snippet is self-contained
X_train, Y_train, X_test, Y_test = mnist()
batch_size = 512

@TinyJit
def forward_step() -> tuple[Tensor, Tensor, Tensor]:
    samples = Tensor.randint(batch_size, high=X_train.shape[0])
    x = X_train[samples]
    y = Y_train[samples]
    batch_logits, batch_paths = forward(marketplace, x)
    loss = Tensor.stack(
        *(logits.sparse_categorical_crossentropy(y) for logits in batch_logits),
        dim=0,
    )
    best_loss, best_index = loss.topk(1, largest=False)
    best_index = best_index.squeeze(0)
    accuracy = (
        (batch_logits[best_index].sigmoid().argmax(axis=1) == y).sum() / batch_size
    ) * 100
    return (
        best_loss.realize(),
        accuracy.realize(),
        batch_paths[best_index].realize(),
    )

lr = Tensor(1e-1).contiguous().realize()
optimizer = Optimizer(
    marketplace=marketplace,
    learning_rate=lr,
)

best_loss, best_accuracy, best_path = forward_step()
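The jitted step returns the best loss, the accuracy of the winning combination, and the winning path. Here is a quick peek at what comes back (a small usage sketch; interpreting the path as one vendor index per group is my assumption):

print(f"best loss: {best_loss.item():.4f}")
print(f"best accuracy: {best_accuracy.item():.2f}%")
print(f"winning path (one vendor index per group): {best_path.tolist()}")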
Now that we know the best parameter combination, we can mutate it to generate new variants of the parameters.
TODO: the following is outdated and needs to be updated.
# assumption: `mutate` lives alongside `forward` in marketplace.training
from marketplace.training import mutate

@TinyJit
def mutate_step(best_path: Tensor):
    mutate(
        marketplace=marketplace,
        leading_path=best_path,
        jitter=lr,
    )

mutate_step(best_path)
That's it. We just trained a model without backpropagation, relying only on the forward pass! By repeating the process, we can train a model. Of course, this is still no match for backprop training, but it's an interesting start.
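Putting the two jitted steps together, the whole training loop is just the forward and mutate steps repeated. A minimal sketch, assuming the setup above (and keeping in mind that the mutate snippet above is flagged as outdated):

for step in range(1000):
    best_loss, best_accuracy, best_path = forward_step()
    mutate_step(best_path)
    if step % 100 == 0:
        print(f"step {step}: loss={best_loss.item():.4f}, accuracy={best_accuracy.item():.2f}%")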
All of the experiments are in the `experiments` folder.
To run the training, you can use the following command:
CUDA=1 uv run python -m experiments.beautiful_mnist
It comes with several arguments to control the training; you can see them by running:
uv run python -m experiments.beautiful_mnist --help