TinyTorch is a lightweight deep learning training framework implemented from scratch in C++. The project's goal is to closely mimic the API design of PyTorch, serving as an educational tool for developers to understand the core mechanics of a deep learning framework, including automatic differentiation, modular network structures, loss functions, and optimizers.
This project provides a complete yet simple implementation that allows developers to directly see the inner workings of each component, making it an ideal resource for learning.
For more details, please refer to my blog post: *Write a nn training framework from scratch*
- PyTorch-Style API: Adopts class and function naming conventions similar to PyTorch's, such as `Tensor`, `Functions`, `nn.Module`, and `Optimizer`, so the framework feels intuitive to users familiar with the original (see the `Module` sketch after this list).
- Pure C++ Implementation: The entire framework is written in C++ with no dependency on external deep learning libraries, making it well suited to studying the fundamentals.
- CPU and CUDA Support: The framework runs on both the CPU and CUDA-enabled GPUs, allowing for flexible development and experimentation.
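
To illustrate what a PyTorch-style module hierarchy looks like in C++, here is a minimal, self-contained sketch. The `Tensor`, `Module`, and `Sequential` types below are stand-ins written for this example only, not TinyTorch's actual classes; they merely mirror the shape of PyTorch's `nn.Module` design.

```cpp
#include <iostream>
#include <memory>
#include <vector>

// Stand-in tensor: a flat vector of floats (illustrative only).
struct Tensor {
  std::vector<float> data;
};

// PyTorch-style base class: subclasses implement forward(), and
// operator() forwards to it, mimicking Python's module call syntax.
struct Module {
  virtual ~Module() = default;
  virtual Tensor forward(const Tensor& x) = 0;
  Tensor operator()(const Tensor& x) { return forward(x); }
};

struct ReLU : Module {
  Tensor forward(const Tensor& x) override {
    Tensor out = x;
    for (auto& v : out.data) v = v > 0.f ? v : 0.f;
    return out;
  }
};

// Chains child modules, like torch.nn.Sequential.
struct Sequential : Module {
  std::vector<std::unique_ptr<Module>> layers;
  Tensor forward(const Tensor& x) override {
    Tensor h = x;
    for (auto& layer : layers) h = layer->forward(h);
    return h;
  }
};

int main() {
  Sequential model;
  model.layers.push_back(std::make_unique<ReLU>());
  Tensor x{{-1.f, 2.f, -3.f, 4.f}};
  Tensor y = model(x);
  for (float v : y.data) std::cout << v << ' ';  // prints: 0 2 0 4
  std::cout << '\n';
  return 0;
}
```
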
Supported operators and components include the following (a plain-C++ reference sketch of `softmax` follows the list):

- `relu`, `gelu`, `silu`
- `softmax`, `logSoftmax`
- `add`, `sub`, `mul`, `div`, `matmul`
- `sin`, `cos`, `sqrt`, `pow`
- `maximum`, `minimum`
- `lt`, `le`, `gt`, `ge`, `eq`, `ne`
- `logicNot`, `logicAnd`, `logicOr`
- `min`, `argmin`, `max`, `argmax`
- `sum`, `mean`, `var`
- `reshape`, `view`, `permute`, `transpose`
- `flatten`, `unflatten`, `squeeze`, `unsqueeze`
- `split`, `concat`, `stack`, `hstack`, `vstack`, `narrow`
- `topk`, `sort`, `cumsum`
- `gather`, `scatter`
- `linear`
- `dropout`
- `maxPool2d`
- `conv2d`
- `embedding`
- `layerNorm`
- `rmsNorm`
- `sdpAttention`
- `mseLoss`
- `nllLoss`
- `SGD`, `Adagrad`, `RMSprop`, `AdaDelta`, `Adam`, `AdamW`
- `Dataset`, `DataLoader`, `data.Transform`
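
To give a flavor of what one of these operators computes, here is a plain C++ reference implementation of `softmax` over a 1-D vector. It uses the standard max-subtraction trick for numerical stability; this is an illustration of the math, not necessarily how TinyTorch's kernel is written.

```cpp
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

// Reference softmax: softmax(x)_i = exp(x_i) / sum_j exp(x_j).
// Subtracting the max element first prevents exp() from overflowing
// without changing the result.
std::vector<float> softmax(const std::vector<float>& x) {
  float maxVal = *std::max_element(x.begin(), x.end());
  std::vector<float> out(x.size());
  float sum = 0.f;
  for (size_t i = 0; i < x.size(); ++i) {
    out[i] = std::exp(x[i] - maxVal);
    sum += out[i];
  }
  for (auto& v : out) v /= sum;  // normalize so the outputs sum to 1
  return out;
}

int main() {
  for (float v : softmax({1.f, 2.f, 3.f})) std::cout << v << ' ';
  // prints approximately: 0.0900306 0.244728 0.665241
  std::cout << '\n';
  return 0;
}
```
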
TinyTorch's automatic differentiation (AD) is implemented by building a computation graph. Each operation on a `Tensor` is represented by a `Function` object, which is responsible for both the forward and backward passes. The `Function` nodes are connected via a `nextFunctions` field, forming the dependency graph. During the `backward()` call, the framework traverses this graph in reverse order, computing and propagating gradients using the chain rule.
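
The sketch below distills that idea into a minimal, runnable scalar example: each node stores a backward closure and links to the nodes that produced its inputs via a `nextFunctions` field, and `backward()` walks the graph in reverse, applying the chain rule. It illustrates the mechanism only; it is not TinyTorch's actual `Function` class.

```cpp
#include <functional>
#include <iostream>
#include <memory>
#include <vector>

// Minimal scalar autodiff node (illustrative, not TinyTorch's code).
struct Node {
  float value = 0.f;
  float grad = 0.f;
  std::vector<std::shared_ptr<Node>> nextFunctions;  // producers of this node's inputs
  std::function<void()> backwardFn;                  // propagates this->grad to inputs

  void backward() {
    grad = 1.f;  // seed: d(out)/d(out) = 1
    // Walk the graph in reverse. For this chain-structured example a
    // simple DFS suffices; general graphs need a topological order.
    std::vector<Node*> stack{this};
    while (!stack.empty()) {
      Node* n = stack.back();
      stack.pop_back();
      if (n->backwardFn) n->backwardFn();
      for (auto& p : n->nextFunctions) stack.push_back(p.get());
    }
  }
};

// Forward pass records the result, the graph edges, and a backward closure.
std::shared_ptr<Node> mul(std::shared_ptr<Node> a, std::shared_ptr<Node> b) {
  auto out = std::make_shared<Node>();
  out->value = a->value * b->value;
  out->nextFunctions = {a, b};
  Node* o = out.get();  // raw pointer avoids a shared_ptr cycle
  // Chain rule for z = a * b: dz/da = b, dz/db = a.
  out->backwardFn = [a, b, o]() {
    a->grad += b->value * o->grad;
    b->grad += a->value * o->grad;
  };
  return out;
}

int main() {
  auto x = std::make_shared<Node>();
  x->value = 3.f;
  auto y = std::make_shared<Node>();
  y->value = 4.f;
  auto z = mul(x, y);  // forward pass builds the graph
  z->backward();       // reverse traversal fills in gradients
  std::cout << "dz/dx = " << x->grad << ", dz/dy = " << y->grad << '\n';
  // prints: dz/dx = 4, dz/dy = 3
  return 0;
}
```

In TinyTorch proper, as described above, this traversal happens over `Function` nodes; a full implementation also needs a topological ordering so each node's gradient is complete before being propagated further.
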
Requirements:

- CMake
- A compiler supporting C++17 or newer
- CUDA Toolkit 11.0+ (optional, for GPU support)
Build with CMake:

```bash
mkdir build
cmake -B ./build -DCMAKE_BUILD_TYPE=Release
cmake --build ./build --config Release
```
Run the demo:

```bash
cd demo/bin
./TinyTorch_demo
```
Run the tests:

```bash
cd build
ctest
```
This code is licensed under the MIT License (see LICENSE).