```bash
uv venv                              # create the virtual environment
source .venv/bin/activate            # activate it
uv pip install -r requirements.txt   # install dependencies (-r: read requirements from file)
```

Definition: a feedforward neural network (information flows from the input layer to the output layer only) with at least one hidden layer.
An MLP stacks several perceptrons organized in layers, each of which:
- takes the output of the previous layer
- transforms the space
- simplifies the problem
Universal approximation theorem: a combination of simple functions can approximate any complex function.
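As a standard one-line illustration of the theorem (a textbook formulation, not something specific to this project): a single hidden layer of $N$ sigmoid units can already approximate a continuous function $f$ on a compact set,

$$f(x) \;\approx\; \sum_{i=1}^{N} \alpha_i \, \sigma\!\left(w_i^{\top} x + b_i\right)$$

with the approximation error shrinking as $N$ grows.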
```mermaid
graph LR
    A[Raw Data] --> B[Normalization]
    B --> C[Split train/validation/test]
    C --> D[Batching]
    D --> E[Forward pass = prediction]
    E --> F[Loss function]
    F --> G{Threshold or early stopping?}
    G --> |Yes| H[Best Model]
    G --> |No| I[Backpropagation = gradients]
    I --> J[Gradient descent = weight update]
    J --> D
```
Each layer applies a weighted sum + activation:
1. Forward Pass
For each batch and each layer:

$$z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}, \qquad a^{(l)} = f\!\left(z^{(l)}\right)$$

Where:
- $l$ = layer index
- $W^{(l)}$ = weight matrix
- $a^{(0)} = x$ (input)
- $a^{(l-1)}$ = output of the previous layer
- $b^{(l)}$ = bias
- $f$ = activation function (ReLU, sigmoid…)
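A minimal forward-pass sketch in plain NumPy (illustrative names, not the actual API of src/my_mlp.py):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, weights, biases):
    """Compute a^(l) = f(W^(l) a^(l-1) + b^(l)) layer by layer."""
    a = x                          # a^(0) = x
    for W, b in zip(weights, biases):
        z = W @ a + b              # weighted sum
        a = relu(z)                # activation
    return a

# Demo: 3 input features through two hidden layers of 4 units (random weights)
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(4, 4))]
biases = [np.zeros(4), np.zeros(4)]
print(forward(rng.normal(size=3), weights, biases))
```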
Activation Functions:
| Component | Sigmoid | Softmax | ReLU |
|---|---|---|---|
| Output range | (0, 1) | (0, 1), sums to 1 | [0, +∞) |
| Use case | Binary / independent multi-label | Multi-class (mutually exclusive) | Hidden layers |
| Output structure | One probability per neuron | Probability distribution | Non-linear activation |
| Advantages | Interpretable | Probabilistic | Simple, fast |
| Disadvantages | Saturation | Coupled to cross-entropy | Dead neurons |
| Formula | $\sigma(x) = \frac{1}{1 + e^{-x}}$ | $\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$ | $f(x) = \max(0, x)$ |
Usage Rules:
- Hidden layers: ReLU (standard), sometimes Sigmoid/Tanh
- Output layer:
- Regression: Linear (no activation)
- Binary classification: Sigmoid
- Multi-class classification: Softmax
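A compact NumPy sketch of these three activations (illustrative; the project keeps its own versions in src/activations.py):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the row max for numerical stability; each row sums to 1
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def relu(z):
    return np.maximum(0.0, z)
```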
2. Loss Function
Measures the error:
| Problem type | Loss function | Formula |
|---|---|---|
| Binary classification | Binary Cross-Entropy | $-\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$ |
| Multi-class classification | Categorical Cross-Entropy | $-\frac{1}{N}\sum_{i=1}^{N}\sum_{c} y_{i,c}\log \hat{y}_{i,c}$ |
| Regression | Mean Squared Error | $\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$ |
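The same three losses as a NumPy sketch (illustrative; the project keeps its own versions in src/losses.py):

```python
import numpy as np

EPS = 1e-12  # clip probabilities to avoid log(0)

def binary_cross_entropy(y_true, y_pred):
    y_pred = np.clip(y_pred, EPS, 1 - EPS)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def categorical_cross_entropy(y_true, y_pred):
    # y_true is one-hot; y_pred is a row-wise probability distribution
    return -np.mean(np.sum(y_true * np.log(np.clip(y_pred, EPS, 1.0)), axis=1))

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)
```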
3. Backpropagation
Computes how each weight contributed to the error, by propagating gradients from output to input.
For one layer $l$, the error term $\delta^{(l)}$ measures how the loss varies with the pre-activation $z^{(l)}$; it is propagated backwards as follows:

| Element | Gradient |
|---|---|
| Output layer | $\delta^{(L)} = \nabla_{a} L \odot f'\!\left(z^{(L)}\right)$, which simplifies to $\delta^{(L)} = a^{(L)} - y$ for sigmoid/softmax paired with cross-entropy |
| Hidden layer | $\delta^{(l)} = \left(W^{(l+1)}\right)^{\top} \delta^{(l+1)} \odot f'\!\left(z^{(l)}\right)$ |
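A minimal sketch of these two rules, assuming sigmoid hidden activations and a sigmoid + cross-entropy output (illustrative names, not the project's actual code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backward(y, activations, zs, weights):
    """Return delta^(l) for every layer, propagated from output to input.

    activations = [a^(0), ..., a^(L)], zs = [z^(1), ..., z^(L)].
    """
    deltas = [activations[-1] - y]  # output layer: simplifies with cross-entropy
    for l in range(len(weights) - 1, 0, -1):
        s = sigmoid(zs[l - 1])
        d = (weights[l].T @ deltas[0]) * s * (1 - s)  # hidden layer rule
        deltas.insert(0, d)
    return deltas
```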
4. Gradient Descent
| Element | Formula |
|---|---|
| Weight gradient | $\frac{\partial L}{\partial W^{(l)}} = \delta^{(l)} \left(a^{(l-1)}\right)^{\top}$ |
| Bias gradient | $\frac{\partial L}{\partial b^{(l)}} = \delta^{(l)}$ |
| Update | $W^{(l)} \leftarrow W^{(l)} - \eta \, \frac{\partial L}{\partial W^{(l)}}$ |

New weights = old weights - (learning rate × weight gradient)
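Continuing the sketch above, one gradient-descent step over all layers (illustrative):

```python
import numpy as np

def sgd_step(weights, biases, deltas, activations, lr=0.01):
    """Apply W^(l) <- W^(l) - lr * delta^(l) (a^(l-1))^T for every layer."""
    for l in range(len(weights)):
        weights[l] -= lr * np.outer(deltas[l], activations[l])  # weight gradient
        biases[l] -= lr * deltas[l]                             # bias gradient
```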
5. Stopping Criteria
- Maximum epochs reached
- Loss < minimal threshold (e.g., loss < 0.001)
- Early stopping (validation loss plateaus; sketched below)
- Gradient ≈ 0 (convergence reached)
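A minimal early-stopping loop with a patience counter (the stubs stand in for the real training and validation steps, which this project implements in src/my_mlp.py):

```python
import random

def train_one_epoch():        # stub for the real training step
    pass

def evaluate_validation():    # stub returning a fake validation loss
    return random.random()

max_epochs, patience = 100, 5
best_loss, wait = float("inf"), 0

for epoch in range(max_epochs):
    train_one_epoch()
    val_loss = evaluate_validation()
    if val_loss < best_loss:
        best_loss, wait = val_loss, 0   # new best: reset the patience counter
    else:
        wait += 1
        if wait >= patience:            # validation loss plateaued
            print(f"Early stopping at epoch {epoch}")
            break
```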
```
multilayer-perceptron/
├── config/
│   └── network_config.txt    # Example config
├── datasets/                 # Created with the splitting flag
│   ├── test_set.csv
│   ├── train_set.csv
│   └── valid_set.csv
├── src/
│   ├── activations.py
│   ├── config.py
│   ├── losses.py
│   ├── my_mlp.py             # MLP class
│   ├── parsing.py
│   ├── preprocessing.py
│   ├── split_data.py
│   └── utils.py
├── test/                     # Bash test scripts
│   ├── cli_parsing.sh
│   ├── config_parsing.sh
│   ├── file_management.sh
│   └── mlp_training.sh
├── mlp.py                    # Main entry point (lightweight)
└── saved_model.npy           # Model saved by the training flag
```
```
usage: mlp.py [-h] --dataset DATASET [--split SPLIT] [--predict PREDICT] [--config CONFIG]
              [--layer LAYER [LAYER ...]] [--epochs EPOCHS] [--learning_rate LEARNING_RATE]
              [--batch_size BATCH_SIZE] [--loss {binaryCrossentropy,categoricalCrossentropy}]
              [--activation_hidden {sigmoid,relu}]
              [--weights_init {heUniform,heNormal,xavierUniform,xavierNormal,random}]

Multilayer Perceptron for binary classification

options:
  -h, --help            show this help message and exit
  --dataset DATASET     Path to dataset CSV file
  --split SPLIT         Split ratio (format: train,valid). Ex: 0.7,0.15
  --predict PREDICT     Path to saved model for prediction
  --config CONFIG       Path to config file (.txt)
  --layer LAYER [LAYER ...]
                        Hidden layer sizes ∈ ℕ*. Ex: --layer 24 24 24
  --epochs EPOCHS       Number of training epochs ∈ ℕ*
  --learning_rate LEARNING_RATE
                        Learning rate ∈ [0, 1]
  --batch_size BATCH_SIZE
                        Batch size ∈ ℕ*
  --loss {binaryCrossentropy,categoricalCrossentropy}
                        Loss function
  --activation_hidden {sigmoid,relu}
                        Activation function for hidden layers
  --weights_init {heUniform,heNormal,xavierUniform,xavierNormal,random}
                        Weights initialization method
```

Run:

```bash
python mlp.py --dataset [data_file.csv] --split 0.[x],0.[y]
```
Implementation:

The program creates a datasets folder containing the following three files:
- train_set.csv: given to the training phase; the model learns from it
- valid_set.csv: loaded by the training program; the model is validated on it
- test_set.csv: given to the prediction phase; the model returns predicted and true values
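A minimal sketch of such a three-way split, where the remainder after the train and validation ratios goes to the test set (illustrative; the project's actual logic lives in src/split_data.py):

```python
import os
import numpy as np
import pandas as pd

def split_dataset(csv_path, train_ratio=0.7, valid_ratio=0.15, seed=42):
    df = pd.read_csv(csv_path)
    idx = np.random.default_rng(seed).permutation(len(df))  # shuffle row indices
    n_train = int(train_ratio * len(df))
    n_valid = int(valid_ratio * len(df))
    os.makedirs("datasets", exist_ok=True)
    df.iloc[idx[:n_train]].to_csv("datasets/train_set.csv", index=False)
    df.iloc[idx[n_train:n_train + n_valid]].to_csv("datasets/valid_set.csv", index=False)
    df.iloc[idx[n_train + n_valid:]].to_csv("datasets/test_set.csv", index=False)
```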
Run:

```bash
python mlp.py --dataset datasets/train_set.csv --layer [x] [y] [optional: z]
```
Implementation:

The program trains the neural network on the training set and validates it on the validation set:
- Normalizes the input data (mean = 0, std = 1) to improve convergence
- Builds the network with the specified architecture and initializes the weights
- Iterates through epochs, shuffling the data and processing mini-batches (sketched below)
- Performs forward pass (prediction) → computes loss → backward pass (gradients) → updates weights
- Monitors the validation loss for early stopping (patience = 5 epochs without improvement)
- Displays training/validation loss and accuracy curves at the end
- Saves the best model to saved_model.npy
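The shuffle-and-batch structure of each epoch could look like this (illustrative names, not the project's actual code):

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Shuffle the data, then yield it in mini-batches."""
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

# Demo with random data standing in for the training set
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 30))
y = rng.integers(0, 2, size=100)
for X_batch, y_batch in iterate_minibatches(X, y, batch_size=8, rng=rng):
    pass  # forward pass -> loss -> backward pass -> weight update
```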
Run:

```bash
python mlp.py --dataset datasets/test_set.csv --predict saved_model.npy
```
Implementation:

The program loads a trained model and evaluates it on the test set:
- Loads the saved model (weights, biases, config, normalization parameters)
- Normalizes the test data using the training statistics
- Performs a forward pass to generate predictions
- Displays, for each sample, the true label, the predicted label, and the raw probabilities
- Computes the accuracy (correctly predicted / total samples)
- Calculates the loss (BCE or CCE, depending on the loss function used during training)
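The accuracy step boils down to thresholding the raw probabilities (a minimal sketch, assuming binary outputs thresholded at 0.5):

```python
import numpy as np

def accuracy(y_true, y_prob):
    """Accuracy = correctly predicted / total samples."""
    y_pred = (y_prob >= 0.5).astype(int)  # threshold raw probabilities
    return np.mean(y_pred == y_true)

print(accuracy(np.array([1, 0, 1, 1]), np.array([0.9, 0.2, 0.4, 0.8])))  # 0.75
```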