
CNN #442

42 changes: 29 additions & 13 deletions docs/source/user/mip-models.rst
@@ -57,29 +57,45 @@ function. By default, the approximation guarantees a maximal error of
keyword argument when the constraint is added.


Neural Networks
===============
Sequential Neural Networks
==========================

The package currently models dense neural networks with ReLU activations. For a
given neuron, the relation between its inputs and outputs is given by:
The package supports sequential neural networks. Layers are added as building
blocks; the package creates the necessary variables and constraints and wires
them to match the network structure.

Dense layers (details)
----------------------

For dense layers with ReLU activations, each neuron applies an affine
transformation followed by a ReLU. For a neuron with weights
:math:`\beta \in \mathbb{R}^{p+1}`, inputs :math:`x`, and output :math:`y`:

.. math::

y = \max(\sum_{i=1}^p \beta_i x_i + \beta_0, 0).
y = \max\Big(\sum_{i=1}^p \beta_i x_i + \beta_0,\; 0\Big).

The relationship is formulated in the optimization model by using Gurobi
:math:`max` `general constraint
<https://www.gurobi.com/documentation/latest/refman/constraints.html#subsubsection:GeneralConstraints>`_
with:
This is modeled using Gurobi general constraints by introducing an auxiliary
variable :math:`\omega` for the affine part and then enforcing the ReLU:

.. math::

& \omega = \sum_{i=1}^p \beta_i x_i + \beta_0

& y = \max(\omega, 0)

&\omega = \sum_{i=1}^p \beta_i x_i + \beta_0,\\
&y = \max(\omega, 0).

with :math:`\omega` an auxiliary free variable. The neurons are then connected
according to the topology of the network.
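
For illustration, a single neuron of this form can be written directly in
``gurobipy`` as below; this is a minimal sketch with made-up weights, not the
package's internal code:

.. code-block:: python

   import gurobipy as gp
   from gurobipy import GRB

   beta = [0.5, -1.2, 0.7]  # hypothetical weights beta_1..beta_p
   beta0 = 0.1              # hypothetical bias beta_0

   m = gp.Model()
   x = m.addVars(len(beta), lb=-1.0, ub=1.0, name="x")  # neuron inputs
   omega = m.addVar(lb=-GRB.INFINITY, name="omega")     # free auxiliary variable
   y = m.addVar(name="y")                               # output (default lb=0 matches ReLU)

   # omega = sum_i beta_i x_i + beta_0
   m.addConstr(omega == gp.quicksum(b * x[i] for i, b in enumerate(beta)) + beta0)
   # y = max(omega, 0) using Gurobi's max general constraint
   m.addGenConstrMax(y, [omega], constant=0.0)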

Other layers (summary)
----------------------

- Conv2D and MaxPooling2D: supported with padding equivalent to ``valid`` only
  (no non-zero or ``same`` padding). Strides are supported; output sizes follow
  the usual ``valid`` formula (see the sketch after this list). Internally,
  tensors use channels-last layout (NHWC) in the optimization model.
- Flatten: converts a 4D (NHWC) tensor to 2D (batch, features).
- Dropout: accepted but ignored at inference time (treated as identity).
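
Since only ``valid`` padding is supported, each spatial dimension shrinks
according to the usual convolution arithmetic; a quick sanity check
(illustrative helper, not package code):

.. code-block:: python

   def conv_output_size(size, kernel, stride=1):
       """Spatial output size under valid padding (per dimension)."""
       return (size - kernel) // stride + 1

   assert conv_output_size(28, 3) == 26            # 3x3 Conv2D, stride 1
   assert conv_output_size(26, 2, stride=2) == 13  # 2x2 MaxPooling2D, stride 2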

Notes:

- Keras models use NHWC throughout. PyTorch models are evaluated in NCHW, but
  the package handles the necessary internal conversions so predicted values
  match the framework’s behavior.
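
To make the layout difference concrete, here is a small NumPy sketch of the
NHWC/NCHW conversion (illustrative only, not package code):

.. code-block:: python

   import numpy as np

   x_nhwc = np.zeros((2, 28, 28, 3))      # (batch, height, width, channels)
   x_nchw = x_nhwc.transpose(0, 3, 1, 2)  # -> (batch, channels, height, width)
   assert x_nchw.shape == (2, 3, 28, 28)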


Decision Tree Regression
9 changes: 9 additions & 0 deletions docs/source/user/start.rst
@@ -154,6 +154,15 @@ For a simple example of how to use the package, please refer to
in the :doc:`../auto_examples/index` section.


.. note::

Variable shapes: For tabular models (scikit-learn, tree ensembles, dense
neural nets), inputs are typically 2D MVars with shape ``(batch, features)``
and outputs are 1D or 2D (the package orients a 1D output based on the
batch size). For convolutional neural networks (Keras/PyTorch), use 4D MVars
with shape ``(batch, H, W, C)`` (channels-last).
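
As a sketch of these shapes (model and variable names here are illustrative):

.. code-block:: python

   import gurobipy as gp

   m = gp.Model()
   # Tabular model: a batch of 10 samples with 4 features each.
   x_tab = m.addMVar((10, 4), lb=-1.0, ub=1.0, name="x_tab")
   # Convolutional network: 2 images of 28x28 pixels, 1 channel (NHWC).
   x_img = m.addMVar((2, 28, 28, 1), lb=0.0, ub=1.0, name="x_img")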


.. rubric:: Footnotes

.. [#] Classification models are currently not supported (except binary logistic
53 changes: 36 additions & 17 deletions docs/source/user/supported.rst
@@ -99,27 +99,46 @@ Keras
They can be formulated in a Gurobi model with the function
:py:func:`add_keras_constr <gurobi_ml.keras.add_keras_constr>`.

Currently, only two types of layers are supported:

* `Dense layers <https://keras.io/api/layers/core_layers/dense/>`_ (possibly
with `relu` activation),
* `ReLU layers <https://keras.io/api/layers/activation_layers/relu/>`_ with
default settings.
Supported layers and notes:

- `Dense <https://keras.io/api/layers/core_layers/dense/>`_ with activation
  ``relu`` or ``linear``.
- `ReLU <https://keras.io/api/layers/activation_layers/relu/>`_ with default
  settings (no ``negative_slope``, ``threshold``, or ``max_value`` variations).
- `Conv2D <https://keras.io/api/layers/convolution_layers/convolution2d/>`_
  with activation ``relu`` or ``linear`` and padding ``valid`` only (no
  ``same`` padding). Strides are supported.
- `MaxPooling2D <https://keras.io/api/layers/pooling_layers/max_pooling2d/>`_
  with padding ``valid`` only.
- `Flatten <https://keras.io/api/layers/reshaping_layers/flatten/>`_.
- `Dropout <https://keras.io/api/layers/regularization_layers/dropout/>`_ is
  accepted but ignored at inference time (treated as identity).

Input tensors for CNNs use channels-last layout (NHWC). Flatten converts 4D
NHWC tensors to 2D (batch, features).
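
For example, a small network restricted to these layers could be embedded as
follows; this is a sketch (architecture, bounds, and names are illustrative,
and in practice the network would be trained first):

.. code-block:: python

   import gurobipy as gp
   from gurobipy import GRB
   from tensorflow import keras
   from gurobi_ml.keras import add_keras_constr

   nn = keras.Sequential([
       keras.Input((8, 8, 1)),
       keras.layers.Conv2D(4, (3, 3), activation="relu", padding="valid"),
       keras.layers.MaxPooling2D((2, 2), padding="valid"),
       keras.layers.Flatten(),
       keras.layers.Dense(1, activation="linear"),
   ])

   m = gp.Model()
   x = m.addMVar((1, 8, 8, 1), lb=0.0, ub=1.0, name="x")  # NHWC input
   y = m.addMVar((1, 1), lb=-GRB.INFINITY, name="y")
   add_keras_constr(m, nn, x, y)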

PyTorch
-------


In PyTorch, only :external+torch:py:class:`torch.nn.Sequential` objects are
supported.

They can be formulated in a Gurobi model with the function
:py:func:`add_sequential_constr <gurobi_ml.torch.sequential.add_sequential_constr>`.

Currently, only two types of layers are supported:

* :external+torch:py:class:`Linear layers <torch.nn.Linear>`,
* :external+torch:py:class:`ReLU layers <torch.nn.ReLU>`.
In PyTorch, :external+torch:py:class:`torch.nn.Sequential` models are supported
via :py:func:`add_sequential_constr <gurobi_ml.torch.sequential.add_sequential_constr>`.

Supported layers and notes:

- :external+torch:py:class:`Linear <torch.nn.Linear>`.
- :external+torch:py:class:`ReLU <torch.nn.ReLU>`.
- :external+torch:py:class:`Conv2d <torch.nn.Conv2d>` with padding equivalent
  to ``valid`` only (no non-zero padding or ``same``); strides are supported.
- :external+torch:py:class:`MaxPool2d <torch.nn.MaxPool2d>` with padding
  equivalent to ``valid`` only.
- :external+torch:py:class:`Flatten <torch.nn.Flatten>`.
- :external+torch:py:class:`Dropout <torch.nn.Dropout>` is accepted and
  ignored at inference time (treated as identity).

Input tensors for CNNs are provided as NHWC variables. Internally, inputs are
converted to NCHW for PyTorch evaluation and converted back for error checks.
The first Linear after a Flatten layer is adjusted to account for PyTorch’s
NCHW flatten order so that predictions match exactly.
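
For example, a matching sketch (architecture and bounds are illustrative, and
in practice the network would be trained first):

.. code-block:: python

   import gurobipy as gp
   from gurobipy import GRB
   import torch
   from gurobi_ml.torch.sequential import add_sequential_constr

   nn = torch.nn.Sequential(
       torch.nn.Conv2d(1, 4, kernel_size=3),  # default padding=0, i.e. "valid"
       torch.nn.ReLU(),
       torch.nn.MaxPool2d(2),
       torch.nn.Flatten(),
       torch.nn.Linear(4 * 3 * 3, 1),         # 8x8 -> conv 6x6 -> pool 3x3
   )

   m = gp.Model()
   x = m.addMVar((1, 8, 8, 1), lb=0.0, ub=1.0, name="x")  # NHWC, per the note above
   y = m.addMVar((1, 1), lb=-GRB.INFINITY, name="y")
   add_sequential_constr(m, nn, x, y)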

XGBoost
-------