Merge pull request Mellanox#271 from almaslennikov/launch-kit

almaslennikov · web-flow · commit 0aec51e4a55c · 2025-11-02T20:50:26.000+01:00
feat: add tech preview docs on Launch Kit
diff --git a/docs/common/vars.rst b/docs/common/vars.rst
@@ -45,3 +45,7 @@
 .. |rdma-cni-repository| replace:: nvcr.io/nvstaging/mellanox
 .. |spectrumxop-version| replace:: network-operator-v25.10.0-beta.4
 .. |spectrumxop-repository| replace:: nvcr.io/nvstaging/mellanox
+.. |k8s-launch-kit-version| replace:: v25.10.0
+.. |k8s-launch-kit-repository| replace:: nvcr.io/nvidia/cloud-native
+.. |k8s-launch-kit-network-operator-repository| replace:: nvcr.io/nvidia/cloud-native
+.. |k8s-launch-kit-component-version| replace:: network-operator-v25.10.0
diff --git a/docs/index.rst b/docs/index.rst
@@ -26,6 +26,7 @@
    Getting Started with Kubernetes <getting-started-with-kubernetes.rst>
    Getting Started with Red Hat OpenShift <getting-started-with-openshift.rst>
    NIC Configuration Operator <nic-conf-operator/nic-configuration-operator.rst>
+   [TECH PREVIEW] Configuration Assistance with Kubernetes Launch Kit <k8s-launch-kit.rst>
    Customization Options and CRDs <customizations/customization.rst>
    Life Cycle Management <life-cycle-management.rst>
    Advanced Configurations <advanced/advanced.rst>
diff --git a/docs/k8s-launch-kit.rst b/docs/k8s-launch-kit.rst
@@ -0,0 +1,277 @@
+.. license-header
+  SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+  SPDX-License-Identifier: Apache-2.0
+
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+
+.. headings # #, * *, =, -, ^, "
+.. include:: ./common/vars.rst
+
+
+******************************************************************
+[TECH PREVIEW] Configuration Assistance with Kubernetes Launch Kit
+******************************************************************
+
+.. contents:: On this page
+   :depth: 3
+   :local:
+   :backlinks: none
+
+Kubernetes Launch Kit (l8k) is a CLI tool for deploying and managing NVIDIA cloud-native solutions on Kubernetes. It discovers the network hardware in your cluster and generates deployment files for high-performance networking with SR-IOV, RDMA, and other networking technologies.
+
+-------------
+Prerequisites
+-------------
+
+For prerequisites, refer to the :doc:`NVIDIA Network Operator Deployment Guide with Kubernetes <deployment-guide-kubernetes>` page.
+
+You need a Kubernetes cluster with the NVIDIA Network Operator Helm chart installed.
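+
+One way to confirm this prerequisite is sketched below. It is not part of l8k: the ``check_network_operator`` function is hypothetical, and it assumes that the Helm release name contains ``network-operator`` and that the chart was installed into the ``nvidia-network-operator`` namespace (the namespace used in the sample configuration file later on this page).
+
+.. code-block:: bash
+
+    # Hypothetical helper, not part of l8k. Assumes the Helm release name
+    # contains "network-operator" and lives in the given namespace.
+    check_network_operator() {
+        local namespace="${1:-nvidia-network-operator}" unhealthy
+
+        # `helm list --short` prints only release names, one per line.
+        if ! helm list --namespace "${namespace}" --short | grep -q 'network-operator'; then
+            echo "No network-operator Helm release found in namespace ${namespace}" >&2
+            return 1
+        fi
+
+        # Pods that are neither Running nor Succeeded; any output indicates a problem.
+        if ! unhealthy="$(kubectl get pods --namespace "${namespace}" --no-headers \
+                --field-selector=status.phase!=Running,status.phase!=Succeeded 2>/dev/null)"; then
+            echo "kubectl could not list pods in namespace ${namespace}" >&2
+            return 1
+        fi
+        if [ -n "${unhealthy}" ]; then
+            echo "Pods not Running or Succeeded in namespace ${namespace}:" >&2
+            echo "${unhealthy}" >&2
+            return 1
+        fi
+
+        echo "NVIDIA Network Operator is installed in namespace ${namespace}"
+    }
+
+Run ``check_network_operator`` (or ``check_network_operator <namespace>``) on a host where ``helm`` and ``kubectl`` are configured for the target cluster.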
+
+----------------
+Operation Phases
+----------------
+
+==============================
+Discover Cluster Configuration
+==============================
+
+Deploy a minimal Network Operator profile to automatically discover your cluster's network capabilities and hardware configuration (``--discover-cluster-config``, which requires ``--kubeconfig``). You can skip this phase by providing your own configuration file with ``--user-config``.
+
+==============================
+Select the Deployment Profile
+==============================
+
+Specify the deployment profile with CLI flags (``--fabric``, ``--deployment-type``, and ``--multirail``) or describe it in a natural-language prompt for LLM-assisted profile generation (``--prompt``).
+
+=========================
+Generate Deployment Files
+=========================
+
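+Based on the discovered or provided configuration, generate a complete set of YAML deployment files tailored to the selected network profile, and optionally save them to disk with ``--save-deployment-files``.
+
+=================
+Deploy to Cluster
+=================
+
+Apply the generated deployment files to the cluster with ``--deploy``. This phase requires ``--kubeconfig`` and is skipped if ``--deploy`` is not specified.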
+
+-------------------
+Supported Use Cases
+-------------------
+
+Kubernetes Launch Kit supports the following use cases:
+
+- SR-IOV Network with RDMA
+- Host Device Network with RDMA
+- IP over InfiniBand with RDMA Shared Device
+- MacVLAN Network with RDMA Shared Device
+- SR-IOV InfiniBand Network with RDMA
+
+Please refer to the :doc:`quick-start/quick-start-k8s` page for more details.
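+
+The use cases above correspond to combinations of the ``--fabric`` and ``--deployment-type`` flags described in the Usage section. The mapping below is an assumption inferred from the flag values in the tool's help output, not an official table, so confirm it against the quick start guide. The ``l8k_profile_flags`` function and its use-case keys are hypothetical.
+
+.. code-block:: bash
+
+    # Hypothetical helper, not part of l8k. The use case to flag mapping is an
+    # assumption inferred from `l8k --help`; verify it before relying on it.
+    l8k_profile_flags() {
+        case "$1" in
+            sriov-ethernet)   echo "--fabric ethernet --deployment-type sriov" ;;
+            hostdev-ethernet) echo "--fabric ethernet --deployment-type host_device" ;;
+            ipoib-shared)     echo "--fabric infiniband --deployment-type rdma_shared" ;;
+            macvlan-shared)   echo "--fabric ethernet --deployment-type rdma_shared" ;;
+            sriov-infiniband) echo "--fabric infiniband --deployment-type sriov" ;;
+            *)
+                echo "unknown use case: $1" >&2
+                return 1
+                ;;
+        esac
+    }
+
+For example, ``l8k --user-config ./config.yaml $(l8k_profile_flags sriov-ethernet) --save-deployment-files ./deployments``. The command substitution is intentionally left unquoted so that the shell splits it into separate flags.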
+
+-----
+Usage
+-----
+
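+Kubernetes Launch Kit is distributed as a Docker container image. The following example copies the cluster admin kubeconfig into a working directory, mounts that directory into the container, discovers the cluster configuration, and generates deployment files for a multirail InfiniBand profile with the ``rdma_shared`` deployment type:
+
+.. code-block:: bash
+    :substitutions:
+
+    mkdir -p ~/cluster-configuration
+    sudo cp /etc/kubernetes/admin.conf ~/cluster-configuration/kubeconfig
+    sudo chown "$(id -u):$(id -g)" ~/cluster-configuration/kubeconfig
+    docker run -v ~/cluster-configuration:/cluster-configuration --net=host \
+        |k8s-launch-kit-repository|/k8s-launch-kit:|k8s-launch-kit-version| \
+        --discover-cluster-config \
+        --kubeconfig /cluster-configuration/kubeconfig \
+        --save-cluster-config /cluster-configuration/config.yaml \
+        --save-deployment-files /cluster-configuration/deployments \
+        --fabric infiniband --deployment-type rdma_shared --multirail \
+        --log-level debug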
+
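+Run the container with ``--net=host``, and use ``-v`` to mount a host directory for the input files (such as the kubeconfig) and the output files (the cluster configuration and deployment files). Paths passed to l8k flags refer to locations inside the container.
+
+The built-in help (``--help``) describes each phase and lists all available flags: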
+
+.. code-block:: text
+
+    K8s Launch Kit (l8k) is a CLI tool for deploying and managing NVIDIA cloud-native solutions on Kubernetes. The tool helps provide flexible deployment workflows for optimal network performance with SR-IOV, RDMA, and other networking technologies.
+
+    ### Discover Cluster Configuration
+    Deploy a minimal Network Operator profile to automatically discover your cluster's
+    network capabilities and hardware configuration by using --discover-cluster-config.
+    This phase can be skipped if you provide your own configuration file by using --user-config.
+    This phase requires --kubeconfig to be specified.
+
+    ### Generate Deployment Files
+    Based on the discovered or provided configuration,
+    generate a complete set of YAML deployment files for the selected network profile.
+    Files can be saved to disk using --save-deployment-files.
+    The profile can be defined manually with --fabric, --deployment-type and --multirail flags,
+    OR generated by an LLM-assisted profile generator with --prompt (requires --llm-api-key and --llm-vendor).
+
+    ### Deploy to Cluster
+    Apply the generated deployment files to your Kubernetes cluster by using --deploy. This phase requires --kubeconfig and can be skipped if --deploy is not specified.
+
+    Usage:
+    l8k [flags]
+    l8k [command]
+
+    Available Commands:
+    completion  Generate the autocompletion script for the specified shell
+    help        Help about any command
+    version     Print the version number
+
+    Flags:
+        --ai                             Enable AI deployment
+        --deploy                         Deploy the generated files to the Kubernetes cluster
+        --deployment-type string         Select the deployment type (sriov, rdma_shared, host_device)
+        --discover-cluster-config        Deploy a thin Network Operator profile to discover cluster capabilities
+        --enabled-plugins string         Comma-separated list of plugins to enable (default "network-operator")
+        --fabric string                  Select the fabric type to deploy (infiniband, ethernet)
+    -h, --help                           help for l8k
+        --kubeconfig string              Path to kubeconfig file for cluster deployment (required when using --deploy)
+        --llm-api-key string             API key for the LLM API (required when using --prompt)
+        --llm-api-url string             API URL for the LLM API (required when using --prompt)
+        --llm-vendor string              Vendor of the LLM API (required when using --prompt) (default "openai-azure")
+        --log-level string               Log level (debug, info, warn, error) (default "info")
+        --multirail                      Enable multirail deployment
+        --prompt string                  Path to file with a prompt to use for LLM-assisted profile generation
+        --save-cluster-config string     Save discovered cluster configuration to the specified path (default "/opt/nvidia/k8s-launch-kit/cluster-config.yaml")
+        --save-deployment-files string   Save generated deployment files to the specified directory (default "/opt/nvidia/k8s-launch-kit/deployment")
+        --spectrum-x                     Enable Spectrum X deployment
+        --user-config string             Use provided cluster configuration file instead of auto-discovery (skips cluster discovery)
+
+    Use "l8k [command] --help" for more information about a command.
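+
+To avoid retyping the container options for every run, you can wrap them in a shell function. This is a minimal sketch: the ``l8k_docker`` function and the ``L8K_IMAGE`` variable are hypothetical. It mounts the current directory at the same path inside the container and makes it the working directory, so relative paths such as ``./cluster-config.yaml`` refer to the same files inside and outside the container; ``--rm`` removes the container when it exits.
+
+.. code-block:: bash
+    :substitutions:
+
+    # Hypothetical wrapper, not part of l8k. Override L8K_IMAGE to use another tag.
+    L8K_IMAGE="${L8K_IMAGE:-|k8s-launch-kit-repository|/k8s-launch-kit:|k8s-launch-kit-version|}"
+
+    l8k_docker() {
+        # Mount $PWD at the same path and run from it so relative paths resolve
+        # identically inside the container; all arguments are passed to l8k.
+        docker run --rm --net=host \
+            -v "${PWD}:${PWD}" \
+            -w "${PWD}" \
+            "${L8K_IMAGE}" "$@"
+    }
+
+For example, from a directory that contains ``config.yaml``: ``l8k_docker --user-config ./config.yaml --fabric ethernet --deployment-type sriov --save-deployment-files ./deployments``. A kubeconfig passed with ``--kubeconfig`` must also be inside the current directory.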
+
+--------------
+Usage Examples
+--------------
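+
+The following examples invoke ``l8k`` directly. When you run the container image instead, add the ``docker run`` options shown in the Usage section and pass paths inside the mounted directory.
+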
+=================
+Complete Workflow
+=================
+
+Discover the cluster configuration, generate deployment files, and deploy them:
+
+.. code-block:: bash
+
+    l8k --discover-cluster-config --save-cluster-config ./cluster-config.yaml \
+        --fabric ethernet --deployment-type sriov --multirail \
+        --save-deployment-files ./deployments \
+        --deploy --kubeconfig ~/.kube/config
+
+
+================================
+Discover Cluster Configuration
+================================
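+
+Discover the cluster configuration and save it to a file. Discovery requires ``--kubeconfig``:
+
+.. code-block:: bash
+
+    l8k --discover-cluster-config --kubeconfig ~/.kube/config \
+        --save-cluster-config ./my-cluster-config.yaml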
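+
+Before generating deployment files, you may want to check what discovery found. The sketch below produces a plain-text summary and is not a YAML parser: the ``summarize_cluster_config`` function is hypothetical, and it relies on the ``rdmaDevice`` and ``workerNodes`` field names shown in the sample configuration under Configuration File Format.
+
+.. code-block:: bash
+
+    # Hypothetical helper, not part of l8k. Uses plain text matching on the
+    # field names from the sample configuration, not a real YAML parser.
+    summarize_cluster_config() {
+        local config="$1" pf_count
+
+        if [ ! -r "${config}" ]; then
+            echo "Cannot read ${config}" >&2
+            return 1
+        fi
+
+        # Each discovered physical function (PF) entry has one rdmaDevice line.
+        pf_count="$(grep -c 'rdmaDevice:' "${config}" || true)"
+        echo "Physical functions: ${pf_count}"
+
+        # Print the list items that directly follow the workerNodes key.
+        echo "Worker nodes:"
+        awk '/^[ \t]*workerNodes:/ { in_nodes = 1; next }
+             in_nodes && /^[ \t]*- / { sub(/^[ \t]*- /, ""); print "  " $0; next }
+             in_nodes { in_nodes = 0 }' "${config}"
+    }
+
+For example, ``summarize_cluster_config ./my-cluster-config.yaml``. For anything beyond a quick look, inspect the file with a YAML-aware tool.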
+
+
+==========================
+Use Existing Configuration
+==========================
+
+Generate deployment files from an existing configuration file and deploy them:
+
+.. code-block:: bash
+
+    l8k --user-config ./existing-config.yaml \
+        --fabric ethernet --deployment-type sriov --multirail \
+        --deploy --kubeconfig ~/.kube/config
+
+=========================
+Generate Deployment Files
+=========================
+
+Generate deployment files from an existing configuration file and save them without deploying:
+
+.. code-block:: bash
+
+    l8k --user-config ./config.yaml \
+        --fabric ethernet --deployment-type sriov --multirail \
+        --save-deployment-files ./deployments
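+
+Before applying the files with ``--deploy``, you can review what was generated and ask the API server to validate it. The sketch below uses standard kubectl options: ``--dry-run=server`` validates each object on the server without persisting it, and ``-R`` processes the directory recursively. The ``validate_deployment_files`` function is hypothetical, and the sketch assumes that the output directory contains only Kubernetes manifests and that the CRDs for any custom resources are already installed by the Network Operator Helm chart.
+
+.. code-block:: bash
+
+    # Hypothetical helper, not part of l8k. Validates generated manifests
+    # without changing the cluster.
+    validate_deployment_files() {
+        local dir="${1:-./deployments}"
+        local kubeconfig="${2:-${HOME}/.kube/config}"
+
+        if [ ! -d "${dir}" ]; then
+            echo "Deployment directory not found: ${dir}" >&2
+            return 1
+        fi
+
+        # List the generated manifests so they can be reviewed before applying.
+        find "${dir}" -type f \( -name '*.yaml' -o -name '*.yml' \) -print | sort
+
+        # Server-side dry run: the API server validates each object but stores nothing.
+        kubectl --kubeconfig "${kubeconfig}" apply --dry-run=server -R -f "${dir}"
+    }
+
+For example, ``validate_deployment_files ./deployments ~/.kube/config``. If validation passes, rerun the tool with ``--deploy --kubeconfig ~/.kube/config`` to apply the files.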
+
+=======================================================
+Generate Deployment Files using Natural Language Prompt
+=======================================================
+
+Kubernetes Launch Kit supports LLM-assisted profile generation: provide a natural-language prompt in a text file, and the tool generates a deployment profile from it.
+To configure the LLM, provide the API key (``--llm-api-key``) and the API URL (``--llm-api-url``) for the OpenAI Azure backend (``--llm-vendor openai-azure``, the default vendor).
+
+.. code-block:: bash
+
+    echo "I want to enable multirail networking in my AI cluster" > requirements.txt
+    l8k --user-config ./config.yaml \
+        --prompt requirements.txt --llm-vendor openai-azure \
+        --llm-api-url <OPENAI_AZURE_URL> --llm-api-key <OPENAI_AZURE_KEY> \
+        --save-deployment-files ./deployments
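+
+To keep the API key out of your shell history, you can read the credentials from environment variables instead of typing them on the command line. This is a minimal sketch: the ``l8k_prompt`` function and the ``LLM_API_KEY``, ``LLM_API_URL``, and ``LLM_VENDOR`` variable names are hypothetical, and the key is still passed to l8k as a command-line argument.
+
+.. code-block:: bash
+
+    # Hypothetical wrapper, not part of l8k. LLM_API_KEY, LLM_API_URL and
+    # LLM_VENDOR are illustrative variable names.
+    l8k_prompt() {
+        local prompt_file="$1"
+        shift
+
+        # --prompt requires both an API key and an API URL.
+        if [ -z "${LLM_API_KEY:-}" ] || [ -z "${LLM_API_URL:-}" ]; then
+            echo "Set LLM_API_KEY and LLM_API_URL before using --prompt" >&2
+            return 1
+        fi
+
+        # Remaining arguments (for example --user-config) are passed through unchanged.
+        l8k --prompt "${prompt_file}" \
+            --llm-vendor "${LLM_VENDOR:-openai-azure}" \
+            --llm-api-url "${LLM_API_URL}" \
+            --llm-api-key "${LLM_API_KEY}" \
+            "$@"
+    }
+
+For example, set ``LLM_API_URL``, read the key with ``read -rs LLM_API_KEY`` (which does not echo the input), and then run ``l8k_prompt requirements.txt --user-config ./config.yaml --save-deployment-files ./deployments``.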
+
+--------------------------
+Configuration File Format
+--------------------------
+
+After discovery, the tool saves the cluster configuration to the path given by ``--save-cluster-config``.
+You can use this file as a starting point for your own configuration and pass the edited file to the tool with the ``--user-config`` flag. The following is an example configuration file:
+
+.. code-block:: yaml
+    :substitutions:
+
+    networkOperator:
+      version: |k8s-launch-kit-version|
+      componentVersion: |k8s-launch-kit-component-version|
+      repository: |k8s-launch-kit-network-operator-repository|
+      namespace: nvidia-network-operator
+    nvIpam:
+      poolName: nv-ipam-pool
+    subnets:
+    - subnet: 192.168.2.0/24
+      gateway: 192.168.2.1
+    - subnet: 192.168.3.0/24
+      gateway: 192.168.3.1
+    - subnet: 192.168.4.0/24
+      gateway: 192.168.4.1
+    - subnet: 192.168.5.0/24
+      gateway: 192.168.5.1
+    - subnet: 192.168.6.0/24
+      gateway: 192.168.6.1
+    - subnet: 192.168.7.0/24
+      gateway: 192.168.7.1
+    - subnet: 192.168.8.0/24
+      gateway: 192.168.8.1
+    - subnet: 192.168.9.0/24
+      gateway: 192.168.9.1
+    - subnet: 192.168.10.0/24
+      gateway: 192.168.10.1
+    sriov:
+      mtu: 9000
+      numVfs: 8
+      priority: 90
+      resourceName: sriov_resource
+      networkName: sriov_network
+    hostdev:
+      resourceName: hostdev-resource
+      networkName: hostdev-network
+    rdmaShared:
+      resourceName: rdma_shared_resource
+      hcaMax: 63
+    ipoib:
+      networkName: ipoib-network
+    macvlan:
+      networkName: macvlan-network
+    clusterConfig:
+      capabilities:
+        nodes:
+          sriov: true
+          rdma: true
+          ib: true
+      pfs:
+      - rdmaDevice: mlx5_0
+        pciAddress: "0000:03:00.0"
+        networkInterface: enp3s0f0np0
+        traffic: east-west
+      - rdmaDevice: mlx5_1
+        pciAddress: "0000:03:00.1"
+        networkInterface: enp3s0f1np1
+        traffic: east-west
+      - rdmaDevice: mlx5_2
+        pciAddress: "0000:81:00.0"
+        networkInterface: enp129s0np0
+        traffic: east-west
+      workerNodes:
+      - cloud-dev-41
+      - cloud-dev-40
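+
+The sample configuration lists sequential ``192.168.N.0/24`` subnets, each with the ``.1`` address as its gateway. If you edit the file by hand and need a different number of entries, a short loop can print them in the same format. This is a sketch: the ``generate_subnets`` function is hypothetical, and the address scheme only mirrors the sample, so choose ranges that match your network.
+
+.. code-block:: bash
+
+    # Hypothetical helper, not part of l8k. Prints a YAML "subnets:" list in the
+    # format of the sample configuration: one 192.168.X.0/24 per entry, gateway .1.
+    # Usage: generate_subnets FIRST_THIRD_OCTET COUNT
+    generate_subnets() {
+        local first="$1" count="$2" i octet
+        echo "subnets:"
+        for (( i = 0; i < count; i++ )); do
+            octet=$(( first + i ))
+            if (( octet > 255 )); then
+                echo "third octet ${octet} is out of range" >&2
+                return 1
+            fi
+            echo "- subnet: 192.168.${octet}.0/24"
+            echo "  gateway: 192.168.${octet}.1"
+        done
+    }
+
+For example, ``generate_subnets 2 9`` prints the nine entries shown in the sample, from ``192.168.2.0/24`` through ``192.168.10.0/24``.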