HabanaAI · yangulei · Nov 19, 2025
@@ -467,9 +467,10 @@ modID   CPU Affinity    NUMA Affinity
 ### Profile the LLM engine
 The following 4 ENVs are used to control the device profiling:
 * `VLLM_ENGINE_PROFILER_ENABLED`, set to `true` to enable device profiler.
-* `VLLM_ENGINE_PROFILER_WARMUP_STEPS`, number of steps to ignore for profiling.
-* `VLLM_ENGINE_PROFILER_STEPS`, number of steps to capture for profiling.
-* `VLLM_ENGINE_PROFILER_REPEAT`, number of cycles for (warmup + profile).
+* `VLLM_ENGINE_PROFILER_SKIP_STEPS`, number of initial steps to skip before profiling starts.
+* `VLLM_ENGINE_PROFILER_WARMUP_STEPS`, number of warmup steps to ignore in each repetition.
+* `VLLM_ENGINE_PROFILER_STEPS`, number of steps to capture in each repetition.
+* `VLLM_ENGINE_PROFILER_REPEAT`, number of repetitions of the (warmup_steps + profile_steps) cycle.
 
 > Please refer to [torch.profiler.schedule](https://pytorch.org/docs/stable/profiler.html#torch.profiler.schedule) for more details about the profiler schedule arguments.
 

@@ -264,10 +264,12 @@ def _setup_profiler(self):
                                    "false").lower() in ["true", "1"]
         if not enable_profile:
             return None
+        skip_first = int(os.getenv("VLLM_ENGINE_PROFILER_SKIP_STEPS", "0"))
         warmup = int(os.getenv("VLLM_ENGINE_PROFILER_WARMUP_STEPS", "0"))
         steps = int(os.getenv("VLLM_ENGINE_PROFILER_STEPS", "1"))
         repeat = int(os.getenv("VLLM_ENGINE_PROFILER_REPEAT", "1"))
-        schedule = torch.profiler.schedule(wait=0,
+        schedule = torch.profiler.schedule(skip_first=skip_first,
+                                           wait=0,
                                            warmup=warmup,
                                            active=steps,
                                            repeat=repeat)
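
To make the interaction between the four step-related variables concrete, here is a minimal pure-Python sketch of how a schedule with `wait=0` assigns a phase to each engine step. It is a simplified model of the documented `torch.profiler.schedule` behavior, not the PyTorch implementation: the function name `profiler_phase` and the phase labels are hypothetical, and the real schedule returns `ProfilerAction` values and additionally marks the last active step of each cycle for saving the trace.

```python
def profiler_phase(step, skip_first=0, warmup=0, active=1, repeat=1):
    """Return the phase of a 0-based engine step, assuming wait=0.

    Hypothetical helper for illustration only; it mirrors the
    skip_first / warmup / active / repeat arguments passed to
    torch.profiler.schedule in the diff above.
    """
    if active <= 0:
        raise ValueError("active must be positive")
    # Steps before skip_first are never profiled.
    if step < skip_first:
        return "skip"
    step -= skip_first
    cycle = warmup + active
    # With repeat > 0, profiling stops after that many cycles.
    if repeat > 0 and step >= repeat * cycle:
        return "idle"
    # Inside a cycle: warmup steps first, then recorded steps.
    return "warmup" if step % cycle < warmup else "record"


# Equivalent to SKIP_STEPS=2, WARMUP_STEPS=1, STEPS=2, REPEAT=2.
timeline = [
    profiler_phase(s, skip_first=2, warmup=1, active=2, repeat=2)
    for s in range(9)
]
print(timeline)
```

With these values the first two steps are skipped once, then the (warmup + record) cycle of three steps runs twice, and every later step is left unprofiled. The skip applies only at the start, while warmup is repeated in every cycle, which is the distinction the renamed README entries describe.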