The NeMo Agent Toolkit uses a flexible, plugin-based observability system that provides comprehensive support for configuring logging, tracing, and metrics for workflows. Users can configure multiple telemetry exporters simultaneously from the available options or create custom integrations. The observability system:
- Uses an event-driven architecture with `IntermediateStepManager` publishing workflow events to a reactive stream
- Supports multiple concurrent telemetry exporters processing events asynchronously
- Provides built-in exporters for popular observability platforms (LangSmith, Phoenix, Langfuse, Weave, etc.)
- Enables custom telemetry exporter development for any observability service
These features enable developers to test their workflows locally and integrate observability seamlessly with their preferred monitoring stack.
The core observability features (console and file logging) are included by default. For advanced telemetry features like OpenTelemetry and Phoenix tracing, you need to install the optional telemetry extras.
You can install the optional telemetry extras with the following commands, depending on whether you installed the NeMo Agent Toolkit from source or from a package.
::::{tab-set}
:sync-group: install-tool

:::{tab-item} source
:selected:
:sync: source

```bash
# Install specific telemetry extras
uv pip install -e ".[data-flywheel]"
uv pip install -e ".[opentelemetry]"
uv pip install -e ".[phoenix]"
uv pip install -e ".[weave]"

# Note: conflicts with .[strands] and .[adk]
uv pip install -e ".[ragaai]"
```
:::
:::{tab-item} package
:sync: package

```bash
# Install specific telemetry extras
uv pip install "nvidia-nat[data-flywheel]"
uv pip install "nvidia-nat[opentelemetry]"
uv pip install "nvidia-nat[phoenix]"
uv pip install "nvidia-nat[weave]"

# Note: conflicts with nvidia-nat[strands] and nvidia-nat[adk]
uv pip install "nvidia-nat[ragaai]"
```
:::
::::
The following table lists each exporter with its supported features and configuration guide:
| Provider | Integration Documentation | Supported Features |
|---|---|---|
| Catalyst | Observing with Catalyst{.external} | Logging, Tracing |
| NVIDIA Data Flywheel Blueprint | Observing with Data Flywheel{.external} | Logging, Tracing |
| DBNL | Observing with DBNL{.external} | Logging, Tracing |
| Dynatrace | Observing with Dynatrace{.external} | Logging, Tracing |
| Galileo | Observing with Galileo{.external} | Logging, Tracing |
| Langfuse | Refer to the examples/observability/simple_calculator_observability example for usage details | Logging, Tracing |
| LangSmith | Observing with LangSmith{.external} | Logging, Tracing, Evaluation Metrics |
| OpenTelemetry Collector | Observing with OTel Collector{.external} | Logging, Tracing |
| Patronus | Refer to the examples/observability/simple_calculator_observability example for usage details | Logging, Tracing |
| Phoenix | Observing with Phoenix{.external} | Logging, Tracing |
| W&B Weave | Observing with W&B Weave{.external} | Logging, Tracing, W&B Weave Redaction, Evaluation Metrics |
Additional options:
- File Export - Built-in file-based tracing for local development and debugging
- Custom Exporters - Refer to Adding Telemetry Exporters for creating custom integrations
For complete configuration examples and setup instructions, check the examples/observability/ directory.
The flexible observability system is configured using the `general.telemetry` section in the workflow configuration file. This section contains two subsections, `logging` and `tracing`, and each subsection can contain multiple telemetry exporters running simultaneously.
For a complete list of logging and tracing plugins and their corresponding configuration settings, use the following CLI commands:
```bash
# For all registered logging plugins
nat info components -t logging

# For all registered tracing plugins
nat info components -t tracing
```

Illustrated below is a sample configuration file demonstrating multiple exporters configured to run concurrently.
```yaml
general:
  telemetry:
    logging:
      console:
        _type: console
        level: WARN
      file:
        _type: file
        path: ./.tmp/workflow.log
        level: DEBUG
    tracing:
      # Multiple exporters can run simultaneously
      phoenix:
        _type: phoenix
        # ... configuration fields
      weave:
        _type: weave
        # ... configuration fields
      file_backup:
        _type: file
        # ... configuration fields
```

The `logging` section contains one or more logging providers. Each provider has a `_type` and optional configuration fields. The following logging providers are supported by default:

- `console`: Writes logs to the console.
- `file`: Writes logs to a file.
Available log levels:
- `DEBUG`: Detailed information for debugging.
- `INFO`: General information about the workflow.
- `WARNING`: Potential issues that should be addressed.
- `ERROR`: Issues that prevent the workflow from running correctly.
- `CRITICAL`: Severe issues that prevent the workflow from continuing to run.
If a log level is specified, all logs at or above that level are emitted. For example, setting the level to `WARNING` logs `WARNING`, `ERROR`, and `CRITICAL` messages, while setting it to `ERROR` logs only `ERROR` and `CRITICAL` messages.
The `tracing` section contains one or more tracing providers. Each provider has a `_type` and optional configuration fields. The observability system supports multiple concurrent exporters.
The NeMo Agent Toolkit observability system uses a generic, plugin-based architecture built on the Subject-Observer pattern. The system consists of several key components working together to provide comprehensive workflow monitoring:
- `IntermediateStepManager`: Publishes workflow events (`IntermediateStep` objects) to a reactive event stream, tracking function execution boundaries, LLM calls, tool usage, and intermediate operations.
- Event Stream: A reactive stream that broadcasts `IntermediateStep` events to all subscribed telemetry exporters, enabling real-time observability.
- Asynchronous Processing: All telemetry exporters process events asynchronously in background tasks, keeping observability "off the hot path" for optimal performance.
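The Subject-Observer pattern behind this design can be sketched in plain Python. The classes below are illustrative stand-ins, not the toolkit's real `IntermediateStepManager` or event-stream API (which processes events asynchronously); the sketch only shows the core idea that one published event fans out to every subscribed exporter.

```python
from dataclasses import dataclass


@dataclass
class IntermediateStep:
    """Simplified stand-in for the toolkit's IntermediateStep event."""
    name: str
    event_type: str


class EventStream:
    """Broadcasts each published event to all subscribers (Subject role)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, event: IntermediateStep):
        # Every subscribed exporter receives every event.
        for callback in self._subscribers:
            callback(event)


class RecordingExporter:
    """Minimal observer that records the events it receives."""

    def __init__(self):
        self.seen = []

    def __call__(self, event: IntermediateStep):
        self.seen.append(f"{event.event_type}:{event.name}")


stream = EventStream()
console_exporter = RecordingExporter()
trace_exporter = RecordingExporter()
stream.subscribe(console_exporter)
stream.subscribe(trace_exporter)

# One published event reaches both exporters.
stream.publish(IntermediateStep(name="llm_call", event_type="LLM_START"))
```

In the real system the fan-out happens on a reactive stream with each exporter consuming in a background task, so slow exporters do not block the workflow.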
The system supports multiple exporter types, each optimized for different use cases:
- Raw Exporters: Process `IntermediateStep` events directly for simple logging, file output, or custom event processing.
- Span Exporters: Convert events into spans with lifecycle management, ideal for distributed tracing and span-based observability services.
- OpenTelemetry Exporters: Specialized exporters for OTLP-compatible services with pre-built integrations for popular observability platforms.
- Advanced Custom Exporters: Support complex business logic, stateful processing, and enterprise reliability patterns with circuit breakers and dead letter queues.
Each exporter can optionally include a processing pipeline that transforms, filters, batches, or aggregates data before export:
- Processors: Modular components for data transformation, filtering, batching, and format conversion.
- Pipeline Composition: Chain multiple processors together for complex data processing workflows.
- Type Safety: Generic type system ensures compile-time safety for data transformations through the pipeline.
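The pipeline-composition idea can be sketched generically: each processor takes the output of the previous one, so filtering, batching, and transformation stay modular. The processor classes below are hypothetical illustrations of the pattern, not the toolkit's actual processor API.

```python
# Hypothetical processors illustrating pipeline composition; the real
# toolkit's processor interfaces and type system differ.
class DropDebugEvents:
    """Filter processor: remove low-severity events before export."""

    def process(self, events):
        return [e for e in events if e.get("level") != "DEBUG"]


class BatchBySize:
    """Batching processor: group events into fixed-size batches."""

    def __init__(self, size: int):
        self.size = size

    def process(self, events):
        return [events[i:i + self.size] for i in range(0, len(events), self.size)]


class Pipeline:
    """Chain processors so each consumes the previous one's output."""

    def __init__(self, *processors):
        self.processors = processors

    def run(self, events):
        for processor in self.processors:
            events = processor.process(events)
        return events


pipeline = Pipeline(DropDebugEvents(), BatchBySize(2))
batches = pipeline.run([
    {"name": "a", "level": "INFO"},
    {"name": "b", "level": "DEBUG"},
    {"name": "c", "level": "INFO"},
    {"name": "d", "level": "ERROR"},
])
# The DEBUG event is dropped, then the remaining three events are
# grouped into batches of at most two: [[a, c], [d]]
```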
- {py:class}`nat.plugins.profiler.decorators`: Decorators that wrap workflow and LLM framework context managers to inject usage-collection callbacks.
- {py:class}`~nat.plugins.profiler.callbacks`: Callback handlers that track usage statistics (tokens, time, inputs/outputs) and push them to the event stream. Supports LangChain/LangGraph, LlamaIndex, CrewAI, Semantic Kernel, and Google ADK frameworks.
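A usage-collection callback of the kind described above can be sketched as follows. The class and its hook names here are hypothetical, chosen for illustration; the toolkit's real callback handlers are framework-specific and push their records to the event stream rather than a local list.

```python
import time


class UsageCallback:
    """Illustrative callback that records token counts and latency per call."""

    def __init__(self):
        self.records = []
        self._start = None

    def on_llm_start(self, prompt: str):
        # Mark the start of an LLM call.
        self._start = time.monotonic()

    def on_llm_end(self, output: str, tokens: int):
        elapsed = time.monotonic() - self._start
        # In the real system this record would be pushed to the event stream.
        self.records.append({"tokens": tokens, "seconds": elapsed})


cb = UsageCallback()
cb.on_llm_start("What is 2 + 2?")
cb.on_llm_end("4", tokens=12)
```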
For complete information about developing and integrating custom telemetry exporters, including detailed examples, best practices, and advanced configuration options, refer to Adding Telemetry Exporters.
::::{tab-set}
:sync-group: provider

:::{tab-item} Catalyst
:sync: Catalyst

:::{include} ./observe-workflow-with-catalyst.md
:::

:::{tab-item} Data Flywheel
:sync: Data-Flywheel

:::{include} ./observe-workflow-with-data-flywheel.md
:::

:::{tab-item} DBNL
:sync: DBNL

:::{include} ./observe-workflow-with-dbnl.md
:::

:::{tab-item} Dynatrace
:sync: Dynatrace

:::{include} ./observe-workflow-with-dynatrace.md
:::

:::{tab-item} Galileo
:sync: Galileo

:::{include} ./observe-workflow-with-galileo.md
:::

:::{tab-item} LangSmith
:sync: LangSmith

:::{include} ./observe-workflow-with-langsmith.md
:::

:::{tab-item} OTel Collector
:sync: OTel-collector

:::{include} ./observe-workflow-with-otel-collector.md
:::

:::{tab-item} Phoenix
:sync: Phoenix

:::{include} ./observe-workflow-with-phoenix.md
:::

:::{tab-item} W&B Weave
:sync: Wandb-Weave

:::{include} ./observe-workflow-with-weave.md
:::

::::
When one workflow invokes another (for example, by calling a remote workflow over HTTP or by running a child workflow programmatically), you can link the trace of the child workflow to the parent so that observability backends show a single, connected tree instead of separate traces.
If you run a workflow from code using a session, pass `parent_id` and `parent_name` into `session.run()`. The toolkit uses these to set the root of the intermediate steps of the child workflow so the first step has the correct parent.
```python
async with session_manager.session() as session:
    async with session.run(
        prompt,
        parent_id="parent-step-uuid",
        parent_name="Caller Workflow",
    ) as runner:
        result = await runner.result(to_type=str)
```

- `parent_id`: The step ID of the parent (for example, the current workflow step or span that is invoking the child). The root workflow step of the child run is emitted with this as its parent.
- `parent_name`: Optional display name for the parent (for example, the workflow or function name). The function ancestry of the root uses this as the parent name for observability.
When a workflow is triggered over HTTP (such as a POST to `/generate/full`), the server reads request headers to set the parent for that run. If present, they are applied before the workflow starts so the root step has the correct parent.
| Header | Description |
|---|---|
| `workflow-parent-id` | Step ID of the parent. The root workflow step is emitted with this as its parent. |
| `workflow-parent-name` | Optional display name for the parent (workflow or function name). |
Example with `curl`:

```bash
curl -X POST http://localhost:8000/generate/full \
  -H "workflow-parent-id: <parent-step-id>" \
  -H "workflow-parent-name: Parent Workflow Name" \
  -H "Content-Type: application/json" \
  -d '{"input_message": "..."}'
```

Use these headers when the caller (orchestrator, API gateway, or another workflow) has a step or span ID and wants the child workflow to appear under that step in traces.
When your workflow calls a remote workflow (for example, by calling its /generate/full endpoint) and receives intermediate step data in the response, you can push those steps into the observability stream of the current run. That way, the steps of the remote workflow appear as part of the same trace tree.
Use the {py:meth}`~nat.builder.intermediate_step_manager.IntermediateStepManager.push_intermediate_steps` method from any code that runs inside the current workflow context. Pass the list of intermediate steps (for example, parsed from the remote response); they are injected into the event stream of the current run. The parent of the replayed root step is determined by how the remote was invoked: set `workflow-parent-id` and `workflow-parent-name` headers when calling the remote, or use `session.run(parent_id=..., parent_name=...)` when running a child workflow programmatically, so the trace tree links correctly.
```python
from nat.builder.context import Context

# After calling a remote workflow (for example, /generate/full) and parsing
# the response into a list of IntermediateStep:
Context.get().intermediate_step_manager.push_intermediate_steps(remote_intermediate_steps)
```

This is useful when you call a remote workflow and want its steps to appear under the trace of the current workflow in your observability backend, so you get one connected tree for the full request.
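Before pushing, you need to parse the remote response into a list of steps. The helper below is a hypothetical sketch: the response schema shown (an `intermediate_steps` key holding a list of step dictionaries) is an assumption for illustration, so check the actual response format of your deployment before relying on it.

```python
# Hypothetical parsing helper; the "intermediate_steps" key and the step
# fields shown here are assumed for illustration only.
def parse_intermediate_steps(response_json: dict) -> list:
    """Extract the intermediate-step entries from a remote response body."""
    return list(response_json.get("intermediate_steps", []))


# Example payload as it might be returned by a remote /generate/full call.
payload = {
    "value": "42",
    "intermediate_steps": [
        {"UUID": "step-1", "name": "tool_call"},
        {"UUID": "step-2", "name": "llm_end"},
    ],
}

steps = parse_intermediate_steps(payload)
# `steps` would then be converted to IntermediateStep objects and passed
# to push_intermediate_steps(...) inside the current workflow context.
```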