feat(demo): add compute mesh observability demo with benchmarking and execution tracing#1248

Open
Avi-47 wants to merge 2 commits intomofa-org:mainfrom
Avi-47:demo/compute-mesh-observability

Conversation


@Avi-47 Avi-47 commented Mar 15, 2026

Summary

This PR is the third phase of #952. It introduces an end-to-end demo for the MoFA Compute Mesh that showcases:
• routing behavior across backends
• latency benchmarking
• execution trace visualization
• architecture documentation

It consolidates functionality previously implemented across three separate demo PRs.

The result is a single runnable example demonstrating how requests flow through the compute mesh pipeline while exposing performance metrics and execution traces.

No core framework logic is modified.
All changes are confined to the example/demo layer.

Motivation

While the compute mesh infrastructure exists in MoFA, contributors and new developers currently lack a simple way to see how the system behaves end-to-end.

Specifically, it is difficult to observe:

• how routing policies select backends
• how inference requests move through the pipeline
• how token streaming behaves
• how latency differs across routing strategies

This demo addresses those gaps by providing a runnable example that makes the entire pipeline visible.

The example demonstrates how workflow execution, routing, backend selection, streaming, and metrics collection interact in a single execution flow. It serves as a reference implementation of the Compute Mesh architecture, helping contributors understand how routing, inference, and observability work together in practice.


Features Implemented

1. Latency Benchmarking

The demo collects real-time metrics during inference execution.

The following metrics are reported:

• latency_ms: total time from request start to completion
• time_to_first_token_ms: time until the first token appears
• tokens_streamed: number of tokens produced
• tokens_per_second: token generation throughput
• total_time_ms: total duration of token streaming

These metrics make it easy to compare routing strategies such as:

• LocalFirstWithCloudFallback
• LocalOnly
• CloudOnly


2. Execution Trace Visualization

The demo adds execution tracing so developers can observe how requests move through the compute mesh pipeline.

Trace events include:

• workflow.start
• router.policy
• router.backend_selection
• inference.start
• streaming.tokens
• metrics.latency_ms
• workflow.complete

The trace output makes the internal execution flow visible and can optionally be exported as JSON for external observability tools.


3. Architecture Documentation

The demo now includes detailed documentation explaining the compute mesh architecture and execution lifecycle.

The documentation provides:

• a visual pipeline overview
• explanation of routing policies
• execution lifecycle stages
• example trace output
• walkthrough of how requests travel through the system

This makes the compute mesh easier to understand for new contributors.

Architecture Overview

(architecture diagram attached to the PR)

Demo Walkthrough

Running the Demo

cargo run -p local_compute_mesh_demo --manifest-path examples/Cargo.toml -- "Explain photosynthesis"

Example Output

[workflow] executing step: generate_response
[router] policy: LocalFirstWithCloudFallback
[router] selected backend: local
[stream] This
[stream] is
...
[metrics] latency_ms = 365
[metrics] time_to_first_token_ms = 0
[metrics] tokens_streamed = 10
[metrics] tokens_per_second = 27.4
[metrics] total_time_ms = 365

==== Compute Mesh Execution Trace ====

[trace] workflow.start
[trace] router.policy = LocalFirstWithCloudFallback
[trace] router.backend_selection = local
[trace] inference.start
[trace] streaming.tokens = token_1
...
[trace] metrics.latency_ms = 365
[trace] workflow.complete

Testing Instructions

  1. Build the demo:

    cargo build -p local_compute_mesh_demo --manifest-path examples/Cargo.toml
  2. Run the demo:

    cargo run -p local_compute_mesh_demo --manifest-path examples/Cargo.toml -- "Explain photosynthesis"
  3. Verify metrics output shows:

    • latency_ms
    • time_to_first_token_ms
    • tokens_streamed
    • tokens_per_second
    • total_time_ms
  4. Verify trace output shows:

    • workflow.start
    • router.policy
    • router.backend_selection
    • inference.start
    • streaming.tokens
    • metrics.latency_ms
    • workflow.complete

Example Output

Performance Metrics

backend: local
latency_ms: 365
time_to_first_token_ms: 0
tokens_streamed: 10
tokens_per_second: 27.4
total_time_ms: 365

Execution Trace (JSON)

{
  "request_id": "uuid-here",
  "stages": [
    {"stage": "workflow.start", "timestamp_ms": 1700000000000},
    {"stage": "router.policy", "detail": "LocalFirstWithCloudFallback", "timestamp_ms": 1700000000005},
    {"stage": "router.backend_selection", "detail": "local", "timestamp_ms": 1700000000010},
    {"stage": "inference.start", "timestamp_ms": 1700000000015},
    {"stage": "streaming.tokens", "detail": "token_1", "timestamp_ms": 1700000000020},
    {"stage": "metrics.latency_ms", "detail": "365", "timestamp_ms": 1700000000365},
    {"stage": "workflow.complete", "timestamp_ms": 1700000000370}
  ]
}


Breaking Changes

None. This is a new demo package that doesn't affect existing functionality.

Checklist

  • Demo builds successfully
  • Demo runs with example prompt
  • Latency benchmarking implemented with all required metrics
  • Execution trace visualization implemented with all required events
  • Architecture documentation with diagrams
  • No core framework changes (only demo files)

Files Changed

examples/Cargo.toml                          # Added demo to workspace
examples/local_compute_mesh_demo/Cargo.toml  # New demo package
examples/local_compute_mesh_demo/README.md   # Architecture documentation
examples/local_compute_mesh_demo/src/main.rs # Demo implementation
examples/local_compute_mesh_demo/workflow.yaml # Demo workflow config


Avi-47 commented Mar 15, 2026

Hi @lijingrs and @BH3GEI,
Just a quick ping when you have time.
This PR adds observability to the compute mesh demo.

@PrinceGautam2106

/assign

Avi-47 force-pushed the demo/compute-mesh-observability branch from 72756a8 to d7a5ec6 on March 22, 2026 at 10:01.
