feat(demo): add compute mesh observability demo with benchmarking and execution tracing #1248
Open
Avi-47 wants to merge 2 commits into mofa-org:main from
This was referenced Mar 15, 2026
Hi @lijingrs and @BH3GEI,
/assign
72756a8 to d7a5ec6
Summary
This PR is the third phase of #952. It introduces an end-to-end demo for the MoFA Compute Mesh that showcases:
• routing behavior across backends
• latency benchmarking
• execution trace visualization
• architecture documentation
It consolidates functionality previously implemented across three demo PRs:
The result is a single runnable example demonstrating how requests flow through the compute mesh pipeline while exposing performance metrics and execution traces.
No core framework logic is modified.
All changes are confined to the example/demo layer.
Motivation
While the compute mesh infrastructure exists in MoFA, contributors and new developers currently lack a simple way to see how the system behaves end-to-end.
Specifically, it is difficult to observe:
• how routing policies select backends
• how inference requests move through the pipeline
• how token streaming behaves
• how latency differs across routing strategies
This demo addresses those gaps by providing a runnable example that makes the entire pipeline visible.
The example shows how workflow execution, routing, backend selection, streaming, and metrics collection interact in a single execution flow.
It also serves as a reference implementation of the Compute Mesh architecture, so contributors can see how routing, inference, and observability work together in practice.
Features Implemented
1. Latency Benchmarking
The demo collects real-time metrics during inference execution.
The following metrics are reported:
• latency_ms: total time from request start to completion
• time_to_first_token_ms: time until the first token appears
• tokens_streamed: number of tokens produced
• tokens_per_second: token generation throughput
• total_time_ms: total duration of token streaming
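As a rough illustration of how the metrics above relate to one another, here is a minimal sketch. The `StreamMetrics` struct and `compute_metrics` function are hypothetical names, not the demo's actual types, and the sketch assumes that `total_time_ms` spans the first to the last streamed token and that `tokens_per_second` is measured over that same window.

```rust
use std::time::Duration;

/// Hypothetical record; field names mirror the metrics listed in the PR.
#[derive(Debug)]
struct StreamMetrics {
    latency_ms: u64,
    time_to_first_token_ms: u64,
    tokens_streamed: u64,
    tokens_per_second: f64,
    total_time_ms: u64,
}

/// All durations are offsets from the moment the request started.
/// Assumption: streaming time runs from the first token to the last token.
fn compute_metrics(
    first_token_at: Duration,
    last_token_at: Duration,
    completed_at: Duration,
    tokens_streamed: u64,
) -> StreamMetrics {
    let streaming = last_token_at.saturating_sub(first_token_at);
    let secs = streaming.as_secs_f64();
    // Avoid dividing by zero when only one token (or none) was streamed.
    let tokens_per_second = if secs > 0.0 {
        tokens_streamed as f64 / secs
    } else {
        0.0
    };
    StreamMetrics {
        latency_ms: completed_at.as_millis() as u64,
        time_to_first_token_ms: first_token_at.as_millis() as u64,
        tokens_streamed,
        tokens_per_second,
        total_time_ms: streaming.as_millis() as u64,
    }
}

fn main() {
    // Illustrative numbers only, not measurements from the demo.
    let m = compute_metrics(
        Duration::from_millis(20),
        Duration::from_millis(345),
        Duration::from_millis(365),
        65,
    );
    println!("{:?}", m);
}
```

Under these assumptions `latency_ms` is always at least `time_to_first_token_ms + total_time_ms`, since it also covers any work done after the last token arrives.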
These metrics make it easy to compare routing strategies such as:
• LocalFirstWithCloudFallback
• LocalOnly
• CloudOnly
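To make the three strategies concrete, the following sketch shows one plausible reading of them. The policy names come from this PR; the `Backend` enum, the `select_backend` function, and the availability flags are hypothetical, and the selection semantics are inferred from the policy names rather than taken from MoFA's router.

```rust
/// Policy names as listed in the PR description.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RoutingPolicy {
    LocalFirstWithCloudFallback,
    LocalOnly,
    CloudOnly,
}

/// Hypothetical backend kinds for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend {
    Local,
    Cloud,
}

/// Returns the backend a request would be sent to, or `None` if the policy
/// cannot be satisfied. Assumed semantics, not the framework's actual logic.
fn select_backend(
    policy: RoutingPolicy,
    local_available: bool,
    cloud_available: bool,
) -> Option<Backend> {
    match policy {
        RoutingPolicy::LocalOnly => local_available.then_some(Backend::Local),
        RoutingPolicy::CloudOnly => cloud_available.then_some(Backend::Cloud),
        RoutingPolicy::LocalFirstWithCloudFallback => {
            if local_available {
                Some(Backend::Local)
            } else if cloud_available {
                Some(Backend::Cloud)
            } else {
                None
            }
        }
    }
}

fn main() {
    // With the local backend down, only the fallback policy still routes.
    for policy in [
        RoutingPolicy::LocalFirstWithCloudFallback,
        RoutingPolicy::LocalOnly,
        RoutingPolicy::CloudOnly,
    ] {
        println!("{:?} -> {:?}", policy, select_backend(policy, false, true));
    }
}
```

Running the same prompt under each policy and comparing the reported metrics is the kind of comparison the benchmarking feature is meant to support.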
2. Execution Trace Visualization
The demo adds execution tracing so developers can observe how requests move through the compute mesh pipeline.
Trace events include:
• workflow.start
• router.policy
• router.backend_selection
• inference.start
• streaming.tokens
• metrics.latency_ms
• workflow.complete
The trace output makes the internal execution flow visible and can optionally be exported as JSON for external observability tools.
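A trace recorder with JSON export could look roughly like the sketch below. `ExecutionTrace`, `TraceEvent`, `record`, and `to_json` are hypothetical names chosen for illustration; the field names (`request_id`, `stages`, `stage`, `detail`, `timestamp_ms`) follow the example trace in this PR. The JSON is assembled by hand only to keep the sketch dependency-free; a real implementation would more likely use a serialization crate such as `serde_json`.

```rust
/// One recorded stage; `detail` is optional, as in the PR's example trace.
struct TraceEvent {
    stage: String,
    detail: Option<String>,
    timestamp_ms: u64,
}

/// Hypothetical trace container for a single request.
struct ExecutionTrace {
    request_id: String,
    stages: Vec<TraceEvent>,
}

/// Minimal JSON string escaping for quotes, backslashes, and control characters.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl ExecutionTrace {
    fn new(request_id: &str) -> Self {
        Self { request_id: request_id.to_string(), stages: Vec::new() }
    }

    /// Appends a stage; the caller supplies the timestamp so output is deterministic.
    fn record(&mut self, stage: &str, detail: Option<&str>, timestamp_ms: u64) {
        self.stages.push(TraceEvent {
            stage: stage.to_string(),
            detail: detail.map(str::to_string),
            timestamp_ms,
        });
    }

    /// Serializes the trace as compact JSON, omitting `detail` when absent.
    fn to_json(&self) -> String {
        let stages: Vec<String> = self
            .stages
            .iter()
            .map(|e| {
                let detail = match &e.detail {
                    Some(d) => format!("\"detail\":\"{}\",", escape(d)),
                    None => String::new(),
                };
                format!(
                    "{{\"stage\":\"{}\",{}\"timestamp_ms\":{}}}",
                    escape(&e.stage),
                    detail,
                    e.timestamp_ms
                )
            })
            .collect();
        format!(
            "{{\"request_id\":\"{}\",\"stages\":[{}]}}",
            escape(&self.request_id),
            stages.join(",")
        )
    }
}

fn main() {
    let mut trace = ExecutionTrace::new("req-1");
    trace.record("workflow.start", None, 0);
    trace.record("router.policy", Some("LocalFirstWithCloudFallback"), 5);
    trace.record("router.backend_selection", Some("local"), 10);
    trace.record("workflow.complete", None, 370);
    println!("{}", trace.to_json());
}
```

Passing timestamps in explicitly, rather than reading the clock inside `record`, keeps the recorder easy to test; a caller in the demo would supply wall-clock milliseconds.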
3. Architecture Documentation
The demo now includes detailed documentation explaining the compute mesh architecture and execution lifecycle.
The documentation provides:
• a visual pipeline overview
• explanation of routing policies
• execution lifecycle stages
• example trace output
• walkthrough of how requests travel through the system
This makes the compute mesh easier to understand for new contributors.
Architecture Overview
Demo Walkthrough
Running the Demo
cargo run -p local_compute_mesh_demo --manifest-path examples/Cargo.toml -- "Explain photosynthesis"
Example Output
Testing Instructions
Build the demo:
Run the demo:
cargo run -p local_compute_mesh_demo --manifest-path examples/Cargo.toml -- "Explain photosynthesis"
Verify metrics output shows:
• latency_ms
• time_to_first_token_ms
• tokens_streamed
• tokens_per_second
• total_time_ms
Verify trace output shows:
• workflow.start
• router.policy
• router.backend_selection
• inference.start
• streaming.tokens
• metrics.latency_ms
• workflow.complete
Example Output
Performance Metrics
Execution Trace (JSON)
{
  "request_id": "uuid-here",
  "stages": [
    {"stage": "workflow.start", "timestamp_ms": 1700000000000},
    {"stage": "router.policy", "detail": "LocalFirstWithCloudFallback", "timestamp_ms": 1700000000005},
    {"stage": "router.backend_selection", "detail": "local", "timestamp_ms": 1700000000010},
    {"stage": "inference.start", "timestamp_ms": 1700000000015},
    {"stage": "streaming.tokens", "detail": "token_1", "timestamp_ms": 1700000000020},
    {"stage": "metrics.latency_ms", "detail": "365", "timestamp_ms": 1700000000365},
    {"stage": "workflow.complete", "timestamp_ms": 1700000000370}
  ]
}
Screenshots
Breaking Changes
None. This is a new demo package that doesn't affect existing functionality.
Checklist
Files Changed