
demo(compute-mesh): add latency benchmark to local_compute_mesh_demo #1073

Closed

Avi-47 wants to merge 1 commit into mofa-org:main from Avi-47:demo/compute-mesh-benchmark

Conversation

@Avi-47 (Contributor) commented Mar 9, 2026

📋 Summary

This PR adds a new local_compute_mesh_demo example that demonstrates the MoFA compute mesh routing capabilities with comprehensive performance benchmarking. It shows how inference requests can be routed between local and cloud backends based on configurable policies, and measures latency, throughput, and token metrics.

🔗 Related Issues

Closes #1072


🧠 Context

The MoFA compute mesh enables intelligent routing of inference requests between local and cloud backends. This demo showcases:

  • Multiple routing policies (LocalFirstWithCloudFallback, CloudOnly, LocalOnly)
  • Memory-aware scheduling with automatic model eviction
  • Built-in performance metrics collection for latency and throughput comparison

This addresses the need to benchmark and compare performance between local vs cloud inference scenarios.
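The policy-to-backend mapping described above can be sketched roughly as follows. This is an illustrative sketch only: `RoutingPolicy`, `Backend`, and `select_backend` are hypothetical names, not the actual MoFA API.

```rust
// Hypothetical sketch of the routing policies listed above; the real
// MoFA types and routing logic may differ.

/// Routing policies the demo exercises.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RoutingPolicy {
    LocalFirstWithCloudFallback,
    CloudOnly,
    LocalOnly,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend {
    Local,
    Cloud,
}

/// Pick a backend given the policy and whether a local model is available.
fn select_backend(policy: RoutingPolicy, local_available: bool) -> Backend {
    match policy {
        RoutingPolicy::LocalOnly => Backend::Local,
        RoutingPolicy::CloudOnly => Backend::Cloud,
        RoutingPolicy::LocalFirstWithCloudFallback => {
            // Fall back to cloud only when no local model can serve the request.
            if local_available { Backend::Local } else { Backend::Cloud }
        }
    }
}

fn main() {
    let backend = select_backend(RoutingPolicy::LocalFirstWithCloudFallback, true);
    println!("[router] selected backend: {:?}", backend);
}
```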


🛠️ Changes

  • Extended the existing local_compute_mesh_demo example with benchmarking support in examples/local_compute_mesh_demo/
  • Added PerformanceMetrics struct to track:
    • latency_ms - Total request latency
    • time_to_first_token_ms - Time to first token
    • tokens_streamed - Token count
    • tokens_per_second - Throughput
    • total_time_ms - Streaming duration
  • Added three demo functions demonstrating different routing policies
  • Created workflow.yaml configuration file
  • Updated README.md with Performance Benchmark documentation section
  • Added example to workspace in examples/Cargo.toml
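Based on the field list above, the `PerformanceMetrics` struct looks roughly like the sketch below. Field types and the `print` helper are assumptions for illustration; the actual definition in `examples/local_compute_mesh_demo/` may differ.

```rust
/// Sketch of the PerformanceMetrics struct described in this PR;
/// exact types and methods in the example may differ.
#[derive(Debug, Default)]
struct PerformanceMetrics {
    /// Total request latency, in milliseconds.
    latency_ms: u128,
    /// Time until the first token arrived, in milliseconds.
    time_to_first_token_ms: u128,
    /// Number of tokens streamed back.
    tokens_streamed: u64,
    /// Throughput: tokens streamed per second of streaming time.
    tokens_per_second: f64,
    /// Total streaming duration, in milliseconds.
    total_time_ms: u128,
}

impl PerformanceMetrics {
    /// Print in the structured `[metrics]` format shown in the demo logs.
    fn print(&self, backend: &str) {
        println!("[metrics]");
        println!("backend: {backend}");
        println!("latency_ms: {}", self.latency_ms);
        println!("time_to_first_token_ms: {}", self.time_to_first_token_ms);
        println!("tokens_streamed: {}", self.tokens_streamed);
        println!("tokens_per_second: {:.1}", self.tokens_per_second);
        println!("total_time_ms: {}", self.total_time_ms);
    }
}

fn main() {
    let m = PerformanceMetrics {
        latency_ms: 218,
        tokens_streamed: 21,
        tokens_per_second: 96.4,
        total_time_ms: 218,
        ..Default::default()
    };
    m.print("local");
}
```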

🧪 How It Was Tested

  1. Build verification:

    cd examples && cargo check -p local_compute_mesh_demo
  2. Format check:

    cargo fmt --all -- --check
  3. Run demo:

    cargo run -p local_compute_mesh_demo -- "Explain photosynthesis"
  4. Output verified - demo runs successfully showing:

    • Workflow execution
    • Router policy selection
    • Token streaming
    • Performance metrics in structured format

Logs

========================================
  MoFA Compute Mesh Demo               
  with Performance Benchmarking         
========================================

[workflow] executing step: generate_response
Prompt: Explain photosynthesis

=== Demo 1: LocalFirstWithCloudFallback ===
[inference] sending request to orchestrator...
[router] policy: LocalFirstWithCloudFallback
[router] selected backend: local

[stream] This
[stream] is
[stream] a
[stream] simulated
...

[metrics]
backend: local
latency_ms: 218
time_to_first_token_ms: 0
tokens_streamed: 21
tokens_per_second: 96.4
total_time_ms: 218

=== Demo 2: CloudOnly ===
[router] policy: CloudOnly
[router] selected backend: cloud

[metrics]
backend: cloud
latency_ms: 220
time_to_first_token_ms: 0
tokens_streamed: 21
tokens_per_second: 95.7
total_time_ms: 220

=== Demo 3: LocalOnly ===
[router] policy: LocalOnly
[router] selected backend: local

[metrics]
backend: local
latency_ms: 221
time_to_first_token_ms: 0
tokens_streamed: 21
tokens_per_second: 94.9
total_time_ms: 221


⚠️ Breaking Changes

  • No breaking changes

🧹 Checklist

Code Quality

  • Code follows Rust idioms and project conventions
  • cargo fmt has been run
  • cargo clippy passes without warnings

Testing

  • Demo runs successfully locally

Documentation

  • README updated with Performance Benchmark section

PR Hygiene

  • PR is small and focused (one logical change)
  • Branch is up to date with main
  • No unrelated commits
  • Commit message explains why (benchmarking for compute mesh)

🚀 Deployment Notes

No deployment needed - this is a new example/demo.


🧩 Additional Notes for Reviewers

  • The demo uses simulated streaming (with 10ms delays per token) to demonstrate the metrics collection
  • In production, this would be connected to actual local/cloud LLM backends
  • The example uses std::time::Instant for high-resolution timing
  • Metrics are printed in a structured [metrics] format for easy parsing
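The simulated streaming and `Instant`-based timing described above can be sketched as follows. Function and variable names here are illustrative, not the example's actual API, and the measured values will vary slightly from the logs above.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

// Illustrative sketch of simulated streaming with metrics collection,
// assuming a fixed 10ms delay per token as described in this PR.
// Returns (total_time_ms, time_to_first_token_ms, tokens_per_second).
fn stream_and_measure(tokens: &[&str]) -> (u128, u128, f64) {
    let start = Instant::now();
    let mut time_to_first_token_ms = 0u128;

    for (i, tok) in tokens.iter().enumerate() {
        sleep(Duration::from_millis(10)); // simulated per-token delay
        if i == 0 {
            time_to_first_token_ms = start.elapsed().as_millis();
        }
        println!("[stream] {tok}");
    }

    let total_time_ms = start.elapsed().as_millis();
    let tokens_per_second = tokens.len() as f64 / (total_time_ms as f64 / 1000.0);
    (total_time_ms, time_to_first_token_ms, tokens_per_second)
}

fn main() {
    let tokens = ["This", "is", "a", "simulated", "response"];
    let (total_ms, ttft_ms, tps) = stream_and_measure(&tokens);
    println!("[metrics]");
    println!("time_to_first_token_ms: {ttft_ms}");
    println!("tokens_per_second: {tps:.1}");
    println!("total_time_ms: {total_ms}");
}
```

`Instant::elapsed` gives monotonic, high-resolution durations, which is why it is preferred over wall-clock time for benchmarks like this.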

This helps demonstrate the practical benefits of the Compute Mesh architecture by making routing and performance characteristics observable in a runnable example.

@Avi-47 Avi-47 marked this pull request as ready for review March 9, 2026 04:43
@Avi-47 (Contributor, Author) commented Mar 10, 2026

Hi @lijingrs and @BH3GEI,

Just a gentle ping on this PR when convenient. It only extends the local_compute_mesh_demo example with simple latency benchmarking and routing comparison. No core framework logic was changed and CI is passing. Happy to make any adjustments if needed.

Thanks!

@Avi-47 (Contributor, Author) commented Mar 15, 2026

This change has been consolidated into a new PR #1248 that groups the compute mesh demo improvements into a single runnable example. That PR includes latency benchmarking, execution trace visualization, and architecture documentation.

To keep the review simpler and avoid splitting related demo features across multiple PRs, I’m closing this one in favor of the consolidated PR. The functionality from this PR is fully preserved there.
Thank you!

@Avi-47 Avi-47 closed this Mar 15, 2026

Linked issue: demo(compute-mesh): add performance benchmark and latency comparison for local vs cloud inference