demo(compute-mesh): add latency benchmark to local_compute_mesh_demo #1073
Closed

Avi-47 wants to merge 1 commit into mofa-org:main from
Conversation
Branch updated from 1136329 to 028d11b
This change has been consolidated into a new PR #1248 that groups the compute mesh demo improvements into a single runnable example. To keep the review simpler and avoid splitting related demo features across multiple PRs, I'm closing this one in favor of the consolidated PR.
📋 Summary
This PR adds a new `local_compute_mesh_demo` example that demonstrates the MoFA compute mesh routing capabilities with comprehensive performance benchmarking. It shows how inference requests can be routed between local and cloud backends based on configurable policies, and measures latency, throughput, and token metrics.

🔗 Related Issues
Closes #1072
🧠 Context
The MoFA compute mesh enables intelligent routing of inference requests between local and cloud backends. This demo showcases that routing in a runnable example and addresses the need to benchmark and compare performance between local and cloud inference scenarios.
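As an illustration of the policy-based routing described above, a minimal decision function might look like the following sketch in Rust. All names here (`Backend`, `RoutingPolicy`, `route`) are hypothetical and do not reflect the actual MoFA API:

```rust
/// Hypothetical backend choice; the real MoFA types may differ.
#[derive(Debug, PartialEq)]
enum Backend {
    Local,
    Cloud,
}

/// Hypothetical routing policies, for illustration only.
enum RoutingPolicy {
    /// Always prefer the local backend.
    LocalFirst,
    /// Route to the cloud once the prompt exceeds a token budget.
    CloudAboveTokens(usize),
}

/// Pick a backend for a request based on the configured policy.
fn route(policy: &RoutingPolicy, prompt_tokens: usize) -> Backend {
    match policy {
        RoutingPolicy::LocalFirst => Backend::Local,
        RoutingPolicy::CloudAboveTokens(limit) => {
            if prompt_tokens > *limit {
                Backend::Cloud
            } else {
                Backend::Local
            }
        }
    }
}

fn main() {
    let policy = RoutingPolicy::CloudAboveTokens(512);
    // Short prompts stay local; long prompts go to the cloud.
    assert_eq!(route(&policy, 100), Backend::Local);
    assert_eq!(route(&policy, 2048), Backend::Cloud);
}
```

The point of the demo is that such routing decisions become observable: the benchmark output makes it visible which backend served a request and at what cost.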
🛠️ Changes

- New `local_compute_mesh_demo` example with benchmarking support in `examples/local_compute_mesh_demo/`
- `PerformanceMetrics` struct to track:
  - `latency_ms` - total request latency
  - `time_to_first_token_ms` - time to first token
  - `tokens_streamed` - token count
  - `tokens_per_second` - throughput
  - `total_time_ms` - streaming duration

🧪 How you Tested
Build verification:
Format check:
Run demo: `cargo run -p local_compute_mesh_demo -- "Explain photosynthesis"`

Output verified - demo runs successfully showing:
Logs
Screenshots
🧹 Checklist
Code Quality
- `cargo fmt` run
- `cargo clippy` passes without warnings

Testing
Documentation
PR Hygiene
main

🚀 Deployment Notes
No deployment needed - this is a new example/demo.
🧩 Additional Notes for Reviewers
- `std::time::Instant` for high-resolution timing
- `[metrics]` format for easy parsing

This helps demonstrate the practical benefits of the Compute Mesh architecture by making routing and performance characteristics observable in a runnable example.
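To make the notes above concrete, here is a minimal sketch of how `std::time::Instant` can drive the metrics described in this PR. The field names follow the description above, but the `benchmark` function and the exact `[metrics]` line format are illustrative assumptions, not the demo's actual code:

```rust
use std::time::Instant;

// Sketch only: field names follow the PR description; the real struct
// in examples/local_compute_mesh_demo/ may differ.
struct PerformanceMetrics {
    latency_ms: f64,             // total request latency
    time_to_first_token_ms: f64, // time to first token
    tokens_streamed: usize,      // token count
    tokens_per_second: f64,      // throughput
    total_time_ms: f64,          // streaming duration (== latency in this sketch)
}

// Time a token stream and emit one `[metrics]` line for easy parsing.
fn benchmark<I: Iterator<Item = String>>(stream: I) -> PerformanceMetrics {
    let start = Instant::now();
    let mut first_token: Option<Instant> = None;
    let mut tokens = 0usize;

    for _token in stream {
        // Record the instant the first token arrives.
        first_token.get_or_insert_with(Instant::now);
        tokens += 1;
    }

    let latency_ms = start.elapsed().as_secs_f64() * 1000.0;
    let ttft_ms = first_token
        .map(|t| (t - start).as_secs_f64() * 1000.0)
        .unwrap_or(latency_ms);
    let tokens_per_second = if latency_ms > 0.0 {
        tokens as f64 * 1000.0 / latency_ms
    } else {
        0.0
    };

    let m = PerformanceMetrics {
        latency_ms,
        time_to_first_token_ms: ttft_ms,
        tokens_streamed: tokens,
        tokens_per_second,
        total_time_ms: latency_ms,
    };
    println!(
        "[metrics] latency_ms={:.2} ttft_ms={:.2} tokens={} tok_per_s={:.2}",
        m.latency_ms, m.time_to_first_token_ms, m.tokens_streamed, m.tokens_per_second
    );
    m
}

fn main() {
    // Stand-in for a real streaming response.
    let fake_stream = ["Photo", "synthesis", " is", "..."].iter().map(|s| s.to_string());
    let m = benchmark(fake_stream);
    assert_eq!(m.tokens_streamed, 4);
}
```

Emitting everything on one `[metrics]` line keeps the output trivially greppable, which is why a fixed-prefix format like this is convenient for comparing local and cloud runs.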