⚡ Bolt: make router health checks run concurrently with asyncio.gather #7474

Open
google-labs-jules[bot] wants to merge 1 commit into develop from bolt-gather-health-checks-6255120788078218780

Conversation

@google-labs-jules
Contributor

⚡ Bolt: make router health checks run concurrently with asyncio.gather

💡 What

Modified monitor_instance_health in fastdeploy/router/router.py to use asyncio.gather(*coros, return_exceptions=True) to execute health check requests concurrently instead of sequential awaits.
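
The pattern described above can be sketched as follows. This is a minimal illustration, not the actual code in fastdeploy/router/router.py: `check_health`, `monitor_once`, and the instance names are hypothetical stand-ins, and the real router presumably issues HTTP requests instead of sleeping.

```python
import asyncio


async def check_health(instance: str) -> bool:
    """Hypothetical stand-in for an HTTP health probe."""
    await asyncio.sleep(0.01)
    if instance == "bad-host":
        raise ConnectionError(f"{instance} unreachable")
    return True


async def monitor_once(instances: list[str]) -> dict[str, bool]:
    """Probe every instance concurrently and report which ones are healthy."""
    coros = [check_health(inst) for inst in instances]
    # return_exceptions=True delivers failures as result values, so one
    # unreachable instance does not abort the remaining checks.
    results = await asyncio.gather(*coros, return_exceptions=True)
    status = {}
    # gather preserves input order, so zip pairs each result with its instance.
    for inst, result in zip(instances, results):
        status[inst] = (not isinstance(result, BaseException)) and bool(result)
    return status


status = asyncio.run(monitor_once(["host-a", "bad-host", "host-b"]))
print(status)
```

Because results come back in the same order as the input coroutines, each instance can be matched to its outcome (or its exception) without extra bookkeeping, which is what lets unhealthy instances be identified after the gather completes.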

🎯 Why

Although the code previously had a comment saying # gather all tasks concurrently, the tasks were awaited sequentially in a for loop (for inst, coro in all_tasks: resp = await coro). As a result, the total health check time was the sum of the response times of all servers, a bottleneck that grows as O(n) with n servers. Using asyncio.gather makes the checks truly concurrent, so the total duration is bounded by the slowest individual response rather than growing with the number of servers.

📊 Impact

Reduces router health check polling latency substantially when monitoring multiple servers. The duration will now be max(response_time) rather than sum(response_times).
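
The sum-versus-max difference can be observed with a small timing experiment. This is only a sketch under assumed conditions: the delays in `DELAYS` and the `fake_check` helper are hypothetical and simulate server response times with `asyncio.sleep`; they are not measurements from the router.

```python
import asyncio
import time

# Hypothetical per-server response times, in seconds.
DELAYS = [0.05, 0.10, 0.15]


async def fake_check(delay: float) -> float:
    """Simulate a health probe that takes `delay` seconds to respond."""
    await asyncio.sleep(delay)
    return delay


async def sequential() -> float:
    """Await each check in turn; elapsed time approaches sum(DELAYS)."""
    start = time.perf_counter()
    for delay in DELAYS:
        await fake_check(delay)
    return time.perf_counter() - start


async def concurrent() -> float:
    """Run all checks with asyncio.gather; elapsed time approaches max(DELAYS)."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_check(d) for d in DELAYS), return_exceptions=True)
    return time.perf_counter() - start


seq_time = asyncio.run(sequential())
con_time = asyncio.run(concurrent())
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

The exact figures depend on the machine and event loop overhead, but the sequential run should take roughly the sum of the delays while the concurrent run should take roughly the longest single delay.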

🔬 Measurement

Run pytest tests/router/ to verify that the tests pass, then check the router logs to confirm that health checks complete faster when multiple servers are registered.


PR created automatically by Jules for task 6255120788078218780 started by @ZeyuChen

@google-labs-jules
Contributor Author

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@paddle-bot

paddle-bot bot commented Apr 17, 2026

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Apr 17, 2026

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 AI Code Review | 2026-04-17 23:39 CST

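📋 Review Summary

PR overview: Changes router health checks from sequential await to concurrent execution with asyncio.gather, reducing health check latency from O(n) to O(1).
Change scope: fastdeploy/router/router.py, .jules/bolt.md
Impact tag: Optimization

📝 PR Convention Check

The PR title is missing the required official tag prefix.

Suggested title (can be copied directly):

  • [Optimization] make router health checks run concurrently with asyncio.gather

Issues

| Severity | File | Summary |
| --- | --- | --- |
| 🟡 Suggestion | .jules/bolt.md | Jules bot internal documentation should not be committed to the repository |

Overall Assessment

The core code change is logically correct. asyncio.gather(*coros, return_exceptions=True) is a standard pattern for making such code concurrent, and the exception handling and instance removal logic behave the same as in the original code. Consider removing the .jules/bolt.md file and fixing the PR title format.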

Comment thread .jules/bolt.md
@@ -0,0 +1,3 @@
## 2024-05-18 - [Fix Sequential I/O Wait in `monitor_instance_health`]

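🟡 Suggestion: This file is an internal learning log of the Jules bot and should not be committed to the repository.

Consider removing the .jules/bolt.md file from this PR, or adding the .jules/ directory to .gitignore.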

Labels

contributor External developers

2 participants