scale to zero: photoreal, deforum, sadtalker and scale to one: wav2lip #33
Walkthrough

This change updates deployment documentation and configuration files to reflect adjustments in replica counts and autoscaling settings for several services. In the documentation, replica counts for multiple deployments were reduced, with some set to zero, and new model IDs were added to certain deployments. In the YAML configuration, the minimum replica count (`minReplicaCount`) was lowered to zero for many of these services.
Actionable comments posted: 0
🧹 Nitpick comments (4)
chart/model-values.yaml (3)
57-61: Scale-to-zero for heavyweight diffusion models – check cold-start SLA

`minReplicaCount` has been dropped to `0` for five diffusion deployments. Cold-starting a 35–50 Gi GPU image that has to pull >10 GB of model weights can add several minutes of latency, especially when an idle node has to be re-provisioned.

- Verify that KEDA's queue-length trigger plus your client time-outs can tolerate this extra spin-up time.
- If not, keep one warm replica or add a pre-warm job/lifecycle hook.

Also applies to: 73-75, 88-90, 101-104, 118-121
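A minimal sketch of the warm-replica option. The deployment name and surrounding key structure below are illustrative assumptions, not the chart's actual layout; only the `minReplicaCount` key is taken from the diff under review:

```yaml
# Hypothetical excerpt from chart/model-values.yaml — structure assumed
photoreal:
  autoscaling:
    minReplicaCount: 1   # one warm replica absorbs the multi-minute cold start
    maxReplicaCount: 4   # still scales out under queue pressure
```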
130-136: deforum-sd-1 now scales to zero – very long start-up expected

`deforum-sd-1` loads Deforum + Stable Diffusion + the animation tool-chain. Historical data shows first-frame latency of >6 min from cold. Confirm that the product flow (e.g., webhook ACK time) is bumped accordingly or that a warming strategy is in place.
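The timing concern above can be sanity-checked with a back-of-envelope budget. The function and all input numbers here are illustrative assumptions, not measurements from this cluster:

```python
# Rough worst-case cold-start estimate: node provisioning + image pull
# + model-weight download + in-memory model load (all inputs assumed).
def cold_start_seconds(node_provision_s, image_pull_s,
                       weights_gb, bandwidth_gbps, model_load_s):
    weights_pull_s = weights_gb * 8 / bandwidth_gbps  # GB -> Gbit, over Gbit/s
    return node_provision_s + image_pull_s + weights_pull_s + model_load_s

# e.g. 2 min node spin-up, 2 min image pull, 10 GB of weights over a
# 10 Gbit/s link, 60 s to load the model onto the GPU:
total = cold_start_seconds(120, 120, 10, 10, 60)
print(total)  # 308.0 — roughly five minutes, far beyond a typical webhook ACK window
```

Even with generous bandwidth, the fixed costs dominate, which is why the review suggests either a warm replica or a pre-warm hook.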
171-187: Many speech & embedding services also moved to `minReplicaCount = 0`

While these models are lighter than SD, Whisper-large and embedding transformers still take 60-90 s to load on A100s. Make sure:
- The HPA/KEDA cooldown period is long enough to prevent thrashing.
- Any “sync” or “streaming” endpoints that expect <5 s cold latency are routed to the few services that still keep one replica.
Also applies to: 198-215, 224-229, 238-242, 265-270, 277-282, 299-305
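On the cooldown point: if these services are driven by KEDA (as the earlier comment suggests), the `cooldownPeriod` field on a `ScaledObject` controls how long KEDA waits after the last trigger activation before scaling back to zero. A sketch with assumed names and values — the service name, queue backend, and thresholds are illustrative, not taken from this chart:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: whisper-large            # illustrative name
spec:
  scaleTargetRef:
    name: whisper-large          # illustrative Deployment name
  minReplicaCount: 0
  cooldownPeriod: 600            # wait 10 min idle before scaling to zero (default 300)
  triggers:
    - type: rabbitmq             # illustrative trigger; actual queue backend unknown
      metadata:
        queueName: whisper-jobs
        queueLength: "5"
```

A longer `cooldownPeriod` trades some idle GPU cost for protection against the load/unload thrashing the review warns about.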
DEPLOYMENTS.md (1)
27-28: Lint warning – wrap bare URLs to satisfy MD034

`markdownlint` flagged bare URLs in the table. Wrap them, e.g.: `<https://objectstore.e2enetworks.net/indic-asr-public/checkpoints/...>`
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- DEPLOYMENTS.md (1 hunks)
- chart/model-values.yaml (6 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
DEPLOYMENTS.md
27-27: Bare URL used (MD034, no-bare-urls)

28-28: Bare URL used (MD034, no-bare-urls)
🔇 Additional comments (2)
chart/model-values.yaml (1)
317-322: retro-sadtalker & retro-wav2lip keep 1 warm replica – good compromise

Retaining a single replica for these real-time lip-sync services protects UX while still allowing scale-out. 👍
Also applies to: 332-336
DEPLOYMENTS.md (1)
8-18: Docs accurately reflect new replica counts

The table mirrors the YAML changes (many replicas → 0). Looks in-sync.
Legal Boilerplate
Look, I get it. The entity doing business as “Gooey.AI” and/or “Dara.network” was incorporated in the State of Delaware in 2020 as Dara Network Inc. and is gonna need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Dara Network Inc can use, modify, copy, and redistribute my contributions, under its choice of terms.