Increasing capacity and reliability #12311
Replies: 16 comments 18 replies
-
This sounds really great @ryanjsalva. Will those error improvements address the countTokens issues?
-
Will this improve the actual request quota that users on Google Authentication (Free) receive for the 2.5 Pro model?
-
How is the level of complexity defined, and by whom? In my experience, Gemini itself cannot distinguish between high and low complexity; everything seems simple to it. Due to its overconfidence, it tends to underestimate the complexity of my code logic, resulting in low-quality responses.
-
Can't you guys just add Qwen or DeepSeek as additional models? Those open-source models are WAY ahead of Gemini in intelligence. Qwen CLI (a fork of Gemini CLI) is better in every way because of the Qwen model. Both models are available in Vertex, so I would think there are no major technical barriers to adding them. For coding, Gemini Flash and even Pro are both dumb compared to the open-source ones and don't come close to GPT/Claude.
-
Thank you @ryanjsalva for the transparency and the commitment to improving reliability. While the outlined measures are encouraging, I'd like to respectfully raise some concerns regarding the proposed intelligent routing system and suggest an alternative approach.

Technical Concerns: The automatic routing to Gemini 2.5 Flash based on "complexity" assessment raises significant technical questions. As @wangbiye astutely pointed out, defining and detecting complexity is inherently challenging. In practice, LLMs often exhibit overconfidence and may underestimate task complexity. This could result in degraded output quality precisely when users need the most capable model. Additionally, @rcleveng's question about countTokens issues highlights how error-handling improvements need to address the full spectrum of API reliability concerns, not just routing.

Ethical and Philosophical Dimensions: @mlik-sudo's comment about user sovereignty touches on a fundamental principle: users should retain agency over the tools they use. Automatic model switching, even with good intentions, raises its own concerns. The 429 errors mentioned by @Neinndall (hitting limits at just 50-75 requests) suggest that capacity constraints are pushing toward rationing rather than true scaling. While understandable, this creates a conflict between system optimization and user needs.

Proposed Solution: User-Controlled Model Selection. Rather than intelligent routing, I propose implementing explicit user control with smart defaults. This approach balances your capacity goals with user sovereignty. Users who trust the system can benefit from automatic optimization, while those with specialized needs retain control.

Conclusion: I deeply appreciate the engineering effort going into capacity expansion and error-handling improvements; these are absolutely critical. However, I encourage the team to reconsider automatic model switching in favor of user-empowered tools. This builds trust, respects expertise, and ultimately creates a more sustainable relationship between platform and community. Thank you for considering this feedback, and for continuing to engage with the community so openly. 🙏
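As a minimal sketch of the "explicit control with smart defaults" idea proposed above: every name here (`ModelPreference`, `pickModel`, `looksComplex`) is hypothetical and not part of Gemini CLI, and the complexity heuristic is a deliberately crude stand-in for whatever the router actually does.

```typescript
// Hypothetical sketch of user-controlled model selection with a smart default.
// None of these names are real Gemini CLI APIs.

type Model = "gemini-2.5-pro" | "gemini-2.5-flash";
type ModelPreference = "auto" | "pro" | "flash";

// Crude stand-in for any complexity heuristic the router might use.
function looksComplex(prompt: string): boolean {
  return prompt.length > 500 || /refactor|architecture|debug/i.test(prompt);
}

// An explicit user choice always wins; only "auto" consults the heuristic.
function pickModel(pref: ModelPreference, prompt: string): Model {
  if (pref === "pro") return "gemini-2.5-pro";
  if (pref === "flash") return "gemini-2.5-flash";
  return looksComplex(prompt) ? "gemini-2.5-pro" : "gemini-2.5-flash";
}
```

However sophisticated the real heuristic is, the key property of this design holds regardless: an explicit "pro" or "flash" preference bypasses the router entirely.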
-
Gemini CLI should offer a multi-phase workflow where, at every step, users choose the AI mode: full automation (Auto), speed (Gemini Flash), or depth (Gemini Pro). Customization at every stage: user choice becomes the heart of the process.
-
I think a straightforward, transparent approach to usage and capacity, such as Anthropic provides in their Claude tools, would be the ideal solution. Rather than swapping models or showing errors, Gemini should provide a robust usage meter with clear information about daily and monthly quotas, together with when these reset. This would go a long way toward solving the issues I currently experience with Gemini.
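As a rough illustration of the kind of meter being requested here; the field names, quota numbers, and reset policy below are all assumptions for the sketch, not Gemini's actual limits or API.

```typescript
// Hypothetical usage-meter sketch: show consumed vs. remaining daily quota
// and when the counter resets. Field names are illustrative only.

interface Quota {
  dailyLimit: number;   // requests allowed per day (assumed figure)
  used: number;         // requests consumed so far today
  resetsAtUtc: string;  // when the daily counter resets
}

function formatMeter(q: Quota): string {
  const remaining = Math.max(0, q.dailyLimit - q.used);
  const pct = Math.round((q.used / q.dailyLimit) * 100);
  return `${q.used}/${q.dailyLimit} requests used (${pct}%), ` +
         `${remaining} remaining, resets at ${q.resetsAtUtc}`;
}
```

For example, `formatMeter({ dailyLimit: 1500, used: 375, resetsAtUtc: "00:00 UTC" })` renders as `"375/1500 requests used (25%), 1125 remaining, resets at 00:00 UTC"`.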
-
Hey, y'all. 👋 Thanks again for your patience and constructive feedback. The debate over model choice and intelligent routing is an important one. In fact, it's important enough to merit a dedicated thread, which I'll open next week. If you'll forgive me, I want to focus this afternoon's update on the progress made since yesterday so everyone can stay informed.

🐿️ Shipped

⏭️ Next

Our plan is to monitor traffic on Monday (when congestion is at its peak). Watch this discussion for another update on Tuesday.
-
Discussion about the Codebase Investigator mentioned that the prompt/context could influence when it was invoked. Is model routing influenced by prompt phrases such as "investigate this with deep thinking"?
-
Experience over the last few hours has been good: smooth, no 429 errors. Since someone appears to be listening to messages posted here... what is the practical Pro quota? Documentation says 1,500 requests a day, but actual user experience is nothing like this. It seems that this is 1,500 requests divided in some unknown way between different models. Obviously not every request needs Pro, and auto routing is great (hopefully), but when Pro runs out, gemini-cli is not very useful. For me, the allowance of Pro queries is hugely more impactful than the number of Flash queries.

The descriptions of the various paid options are consistent in terminology. We read "free", "higher", and "highest", but what these mean in terms of Pro requests is a mystery. Also, the only plan I can access with "highest", whatever that is, is AI Ultra from a personal account. Why can't I access "highest" without paying for a big bundle of things I don't want, like animation generation and YouTube without ads? Why is there only one level of Developer subscription?

There are hints that Google folk believe the quota has been set so high that it is practically inexhaustible. That is clearly not true. I would pay for a higher plan, but I think Ultra at around ten times the cost (the trial discounts the first three months to about five times the cost of Pro) is too much.
-
I'm hoping this is related to a frustrating experience I've been having where Gemini CLI just spins indefinitely when I ask it to review a diff on a project: #11765. I just re-tested on 0.11.3 and I'm getting the same experience. My prompt:
It does ask me to approve running that shell command, and I see the diff in the terminal. The diff is under 500 lines. But then I wait indefinitely at:
I'm authenticated with Google and I am a Premium subscriber.
-
npm install -g @google/gemini-cli
-
I need a fixed configuration method that always uses the Pro model. The auto mode is utterly foolish: I asked it to analyze the existing code and make modifications according to the new plan, but it did nothing and simply told me "It's already done."
-
For the past two days, I've been experiencing constant API errors: Error 400. And the maximum number of requests we get with the 2.5 Pro model has dropped again.
-
Hello, friends. 👋
First, thanks y’all for continuing to shower Gemini CLI with your ideas, issues, code contributions, and new extensions. The sheer volume of developers building with Gemini CLI has exceeded our wildest ambitions. Sincerely… thank you.
Unfortunately, as demand has increased, many of you have also reported intermittent capacity-related errors. I know toolchain reliability is critical to achieving flow state. High error rates are frustrating, and they do not reflect the experience we aim to deliver. I’m truly sorry. 😞 There is, however, light on the horizon. I’m writing to outline our plan to both (a) quickly resolve the congestion, and (b) build a more resilient platform for the years ahead.
Focusing on reliability and long-term stability
To ensure a stable, snappy experience for everyone, the Gemini CLI maintainers will immediately:
Through these changes, we aim to deliver significantly lower error rates. And, should you ever suffer an error, we hope the improved error messages will light the way to a quick resolution. Thank you for your patience and for continuing to build incredible things on our platform. Keep watching this discussion post for updates. 👀