Google's latest attempt to fix token quotas is here: Say hello to Gemini 3.5 Flash Low
At a glance:
- Google introduces Gemini 3.5 Flash (Low), a variant that generates about 45% fewer tokens than the Medium variant
- Original Gemini 3.5 Flash has been renamed to Gemini 3.5 Flash (Medium)
- Google has reset token quotas across all Gemini plans (paid and free) to support software engineering tasks
The New Model: Gemini 3.5 Flash (Low)
Google has unveiled a new variant of its popular Gemini 3.5 Flash model, called Gemini 3.5 Flash (Low), designed specifically to reduce token usage on simple tasks. It follows user frustration over tighter limits in Antigravity, Google's software development platform. The Low variant generates approximately 45% fewer tokens than the original model, which has been renamed Gemini 3.5 Flash (Medium). According to Google, the goal is to cut token consumption on straightforward tasks, while the higher-effort variant remains the better choice for complex work.
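To put the 45% figure in perspective, here is a rough back-of-the-envelope sketch in Python. The quota size and per-task token count are made-up numbers, not Google's, and the sketch assumes the reduction applies uniformly to every simple task.

```python
# Hypothetical illustration: only the ~45% reduction comes from Google's
# announcement; the quota and per-task token counts are invented examples.
REDUCTION_PCT = 45  # Low generates ~45% fewer tokens than Medium


def low_tokens(medium_tokens: int) -> int:
    """Estimate tokens a Low run would use, given the Medium run's usage."""
    return medium_tokens * (100 - REDUCTION_PCT) // 100


def tasks_per_quota(quota: int, tokens_per_task: int) -> int:
    """Number of whole tasks that fit inside a fixed token quota."""
    return quota // tokens_per_task


quota = 1_000_000       # hypothetical token quota
medium_task = 10_000    # hypothetical tokens per simple task on Medium
low_task = low_tokens(medium_task)  # 5,500 tokens

print(tasks_per_quota(quota, medium_task))  # 100
print(tasks_per_quota(quota, low_task))     # 181
```

Under these assumptions, the same quota covers roughly 1.8 times as many simple tasks, since 1 / (1 − 0.45) ≈ 1.82.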
Addressing User Concerns
Gemini 3.5 Flash (Low) responds directly to user complaints about excessive token consumption in Antigravity. Varun Mohan, a director at Google DeepMind who works on Antigravity, acknowledged that the platform was using too many tokens on simple tasks, which prompted this targeted fix. Google had previously tried to address these concerns by raising Antigravity's limits ninefold over two separate increases, but users continued to hit problems. The new model takes a more fundamental approach: it optimizes the model itself rather than only adjusting quota limits.
Technical Details and Performance
Gemini 3.5 Flash (Low) lowers the model's effort level for simpler tasks without cutting back on system prompts or context compaction. Google stresses that users should still choose the higher-effort level for complex tasks to get the best results. The company claims that the new Low variant generally outperforms the older Gemini 3 Flash (now referred to as the "High" variant) on software engineering (SWE) tasks, suggesting that, for suitable tasks, fewer tokens need not mean lower quality.
Quota Adjustments Across All Plans
Alongside the new model, Google has reset Gemini token quotas across all its plans, both paid and free. The reset is meant to give users enough tokens for their software engineering work, particularly in Antigravity. Together, the model optimization and the quota reset tackle user concerns from two directions, and Google appears committed to supporting the developers and other users who rely on its AI tools.
Internal Testing and Future Improvements
When questioned about how the new model was tested, with some users suggesting Google was "testing on its users," Varun Mohan acknowledged the sentiment but said the team had used the model internally for an extended period before its public release. Google tuned Gemini 3.5 Flash to excel at complex tasks but apparently overlooked its token usage on simpler ones. Mohan has promised further improvements, indicating that this release is one step in an ongoing refinement process rather than a final solution.
Context and Availability
These changes were discussed specifically in the context of Antigravity, Google's AI-centric software development platform. Gemini 3.5 Flash (Low) is not currently listed as a separate model in the general Gemini app and may remain exclusive to development-focused tools. This suggests the token usage problem is more pronounced in development environments, where simple tasks are frequent yet still consume significant resources. The split into model variants appears aimed at specific use cases rather than serving as a universal solution across all applications.
Prepared by the editorial stack from public data and external sources.