Google's latest attempt to fix token quotas is here: Say hello to Gemini 3.5 Flash (Low)

SiliconFeed Editorial, May 25, 2026

At a glance:

Google has introduced the Gemini 3.5 Flash (Low) model, which uses about 45% fewer tokens than the Medium variant
The original Gemini 3.5 Flash has been renamed Gemini 3.5 Flash (Medium)
Google has reset token quotas across all Gemini plans, paid and free, to support software engineering tasks

The New Model: Gemini 3.5 Flash (Low)

Google has unveiled a new variant of its popular Gemini 3.5 Flash model called Gemini 3.5 Flash (Low), designed specifically to address token usage concerns in simple tasks. The release follows user frustration with tighter limits in Antigravity, Google's software development platform. The Low variant generates approximately 45% fewer tokens than the original model, which has now been renamed Gemini 3.5 Flash (Medium). According to Google, the lower effort level is meant to cut token use on straightforward tasks, while the higher-effort level remains the recommended choice for more complex work.
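To put the 45% figure in concrete terms, here is a rough back-of-the-envelope sketch. The task size and quota below are made-up illustration values, not numbers from Google, and the sketch assumes the reduction applies evenly to every task, which Google has not claimed.

```python
def tokens_with_low(medium_tokens: int, reduction: float = 0.45) -> int:
    """Estimate a task's token use on Low from its use on Medium.

    Assumes the reported ~45% reduction applies uniformly (an assumption).
    """
    return round(medium_tokens * (1 - reduction))


def tasks_per_quota(quota: int, tokens_per_task: int) -> int:
    """How many whole tasks of a given size fit inside a token quota."""
    return quota // tokens_per_task


# Hypothetical numbers, chosen only for illustration.
medium_tokens = 10_000   # tokens one simple task uses on Medium
quota = 1_000_000        # tokens available in some quota window

low_tokens = tokens_with_low(medium_tokens)
print(low_tokens)                             # 5500
print(tasks_per_quota(quota, medium_tokens))  # 100
print(tasks_per_quota(quota, low_tokens))     # 181
```

Under those assumptions, the same quota stretches to roughly 1.8 times as many simple tasks, since each task costs 55% of what it did before.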

Addressing User Concerns

The introduction of Gemini 3.5 Flash (Low) directly responds to user complaints about excessive token consumption in Antigravity. Varun Mohan, Director at Google DeepMind working on Antigravity, acknowledged that the platform was using too many tokens for simple tasks, prompting this targeted solution. Google had previously tried to address these concerns by raising Antigravity's limits ninefold over two separate increases, but users continued to run into issues. The new model takes a more fundamental approach to the problem by optimizing the model itself rather than just adjusting quota limits.

Technical Details and Performance

Gemini 3.5 Flash (Low) lowers the model's effort level for simpler tasks without cutting back on system prompts or context compaction. Google emphasizes that users should still use the higher-effort level for complex tasks to maintain optimal performance. The company claims that the new Low variant generally outperforms the older Gemini 3 Flash (now referred to as the "High" variant) on software engineering (SWE) tasks, indicating that reduced token usage doesn't necessarily mean reduced quality for appropriate use cases.

Quota Adjustments Across All Plans

Alongside the new model, Google has reset Gemini token quotas across all its plans, both paid and free. The move is intended to give users enough tokens for their software engineering work, particularly in Antigravity. Together, the two changes address user concerns from both sides: the model consumes fewer tokens, and users have more of them available. Google appears committed to supporting developers and other users who rely on its AI tools for their work.

Internal Testing and Future Improvements

When questioned about the testing process for the new model, with some users suggesting Google was "testing on its users," Varun Mohan acknowledged the sentiment but clarified that the team had been using the model internally for an extended period before public release. Google made Gemini 3.5 Flash highly effective at solving complex tasks but apparently overlooked token usage optimization for simpler ones. Mohan has promised further improvements, indicating that this iteration is part of an ongoing refinement process rather than a final solution.

Context and Availability

These changes were discussed specifically in the context of Antigravity, Google's AI-centric software development platform. As of now, Gemini 3.5 Flash (Low) isn't listed as a separate model within the general Gemini app and may remain exclusive to development-focused tools. This suggests that the token usage issues might be more pronounced in specialized development environments, where simple tasks are frequent but still consume significant resources. The distinction between model variants appears designed to address specific use cases rather than to serve as a universal solution across all applications.

Editorial: SiliconFeed is an automated feed. Facts are checked against sources; copy is normalized and lightly edited for readers.

FAQ

What is Gemini 3.5 Flash (Low) and how does it differ from the original model?

Gemini 3.5 Flash (Low) is a new variant of Google's AI model that uses approximately 45% fewer tokens than the original model, which has been renamed to Gemini 3.5 Flash (Medium). The Low variant is specifically designed for simpler tasks while maintaining performance for software engineering work.

Why did Google introduce this new model?

Google introduced Gemini 3.5 Flash (Low) in response to user complaints about excessive token consumption in Antigravity, its AI-centric software development platform. Users reported that even simple tasks were consuming too many tokens, despite previous quota increases.

Is Gemini 3.5 Flash (Low) available to all Gemini users?

As of now, Gemini 3.5 Flash (Low) appears to be focused on the Antigravity platform and isn't listed as a separate model in the general Gemini app. It may remain exclusive to development-focused tools rather than being universally available across all Gemini applications.

Prepared by the editorial stack from public data and external sources.
