Google's Gemini 2.5 Pro 06-05 Gets Worse in June Update as Secret Kingfall Model Accidentally Goes Public

By CTOL Editors - Ken

Google's AI Stumble: Gemini 2.5 Pro 06-05 Sparks Backlash as Mystery 'Kingfall' Model Emerges

Google's artificial intelligence division is navigating turbulent waters. Its latest Gemini 2.5 Pro 06-05 release has triggered widespread developer criticism for delivering inferior performance compared to its predecessor, while a mysteriously leaked model codenamed "Kingfall" has emerged as a potential game-changer that could reshape the company's AI strategy.

The June 5, 2025, release of Gemini 2.5 Pro Preview 06-05 has drawn sharp criticism from the developer community, with benchmarking data revealing significant performance regressions across multiple critical metrics compared to the May 6 model it replaced. According to LiveBench.ai evaluations, the newer model's global average score dropped from 71.99 to 69.39, a concerning decline in overall capability.

Did you know? According to Google's own press release, Gemini 2.5 Pro Preview (released June 5, 2025) boasts state-of-the-art performance across top industry benchmarks, with standout results on LMArena (1470 Elo) and Aider Polyglot (86.2%). It is hailed as Google's most intelligent model yet, featuring innovations like "thinking budgets" for developers. Yet despite these impressive metrics, many users (including us) have found that the model underdelivers in real-world use, citing issues with coding reliability, context retention, and response quality. This highlights a recurring tension in AI: leading benchmark scores don't always translate into satisfying user experiences. It also offers useful insight into which benchmarks have lost their effectiveness.

Gemini 2.5 Pro 06-05 (ytimg.com)

When Upgrades Become Downgrades: The Numbers Tell a Sobering Story

The performance degradation spans several key areas that matter most to enterprise users and developers. Most dramatically, agentic coding capabilities plummeted from 30.00 to a mere 13.33 – a catastrophic 56% decline that has left many automated coding workflows broken. Mathematics performance similarly declined from 88.63 to 83.33, while instruction following, a cornerstone of practical AI deployment, fell from 83.50 to 78.54.
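The arithmetic behind these headline figures is easy to verify. Here is a quick sanity check in Python, using only the LiveBench.ai scores quoted above (not an independent benchmark run):

```python
# Percentage declines computed from the LiveBench.ai scores quoted in
# this article (May 6 score, June 5 score); not an independent run.
scores = {
    "Global average":        (71.99, 69.39),
    "Agentic coding":        (30.00, 13.33),
    "Mathematics":           (88.63, 83.33),
    "Instruction following": (83.50, 78.54),
}

for metric, (may_06, june_05) in scores.items():
    decline = (may_06 - june_05) / may_06 * 100
    print(f"{metric}: {may_06:.2f} -> {june_05:.2f}  ({decline:.1f}% decline)")
```

Agentic coding comes out at a 55.6% drop, which rounds to the 56% cited above; the other metrics fall between roughly 3.6% and 6%.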

"The regression in agentic coding is particularly concerning because it affects the model's ability to handle complex, multi-step programming tasks that are essential for enterprise applications," noted one AI researcher.

The technical community has been particularly vocal about quality issues beyond the raw numbers. Developers report increased hallucinations in code output, with the model inventing non-existent functions and variables more frequently than before. Multi-file coding projects and incremental code modifications have become notably less reliable, forcing many teams to revert to the earlier May version.
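To make the failure mode concrete, below is a purely hypothetical sketch of the kind of hallucinated helper developers describe. The invented method is ours, not drawn from any specific bug report, but it mirrors the pattern of plausible-looking calls that do not actually exist:

```python
import pandas as pd

df = pd.DataFrame({"region": ["EU", "EU", None], "revenue": [100, 100, 250]})

# A hallucinated call like the one below looks plausible but raises
# AttributeError at runtime: pandas has no DataFrame.auto_clean() method.
# The name is invented here purely to illustrate the reported pattern.
# df = df.auto_clean(strategy="smart")

# The working equivalent has to be spelled out explicitly:
df = df.dropna().drop_duplicates()
print(df)
```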

Developer Revolt: Community Pushback Intensifies

User feedback has coalesced around several critical pain points that extend beyond performance metrics. The model's context retention capabilities have deteriorated markedly, with frequent failures to maintain conversation history or remember user instructions across longer sessions. This instability has proven particularly problematic for complex workflows requiring sustained attention to detail.

The much-touted "Max Thinking" mode, positioned as an enhanced reasoning capability, has failed to meet expectations. Users describe it as slower without delivering meaningfully better results, with some reporting that it actually produces less accurate outputs than the standard mode.

"The new version feels verbose but shallow," observed one enterprise AI consultant. "It produces more words but provides fewer actionable insights, which is exactly the opposite of what enterprise clients need."

Interface changes have further frustrated the user base, with key features buried in nested menus and reduced customization options hampering established workflows. The combination of performance regression and usability challenges has created what some describe as a crisis of confidence in Google's AI development trajectory.

The Kingfall Enigma: Accidental Glimpse of Google's Future

Amid this controversy, a 20-minute exposure of a confidential Google model labeled "Kingfall" through Google AI Studio in early June has captured the AI community's imagination. The brief leak, whether deliberate marketing or genuine error, revealed capabilities that contrast starkly with Gemini 2.5 Pro's current limitations.

Kingfall demonstrates sophisticated multimodal abilities, processing text, images, and files with a context window of approximately 65,000 tokens. Its most intriguing feature is a configurable "thinking budget" that enables resource-intensive, step-by-step reasoning for complex problems. Early testers reported exceptional performance in coding tasks, including generating sophisticated applications like functional Minecraft clones in single HTML files.
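Nothing public confirms how Kingfall exposes its thinking budget, but Google's shipping Gemini API already offers a comparable knob. Below is a minimal sketch using the official google-genai Python SDK; the model ID is a placeholder, since the leaked model has no public endpoint:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-pro",  # placeholder; Kingfall has no public model ID
    contents="Walk through this multi-step logic problem step by step ...",
    config=types.GenerateContentConfig(
        # Caps the tokens the model may spend on internal reasoning
        # before it begins writing the visible answer.
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
    ),
)
print(response.text)
```

Whether Kingfall's budget shares this interface or semantics is unknown; the sketch only shows the mechanism as it exists in the public API today.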

The model's SVG generation reportedly surpasses even Anthropic's Claude 4, while its debugging and multi-step logic handling have drawn praise from the limited group who accessed it during the brief exposure. These capabilities suggest Kingfall is either the finished version of Gemini 2.5 Pro or an entirely new enterprise-focused variant.

Strategic Implications: Google's AI Chess Game

The timing of these developments carries significant strategic weight as the AI landscape becomes increasingly competitive. Google appears caught between the need to rapidly iterate and the imperative to maintain quality, a balance that has clearly shifted unfavorably with the June 5 release.

Industry analysts suggest the Kingfall leak may represent Google's response to OpenAI's anticipated o3 Pro release, positioning advanced reasoning capabilities as a key differentiator in the enterprise market. The model's architecture suggests a deliberate focus on automation and business process optimization, areas where demand continues to surge.

However, the current Gemini 2.5 Pro regression raises questions about Google's development and testing processes. The significant performance decline across multiple metrics suggests either inadequate validation procedures or deliberate trade-offs that have proven unpopular with users.

Market Dynamics and Competitive Positioning

The AI model landscape has become increasingly fragmented, with different providers excelling in specific domains. Google's current predicament highlights the challenges of maintaining broad competency while pushing boundaries in emerging capabilities like advanced reasoning and multimodal processing.

The enterprise AI market, valued at over $150 billion annually and growing at 40% year-over-year, shows particular sensitivity to reliability and consistency. Google's reputation for unexpected model updates and endpoint changes has already created wariness among enterprise customers, making the current regression particularly damaging.

Investment Outlook: Navigating AI Market Volatility

The divergent trajectories of Gemini 2.5 Pro and Kingfall present a complex investment landscape for AI-focused portfolios. While Google's immediate misstep with Gemini 2.5 Pro may pressure near-term performance, the advanced capabilities demonstrated by Kingfall suggest potential for significant market disruption if properly executed.

Investors may consider that Google's vast computational infrastructure and research capabilities position it to recover from this setback relatively quickly. Historical patterns suggest that major AI providers often experience temporary regressions before achieving breakthrough improvements, making current weakness potentially attractive for long-term positions.

The enterprise AI market's continued expansion, coupled with increasing demand for multimodal and reasoning-capable models, may favor providers who can deliver reliable, advanced capabilities. Google's challenge lies in reconciling the innovation demonstrated by Kingfall with the stability required for enterprise adoption.

Market participants should monitor Google's response timeline to current criticism, the official announcement strategy for Kingfall, and any changes to development or testing procedures. The company's ability to address current concerns while capitalizing on Kingfall's potential may determine its competitive position in the rapidly evolving AI landscape.

Past performance in AI model development does not guarantee future results, and investors should consult financial advisors regarding AI sector exposure given the technology's inherent volatility and rapid evolution.
