Google Unveils AI That Can Browse the Web Like a Human—But the Real Work Is Just Beginning

Gemini 2.5 Computer Use model aims to take over digital busywork, but early users should brace for hiccups.

Google DeepMind has rolled out its Gemini 2.5 Computer Use model, an AI system that can click, type, and scroll through websites and apps much like a person does. The model, available now in preview through the Gemini API, marks a big step toward turning AI into a practical assistant that can handle the repetitive digital chores many of us dread.

Here’s how it works: the AI looks at a screenshot of the screen, interprets what the user wants, checks its previous moves, then decides what to do next—click a button, fill out a form, scroll further down. After every action, it gets another screenshot, and the loop continues until the task finishes or the model gets stuck.
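For readers who think in code, the loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Google's implementation: the names `Action`, `LoopResult`, `run_agent`, `take_screenshot`, `propose_action`, and `execute` are all hypothetical, and the model and browser are passed in as plain functions so that no real API is assumed.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One UI step. Field names are illustrative, not a real API schema."""
    kind: str          # e.g. "click", "type", "scroll", or "done"
    target: str = ""   # human-readable description of the element
    text: str = ""     # text to type, if any


@dataclass
class LoopResult:
    finished: bool                                   # True if the model signalled "done"
    history: List[Action] = field(default_factory=list)


def run_agent(
    goal: str,
    take_screenshot: Callable[[], bytes],
    propose_action: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> LoopResult:
    """Screenshot -> decide -> act, repeated until done or out of steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                       # observe the screen
        action = propose_action(goal, screenshot, history)   # model picks the next move
        if action.kind == "done":
            return LoopResult(True, history)
        execute(action)                                      # click, type, scroll, ...
        history.append(action)                               # fed back on the next turn
    # Step budget exhausted: treat the run as stuck.
    return LoopResult(False, history)
```

The `max_steps` cap is one simple way to handle the "model gets stuck" case: a run that never signals completion is returned as unfinished instead of looping forever.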

Google says the Gemini 2.5 Computer Use model reaches more than 70 percent accuracy with an average task time of about 225 seconds. On standard benchmarks such as Online-Mind2Web, WebVoyager, and AndroidWorld, Google reports that it outperforms competing models. Inside Google, teams are already using it for user interface testing, Project Mariner, and new features in Search’s AI Mode.

Some early testers are impressed. “Gemini 2.5 Computer Use is far ahead of anything else we’ve tried—50 percent faster and more accurate than competing tools,” said Poke.com, an AI assistant service in Google’s pilot program.

Still, not everyone is ready to celebrate. Our own engineers at CTOL.digital found the system “promising for browser automation and testing” but also “early, web-first, and finicky when tasks get complicated.” Their verdict: it’s useful now but needs big improvements in speed and reliability before it can be truly transformative.

Gemini 2.5 Computer Use (googleapis.com)

Safety by Design—Or Just for Show?

One thing that sets Google’s model apart is the way it approaches safety. Every action goes through a safety service before execution, which helps guard against three major risks: misuse by users, the model itself doing something unexpected, or malicious prompts hidden in websites.

Developers can also require user confirmation before risky steps such as purchases, CAPTCHA bypasses, or control of sensitive systems. Google says it trained these safeguards into the model itself, in contrast to approaches that bolt on filters after the fact.
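On the developer side, a confirmation gate of this kind might look roughly like the sketch below. It is an assumption-laden illustration, not Google's safety service: the `RISKY_KINDS` set, the action labels in it, and the functions `needs_confirmation` and `gated_execute` are all made up for this example.

```python
from typing import Callable

# Hypothetical labels for steps a developer might treat as risky.
RISKY_KINDS = {"purchase", "solve_captcha", "control_sensitive_system"}


def needs_confirmation(action_kind: str) -> bool:
    """Return True if this kind of action should pause for a human."""
    return action_kind in RISKY_KINDS


def gated_execute(
    action_kind: str,
    execute: Callable[[str], None],
    confirm: Callable[[str], bool],
) -> bool:
    """Run the action unless it is risky and the user declines.

    Returns True if the action was executed, False if it was blocked.
    """
    if needs_confirmation(action_kind) and not confirm(action_kind):
        return False          # user said no: skip the action entirely
    execute(action_kind)
    return True
```

In a real deployment the `confirm` callback would surface a prompt to the end user, and the list of risky actions would come from policy, not a hard-coded set.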

That could prove to be a big advantage. Our analysts noted, “Per-action reviews and system-level policies are the right defaults. This won’t block every prompt injection, but it makes enterprise adoption far smoother, especially in regulated industries.”

If Google turns this reviewer into a standalone, customizable service—letting companies plug in their own rules and approvals—it could give the tech giant a real market edge.

What It Can Do—and What It Can’t

Right now, the model performs best in web browsers. It shows promise with mobile apps, but it is not yet ready for desktop-level operating system control. That may actually be by design.

“Most valuable automations sit behind web logins—things like forms, admin consoles, and SaaS tools,” our team observed. “Faster, tighter loops on the web beat clunky OS control for the majority of enterprise workflows.”

The sweet spots seem to be browser-based automation, UI testing, structured site navigation, and data entry. In fact, Google’s payments team says using Gemini 2.5 as a backup for fragile end-to-end tests cut manual recovery time by days.

But the limits are clear. Tasks still take minutes, not seconds, which rules out high-volume customer support. Performance drops off on open-ended problems or puzzle-like challenges. And in day-to-day use, results from the preview version vary enough from run to run that developers need retries and human oversight.
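In practice, "retries and human oversight" often comes down to a small wrapper around each automated task. The sketch below shows one possible shape, assuming a task is any function that returns True on success; `run_with_retries` and the `escalate` callback are hypothetical names, not part of any Google tooling.

```python
from typing import Callable, Optional, Tuple


def run_with_retries(
    task: Callable[[], bool],
    max_attempts: int = 3,
    escalate: Optional[Callable[[int], None]] = None,
) -> Tuple[str, int]:
    """Try a flaky task a few times, then hand it to a person.

    Returns ("succeeded", attempt_number) or ("needs_human", max_attempts).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            if task():
                return "succeeded", attempt
        except Exception:
            pass  # treat a crash the same as a failed attempt
    if escalate is not None:
        escalate(max_attempts)   # e.g. open a ticket or notify a reviewer
    return "needs_human", max_attempts
```

Capping the attempts matters here because each run can take minutes: an unbounded retry loop would quietly turn one slow task into a very slow one.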

Business Impact and the Bigger Picture

Our analysis suggests the real winners won’t be “AI browser driving” startups. Those look more like features than full-fledged companies. The bigger opportunity lies in building vertical solutions—specialized copilots for regulated industries, resilient testing infrastructure, security tools, and performance monitoring platforms.

“Durable companies will mix native APIs, UI driving as fallback, structured workflows, built-in safety checks, and human-friendly review tools,” our team explained. “The moat isn’t just the agent—it’s process knowledge, integrations, and data.”

Competition isn’t standing still. Anthropic is pushing broader desktop automation through Claude. Open-source projects are multiplying, giving developers plenty of alternatives. Smart businesses will design systems flexible enough to swap providers as the tech matures, rather than betting on just one.
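One common way to keep providers swappable is to code against a small interface and keep vendor-specific calls behind it. The sketch below is illustrative only: `ComputerUseProvider`, `StubProviderA`, `StubProviderB`, and `plan_step` are invented names, and the stubs return canned answers in place of calls to any real vendor API.

```python
from typing import Protocol


class ComputerUseProvider(Protocol):
    """The minimal surface the rest of the system depends on (hypothetical)."""
    name: str

    def next_action(self, goal: str, screenshot: bytes) -> str:
        ...


class StubProviderA:
    name = "provider-a"

    def next_action(self, goal: str, screenshot: bytes) -> str:
        return "click"        # a real adapter would call vendor A here


class StubProviderB:
    name = "provider-b"

    def next_action(self, goal: str, screenshot: bytes) -> str:
        return "scroll"       # a real adapter would call vendor B here


def plan_step(provider: ComputerUseProvider, goal: str, screenshot: bytes) -> str:
    """Business logic sees only the interface, never a specific vendor."""
    return f"{provider.name}:{provider.next_action(goal, screenshot)}"
```

Switching vendors then means writing one new adapter class, while workflows, safety checks, and review tooling built on `plan_step` stay untouched.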

The Bottom Line

Gemini 2.5 Computer Use is progress, not magic. It sets a higher bar for how AI navigates the digital world, with stronger safety features and competitive benchmarks. But it’s still infrastructure—useful for automating routine work, not a sci-fi agent that can handle anything you throw at it.

For now, companies should aim it at tightly defined, high-value workflows with clear success metrics and backup plans. The technology will get faster and smarter in time. The real decision is whether to adopt early and live with today’s rough edges, or wait for the smoother ride that’s bound to come as Google and its rivals push the frontier forward.

NOT INVESTMENT ADVICE
