// AI NEWS · FEATURE

Google Builds Computer Use Into Gemini 3.5 Flash, Turning the Model Into an Agent That Clicks

Computer use is now a native tool in Gemini 3.5 Flash, so developers can build agents that see a screen and take action across browser, mobile, and desktop. Google paired it with two enterprise safeguards against prompt injection.

2026-06-24 · by Moe Ameen

What happened

On June 24, 2026, Google announced that computer use is now a built-in tool in Gemini 3.5 Flash, calling it the company's "best performance yet for agentic computer use tasks." Until now the capability lived in a separate, standalone Gemini 2.5 Computer Use model; folding it into the main Flash model means developers can build agents that "see, reason and take action across browser, mobile and desktop environments" without wiring up a second model.

The way it works is visual: rather than calling clean APIs, a computer-use agent perceives a graphical interface from screenshots and decides what to click, type, scroll, or select — the same way a person operates an app they have never seen documented. Google points the capability at long-horizon work like continuous software testing and knowledge work across professional applications, where a task takes many sequential steps. Google says the model is tuned for exactly these multi-step, planning-heavy jobs. On the OSWorld computer-use benchmark, Google included results in its benchmark chart for the release; independent recaps put Gemini 3.5 Flash among the top models on that test, roughly tied with other current frontier models, though benchmark numbers vary by source so treat any single figure as a snapshot.

Developers can access computer use through the Gemini API and the Gemini Enterprise Agent Platform, with a hosted demo environment run by Browserbase. Because an agent that can click anything is also an agent that can be tricked, Google paired the launch with safety work: targeted adversarial training for computer use, plus two optional enterprise safeguards — one that requires explicit user confirmation before sensitive or irreversible actions, and one that automatically stops a task if it detects an indirect prompt injection. Google still recommends running these agents inside secure sandboxes with human-in-the-loop verification and strict access controls.

Why it matters for creators

A model that operates real apps by screenshot is a different class of tool from a chatbot. It can, in principle, run the manual clicking that fills a creator's week — uploading, tagging, scheduling — but it does it by driving a GUI, which is slower and more fragile than a purpose-built pipeline.
Screen-driving agents break when an interface changes. A button moves or a layout shifts and the agent gets lost, so for repeatable production work a deterministic system beats one that re-reads the screen every run.
Google shipping prompt-injection safeguards as a headline feature is the tell: handing an autonomous agent the keys to your accounts is genuinely risky, and "require confirmation for sensitive actions" exists because these agents can be steered by malicious content they read on a page.
It is a developer capability, not a finished creator app. There is no consumer button that turns this into your social-media manager — you would build the agent yourself, then babysit it.
The direction of travel is clear: AI is moving from drafting text to taking actions. For creators the practical question becomes which actions you actually want automated, and how reliably, not whether an agent can technically click the button.

How to act on this with Kompozy

It is tempting to imagine pointing a computer-use agent at your social tools and letting it run your content. In practice, an agent that operates by screenshot is the brittle way to automate a pipeline — it logs into each platform, hunts for the upload button, and re-reads the screen every time, which breaks the moment an interface shifts or a prompt-injection guard trips. Kompozy automates the same outcome the durable way. Its autopilot runs server-side on Trigger.dev workers against platform APIs, not by clicking a rendered page, so a generate-schedule-publish run completes unattended and survives a closed tab or a redesigned button. You approve a batch; the engine renders, schedules, and fans it across all nine connected platforms without an agent driving a mouse.

There is a content play in the news too, and it is a high-intent one. "AI can now operate your computer" is exactly the topic your audience is searching this week, and a clear, grounded take cuts through the hype. Drop your angle — what computer-use agents are good for, where they break, why screen-driving is not the same as a real pipeline — into Kompozy as a source, and the engine turns that single point of view into a blog explainer, a carousel breaking down how the agent works and its prompt-injection risks, short captioned clips, and platform-native posts in your own voice through the Persona Brief, then schedules and publishes the set in one pass. Being early and specific on an agentic-AI story is how one take becomes a week of content.

Quick takeaways

Google made computer use a built-in tool in Gemini 3.5 Flash on June 24, 2026, its strongest agentic computer-use capability to date.
Agents see a screen from screenshots and take action across browser, mobile, and desktop; the capability previously lived in a standalone Gemini 2.5 Computer Use model.
It is aimed at long-horizon work like continuous software testing and knowledge work, and ranks among the top models on the OSWorld benchmark in independent recaps.
Access is via the Gemini API and Gemini Enterprise Agent Platform, with a Browserbase-hosted demo; Google added adversarial training and two optional prompt-injection safeguards.
It is a developer capability, not a finished creator app — and screen-driving automation is more fragile than an API-based pipeline like Kompozy's autopilot.

Frequently asked questions

What is computer use in Gemini 3.5 Flash?

It is a built-in tool, announced June 24, 2026, that lets a Gemini 3.5 Flash agent perceive a graphical interface from screenshots and take action — clicking, typing, scrolling, and selecting — across browser, mobile, and desktop environments. It moves the capability from a standalone Gemini 2.5 Computer Use model into the main Flash model.

How is Gemini 3.5 Flash computer use different from a chatbot?

A chatbot returns text. A computer-use agent reads a screen and operates real applications to complete multi-step tasks like software testing or filling out a workflow. It acts rather than just answers, which is more powerful and also riskier — Google ships prompt-injection safeguards precisely because an agent that can click anything can be steered by malicious content it reads.

Can I use Gemini 3.5 Flash computer use to run my social media?

Not as a finished product. It is a developer capability accessed through the Gemini API and Gemini Enterprise Agent Platform — you would have to build and supervise the agent yourself, and a screen-driving agent is fragile when interfaces change. For reliable, unattended generation and publishing across platforms, a purpose-built engine like Kompozy automates the outcome through platform APIs instead.

Is it safe to give a computer-use agent access to my accounts?

Treat it carefully. Google added targeted adversarial training and two optional enterprise safeguards — explicit confirmation before sensitive actions and automatic task stops on detected prompt injection — and still recommends secure sandboxing, human-in-the-loop verification, and strict access controls. The safeguards exist because autonomous agents operating live apps can be manipulated.

Related news

← All AI news · Get started →