Computer use is now a native tool in Gemini 3.5 Flash, so developers can build agents that see a screen and take action across browser, mobile, and desktop. Google paired it with two optional enterprise safeguards: a confirmation step before sensitive actions and an automatic stop when a prompt injection is detected.
2026-06-24 · by Moe Ameen
On June 24, 2026, Google announced that computer use is now a built-in tool in Gemini 3.5 Flash, calling it the company's "best performance yet for agentic computer use tasks." Until now the capability lived in a separate, standalone Gemini 2.5 Computer Use model; folding it into the main Flash model means developers can build agents that "see, reason and take action across browser, mobile and desktop environments" without wiring up a second model.
The way it works is visual: rather than calling clean APIs, a computer-use agent perceives a graphical interface from screenshots and decides what to click, type, scroll, or select, the same way a person operates an app they have never seen documented. Google points the capability at long-horizon work such as continuous software testing and knowledge work across professional applications, where a task takes many sequential steps, and says the model is tuned for exactly these multi-step, planning-heavy jobs. Google included OSWorld computer-use results in its benchmark chart for the release. Independent recaps put Gemini 3.5 Flash among the top models on that test, roughly tied with other current frontier models, but benchmark numbers vary by source, so treat any single figure as a snapshot.
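To make the screenshot-then-act cycle concrete, here is a minimal sketch of the loop in Python. Everything in it is an assumption for illustration: `FakeScreen` stands in for a real interface, `scripted_policy` stands in for the model call, and none of the names come from the Gemini API. A real agent would send an actual screenshot to the model and receive a proposed action back.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str        # "click", "type", "scroll", "select", or "done"
    target: str = ""
    text: str = ""


class FakeScreen:
    """Hypothetical in-memory stand-in for a real graphical interface."""

    def __init__(self) -> None:
        self.fields = {"search_box": ""}
        self.submitted = False

    def screenshot(self) -> str:
        # A real agent would capture pixels; this sketch returns a text description.
        return f"search_box={self.fields['search_box']!r} submitted={self.submitted}"

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def scripted_policy(observation: str) -> Action:
    """Hypothetical stand-in for the model: maps what it sees to the next action."""
    if "search_box=''" in observation:
        return Action("type", "search_box", "hello")
    if "submitted=False" in observation:
        return Action("click", "submit")
    return Action("done")


def run_agent(screen: FakeScreen,
              policy: Callable[[str], Action],
              max_steps: int = 10) -> List[Action]:
    """Observe, decide, act, and repeat until the policy says it is done."""
    history: List[Action] = []
    for _ in range(max_steps):
        observation = screen.screenshot()   # perceive the current state
        action = policy(observation)        # decide the next step
        if action.kind == "done":
            break
        screen.apply(action)                # act on the interface
        history.append(action)
    return history
```

The `max_steps` cap is a design choice worth keeping in any real version: a long-horizon task needs a hard ceiling so a confused agent cannot loop forever.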
Developers can access computer use through the Gemini API and the Gemini Enterprise Agent Platform, with a hosted demo environment run by Browserbase. Because an agent that can click anything is also an agent that can be tricked, Google paired the launch with safety work: targeted adversarial training for computer use, plus two optional enterprise safeguards — one that requires explicit user confirmation before sensitive or irreversible actions, and one that automatically stops a task if it detects an indirect prompt injection. Google still recommends running these agents inside secure sandboxes with human-in-the-loop verification and strict access controls.
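The two safeguards described above can be pictured as a gate that every proposed action passes through before it runs. The sketch below is an illustration only, not Google's implementation: the `SENSITIVE_KINDS` set, the keyword list, and all function names are hypothetical, and a naive phrase match is far weaker than a trained injection detector.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed for illustration: which action kinds count as sensitive or irreversible.
SENSITIVE_KINDS = {"delete", "purchase", "send"}

# Assumed for illustration: a toy keyword check, not a real injection detector.
SUSPICIOUS_PHRASES = (
    "ignore previous instructions",
    "disregard your instructions",
)


class TaskStopped(Exception):
    """Raised when the task is halted because on-screen content looks hostile."""


@dataclass
class ProposedAction:
    kind: str
    target: str


def looks_injected(screen_text: str) -> bool:
    """Return True if the on-screen text contains a suspicious phrase."""
    lowered = screen_text.lower()
    return any(phrase in lowered for phrase in SUSPICIOUS_PHRASES)


def guard(action: ProposedAction,
          screen_text: str,
          confirm: Callable[[ProposedAction], bool]) -> bool:
    """Decide whether an action may run.

    Stops the whole task on a suspected injection, asks the user before
    sensitive actions, and lets everything else through.
    """
    if looks_injected(screen_text):
        raise TaskStopped("possible indirect prompt injection on screen")
    if action.kind in SENSITIVE_KINDS:
        return confirm(action)   # human-in-the-loop approval
    return True
```

In use, `confirm` would be whatever prompts the human, such as a dialog or a chat message. The agent executes the action only when `guard` returns `True`, and treats `TaskStopped` as the end of the run.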
It is tempting to imagine pointing a computer-use agent at your social tools and letting it run your content. In practice, an agent that operates by screenshot is the brittle way to automate a pipeline — it logs into each platform, hunts for the upload button, and re-reads the screen every time, which breaks the moment an interface shifts or a prompt-injection guard trips. Kompozy automates the same outcome the durable way. Its autopilot runs server-side on Trigger.dev workers against platform APIs, not by clicking a rendered page, so a generate-schedule-publish run completes unattended and survives a closed tab or a redesigned button. You approve a batch; the engine renders, schedules, and fans it across all nine connected platforms without an agent driving a mouse.
There is a content play in the news too, and it is a high-intent one. "AI can now operate your computer" is exactly the topic your audience is searching this week, and a clear, grounded take cuts through the hype. Drop your angle into Kompozy as a source: what computer-use agents are good for, where they break, and why screen-driving is not the same as a real pipeline. The engine turns that single point of view into a blog explainer, a carousel breaking down how the agent works and its prompt-injection risks, short captioned clips, and platform-native posts in your own voice through the Persona Brief, then schedules and publishes the set in one pass. Being early and specific on an agentic-AI story is how one take becomes a week of content.
Computer use in Gemini 3.5 Flash is a built-in tool, announced June 24, 2026, that lets an agent perceive a graphical interface from screenshots and take action (clicking, typing, scrolling, and selecting) across browser, mobile, and desktop environments. It moves the capability from a standalone Gemini 2.5 Computer Use model into the main Flash model.
A chatbot returns text. A computer-use agent reads a screen and operates real applications to complete multi-step tasks like software testing or filling out a workflow. It acts rather than just answering, which is more powerful and also riskier: Google ships prompt-injection safeguards precisely because an agent that can click anything can be steered by malicious content it reads.
Computer use is not a finished product. It is a developer capability accessed through the Gemini API and the Gemini Enterprise Agent Platform, so you would have to build and supervise the agent yourself, and a screen-driving agent is fragile when interfaces change. For reliable, unattended generation and publishing across platforms, a purpose-built engine like Kompozy automates the outcome through platform APIs instead.
Treat computer-use agents carefully. Google added targeted adversarial training and two optional enterprise safeguards (explicit confirmation before sensitive actions and automatic task stops on detected prompt injection) and still recommends secure sandboxing, human-in-the-loop verification, and strict access controls. The safeguards exist because autonomous agents operating live apps can be manipulated.