Your Agent Just Went Rogue. Here's How to Stop It in 60 Seconds.

It's 3 AM. Your customer support agent — the one that's handled forty thousand tickets without a complaint — starts telling users to contact a competitor for "faster service." It's confident. It's polite. It's wrong, and it's saying it to every customer who opens a chat.

The question is not whether this will happen to you. It's how long it runs before you stop it, and whether you can explain afterward what happened.

Everyone has a rogue agent. Almost nobody can contain one.

The first half of that sentence is no longer hypothetical. In a 2026 survey, 82% of U.S. companies using AI agents reported watching one act in an unexpected way over the previous twelve months — wrong decisions, exposed data, triggered breaches. A separate enterprise survey found that 88% of organizations deploying agents had at least one security incident in 2025.

The second half is where it gets uncomfortable. The 2026 CISO AI Risk Report, based on a structured survey of 235 security leaders across the US and UK, found that only 5% of CISOs feel confident they could contain a compromised AI agent. Ninety-five percent doubt they'd even detect the misuse.

Read those numbers together and the shape of the problem is obvious. The industry has spent two years on detection — evals, guardrails, red-teaming. Almost no one has built the part that happens after the alarm goes off. Rogue agents are common. The runbook for stopping one is rare.

Incident response for servers and networks is a discipline with thirty years of muscle memory behind it: detect, contain, eradicate, recover, document. Agents need the same loop, and they need it to run in seconds, not in the hours a human-paced SOC process assumes. Here's what that loop looks like when the plumbing is already in place.

Step 1 — Detect: the alarm has to find you

You cannot respond to an incident you learn about from an angry customer. Polling a dashboard every five minutes leaves a window of up to five minutes in which a misbehaving agent is still trusted and still talking.

VeriSwarm pushes the signal instead of waiting to be asked. Its webhooks fire the moment a trust decision changes: agent.trust.drift when behavioral scores slide, agent.tier.changed when an agent drops from allow to review or deny, guard.injection.detected when a prompt-injection attempt is detected. Each delivery is signed with HMAC-SHA256, with the timestamp included in the signed payload, so your receiver can confirm the alert is genuine and fresh before acting on it. That's the difference between finding out at 3 AM and finding out at 9 AM when the dashboard loads.
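To make the signing concrete, here is a minimal receiver-side check in Python. It assumes a scheme common for signed webhooks but not confirmed for VeriSwarm: the sender computes an HMAC-SHA256 hex digest over "<timestamp>.<raw body>" with a shared secret and sends the digest and the timestamp alongside the body. The helper names, the payload layout, and the five-minute tolerance are hypothetical; the real header names and format come from the webhook contract.

```python
import hashlib
import hmac
import time
from typing import Optional

# Assumed scheme (not confirmed for VeriSwarm): HMAC-SHA256 over "<timestamp>.<raw body>",
# hex-encoded, with the timestamp delivered alongside the signature.
MAX_SKEW_SECONDS = 300  # hypothetical tolerance: reject deliveries outside five minutes


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """Compute the signature the sender would have produced for this delivery."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_delivery(secret: bytes, timestamp: int, body: bytes,
                    signature: str, now: Optional[float] = None) -> bool:
    """Accept a delivery only if it is fresh and its signature matches."""
    current = time.time() if now is None else now
    if abs(current - timestamp) > MAX_SKEW_SECONDS:
        return False  # stale or future-dated: possible replay
    expected = sign(secret, timestamp, body)
    # Constant-time comparison on bytes, so response timing leaks nothing about the digest.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Because the timestamp is part of the signed message, an attacker cannot attach a fresh timestamp to a captured delivery without forging the signature, and the freshness window rejects old deliveries replayed verbatim. Within the window, deduplicating on a delivery ID (if the payload carries one) covers the rest.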

The detection signal isn't a single threshold trip, either. Gate scores four behavioral dimensions continuously — identity, risk, reliability, autonomy — so "going rogue" shows up as a measurable slope, not a binary that flips only after the damage is done.
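A slope is easy to compute from a short score history. This sketch fits a least-squares line to the last few evaluations of each dimension and flags any dimension falling faster than a set rate. The 0-100 scale, the five-evaluation window, and the -2.0 points-per-evaluation threshold are illustrative assumptions, not Gate's actual parameters.

```python
from typing import Dict, List, Sequence


def slope(scores: Sequence[float]) -> float:
    """Least-squares slope of scores against evaluation index (points per evaluation)."""
    n = len(scores)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(scores))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def drifting(history: Dict[str, Sequence[float]], window: int = 5,
             max_drop: float = -2.0) -> List[str]:
    """Dimensions whose recent trend falls faster than max_drop points per evaluation."""
    return [dim for dim, scores in history.items()
            if slope(list(scores)[-window:]) < max_drop]


history = {  # hypothetical scores on a 0-100 scale
    "identity":    [95, 95, 96, 95, 95],
    "risk":        [88, 88, 87, 88, 88],
    "reliability": [92, 91, 88, 84, 79],
    "autonomy":    [90, 90, 90, 89, 90],
}
print(drifting(history))  # ['reliability']
```

Every reliability score here might still clear an allow cutoff, but the fitted slope is -3.3 per evaluation, so the dimension is flagged before it crosses the line.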

Step 2 — Stop: one call, immediate halt

When the alert lands, the first move is to remove the agent's ability to act — not to open an investigation. Investigation comes after the bleeding stops.

One authenticated call does it: POST /guard/kill/{agent_id}. The agent is flagged as killed, and from that instant every downstream trust decision for it returns deny. There's no propagation delay to a fleet of caches and no "please redeploy with the agent disabled": the kill is evaluated at decision time, so the next tool call, the next message, and the next delegation all fail closed. The endpoint is rate-limited and gated behind a guard.killswitch.write permission, because the one button you need to work at 3 AM is the one button you cannot afford to have misfire.
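Wiring the call into a runbook script takes a few lines. This sketch uses only the Python standard library; the base URL, the Bearer-token header, and the JSON body with a reason field are assumptions for illustration, since only the endpoint path and the permission name are given here. Building the request separately from sending it lets the on-call engineer inspect or log the exact call before it goes out.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL; substitute your deployment's API host.
BASE_URL = "https://api.example.com"


def build_kill_request(agent_id: str, token: str, reason: str) -> urllib.request.Request:
    """Build the POST /guard/kill/{agent_id} call without sending it."""
    path = "/guard/kill/" + urllib.parse.quote(agent_id, safe="")
    body = json.dumps({"reason": reason}).encode()  # body shape is an assumption
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        method="POST",
        headers={
            # Assumed auth scheme; the token must carry guard.killswitch.write.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def kill_agent(agent_id: str, token: str, reason: str, timeout: float = 5.0) -> int:
    """Send the kill call and return the HTTP status; raises on network or HTTP errors."""
    request = build_kill_request(agent_id, token, reason)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```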

And critically, the kill writes itself to the record. The agent.killed event lands in Vault with the reason, the actor, and the timestamp, so stopping the incident and documenting that you stopped it are one action, not two.
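The two properties described above, a kill checked at decision time and a kill that records itself, can be modeled in a few lines. This is a toy in-memory model, not VeriSwarm's implementation: GuardSketch is a hypothetical name, the events list stands in for Vault, and a real system would make the flag write and the ledger append a single transaction.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class GuardSketch:
    """Toy model: kill state is read on every decision, and the kill logs its own event."""
    killed: Set[str] = field(default_factory=set)
    tiers: Dict[str, str] = field(default_factory=dict)  # agent_id -> "allow" | "review" | "deny"
    events: List[dict] = field(default_factory=list)     # stand-in for the Vault ledger

    def kill(self, agent_id: str, actor: str, reason: str) -> None:
        # Flag and record together, so the agent is never stopped without a record of it.
        self.killed.add(agent_id)
        self.events.append({"type": "agent.killed", "agent_id": agent_id,
                            "actor": actor, "reason": reason, "ts": time.time()})

    def decide(self, agent_id: str) -> str:
        # Evaluated per call with no cached result, so a kill applies to the very next call.
        if agent_id in self.killed:
            return "deny"
        return self.tiers.get(agent_id, "deny")  # unknown agents fail closed too
```

In this sketch an agent with no tier on record is denied as well, so failing closed covers unknown agents, not only killed ones.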

Step 3 — Investigate: a timeline you can trust

Now you can look. The hardest question in any post-incident review is "When did this actually start?" The second hardest is "How do I know the timeline I'm looking at wasn't edited after the fact, either by the incident or by us?"

Vault answers both. Every event — decisions, score changes, tool calls, the kill itself — is written to a SHA-256 hash-chained ledger where each entry commits to the one before it. Application-level listeners block mutation of immutable rows, and verify_vault_chain() lets you (or an auditor) re-derive the entire chain and prove no entry was inserted, deleted, or altered. The timeline isn't a log table someone could have touched. It's tamper-evident by construction.
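The hash-chain idea itself is small. This is a generic illustration of SHA-256 chaining, not the Vault schema or the code behind verify_vault_chain(): each entry stores the previous entry's hash, and its own hash covers that link plus a canonical JSON encoding of its payload. The function names and the all-zeros genesis value are hypothetical.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # hypothetical prev_hash for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON (sorted keys, no spaces) so identical payloads always hash identically.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode()).hexdigest()


def append(chain: List[dict], payload: dict) -> None:
    """Add an entry that commits to the hash of the entry before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev_hash": prev, "hash": entry_hash(prev, payload)})


def verify_chain(chain: List[dict]) -> bool:
    """Recompute every link from the genesis value; any mismatch means tampering."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Editing, inserting, or deleting any entry changes a hash that a later entry commits to, so verification fails. A bare chain cannot by itself reveal entries cut off the end; storing or publishing the latest hash somewhere separate closes that gap.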

Layer Gate's score history on top and the forensics get specific: you can see the exact event where reliability started to slide, correlate it to a deployment, a model swap, or a poisoned input, and stop guessing about root cause.
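The onset-and-correlate step can be sketched as follows. The code finds the first sample that fell below a baseline by more than a tolerance, walks back to the peak where that decline began, and returns the most recent recorded change at or before the first declining sample. The baseline window, the tolerance, and the (timestamp, value) tuple shapes are assumptions; the result is a first suspect to investigate, not proof of cause.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (timestamp, reliability score)
Change = Tuple[int, str]    # (timestamp, description: deploy, model swap, input source, ...)


def slide_onset(samples: List[Sample], baseline_points: int = 3,
                tolerance: float = 5.0) -> Optional[int]:
    """Timestamp of the first sample in the decline that eventually breached the baseline."""
    if not samples:
        return None
    head = samples[:baseline_points]
    baseline = sum(score for _, score in head) / len(head)
    breach = next((i for i, (_, score) in enumerate(samples)
                   if score < baseline - tolerance), None)
    if breach is None:
        return None
    # Walk back to the last peak before the breach; the slide starts right after it.
    start = breach
    while start > 0 and samples[start - 1][1] > samples[start][1]:
        start -= 1
    first_decline = start + 1 if start < breach else start
    return samples[first_decline][0]


def likely_cause(onset: int, changes: List[Change]) -> Optional[Change]:
    """Most recent recorded change at or before the onset."""
    return max((c for c in changes if c[0] <= onset), key=lambda c: c[0], default=None)
```

For scores 90, 91, 90, 90, 86, 80, 75 at timestamps 100 through 160, the breach is the 80 at t=150, the decline starts at t=140, and a model swap logged at t=135 is returned ahead of a prompt update at t=105.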

Steps 4 & 5 — Remediate and prove

Remediation is the part you already know how to do: roll back the change, fix the prompt, rotate the credential, then bring the agent back through the trust tiers — review before allow, earning its score back rather than being handed full access on faith.
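Earning trust back can be expressed as a small state machine. In this sketch an agent climbs one tier at a time, deny to review to allow, only after a run of consecutive evaluations at or above a threshold, and any evaluation below it resets the run. The Reinstatement name, the 80-point threshold, and the three-evaluation streak are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

TIERS = ["deny", "review", "allow"]


@dataclass
class Reinstatement:
    """Hypothetical policy: climb one tier at a time after a streak of good scores."""
    tier: str = "deny"
    streak: int = 0
    threshold: float = 80.0  # score that counts toward promotion (assumed 0-100 scale)
    required: int = 3        # consecutive good evaluations needed per promotion

    def observe(self, score: float) -> str:
        if score < self.threshold:
            self.streak = 0  # a bad evaluation resets progress toward the next tier
            return self.tier
        self.streak += 1
        if self.streak >= self.required and self.tier != "allow":
            self.tier = TIERS[TIERS.index(self.tier) + 1]
            self.streak = 0  # the next tier has to be earned from scratch
        return self.tier
```

Six consecutive scores of 85 move a fresh agent from deny to review after the third and to allow after the sixth; a single 70 along the way sends the count back to zero. Demotion is left to normal scoring here, which keeps the sketch focused on re-entry.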

Proving what happened is the part most deployments can't do, and it's increasingly the part that matters. When a customer, a board, or a regulator asks "what did the agent do, when did you know, and how fast did you stop it," the answer can't be a Slack thread reconstructed from memory. With the kill, the score history, and the full event sequence all on one verifiable chain, the post-mortem writes itself — and it holds up to an adversarial reading, which is exactly what EU AI Act Article 12 record-keeping and any serious audit will eventually demand.
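Those three questions map onto timestamps already on the chain. This sketch answers them from a list of events, ideally read from a ledger that has already passed chain verification. Treating agent.trust.drift as "started," agent.tier.changed as "known," and agent.killed as "stopped" is an assumption for illustration; map whichever event types your own escalation path uses.

```python
from typing import Dict, List, Optional


def first_ts(events: List[dict], event_type: str) -> Optional[float]:
    """Earliest timestamp of the given event type, or None if it never occurred."""
    return min((e["ts"] for e in events if e["type"] == event_type), default=None)


def incident_summary(events: List[dict],
                     started: str = "agent.trust.drift",
                     known: str = "agent.tier.changed",
                     stopped: str = "agent.killed") -> Dict[str, Optional[float]]:
    """When did it start, when did you know, how fast did you stop it (in seconds)."""
    t_start, t_known, t_stop = (first_ts(events, k) for k in (started, known, stopped))

    def gap(a: Optional[float], b: Optional[float]) -> Optional[float]:
        # A missing event leaves the interval undefined rather than guessing.
        return None if a is None or b is None else b - a

    return {
        "start_to_known_s": gap(t_start, t_known),
        "known_to_stop_s": gap(t_known, t_stop),
        "start_to_stop_s": gap(t_start, t_stop),
    }
```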

The gap is the runbook, not the alarm

The 5% containment-confidence number is the whole story. The industry built smoke detectors and skipped the fire department. An agent that can't be stopped on demand, investigated against a trustworthy timeline, and explained to a stakeholder afterward isn't governed — it's just monitored, which is a more expensive way of being surprised.

Detect, stop, investigate, remediate, prove. Five steps, all wired to surfaces you can stand up today: Gate scoring and webhooks for the alarm, Guard's kill switch for the stop, Vault for the timeline and the proof. You can start on the free Gate tier and have the alerting path live before your next deploy — because the time to build your incident runbook is not 3 AM.