The Agent Security Maturity Model: Where Does Your Org Actually Stand?

Published June 2, 2026

Level 0: you don't know how many agents you have. Level 4: every action one of them takes is scored, logged, and provable to a third party. Most organizations deploying agentic AI today are somewhere around Level 1 and quietly convinced they're at Level 3.

That gap is the whole problem. So here is the ladder, honestly described, with the catch that makes most maturity models useless stated up front: a maturity model is only worth the paper it's printed on if the level you claim is the level your instrumentation can prove. Plenty of frameworks released this year grade your intentions — your policies, your committees, your roadmap. This one grades whether the telemetry exists to back the claim.

The starting line is lower than you think

The uncomfortable baseline first. A Cloud Security Alliance survey published in April 2026 found that 82% of enterprises have unknown or unmanaged AI agents operating in their environments — agents nobody is tracking, owned by nobody in particular. Cisco's State of AI Security 2026 put it another way: 83% of organizations planned to deploy agentic AI, but only 29% felt ready to secure it. And Deloitte's 2026 State of AI in the Enterprise found roughly one in five companies has a mature governance model for autonomous agents.

So the median org has agents it can't enumerate, little confidence it can secure them, and worse-than-even odds of shutting a bad one off quickly: Kiteworks' 2026 risk forecast found 60% of organizations can't quickly terminate a misbehaving agent. If you read those numbers and felt a little exposed, good. That's Level 1 talking, at best.

The five rungs

Level 0 — Blind. You have agents. You can't list them. There is no registry, no owner, no inventory. Shadow agents — homegrown scripts, browser extensions, SaaS-embedded copilots, MCP server connections — outnumber the ones on anyone's radar. At Level 0 your incident response plan is "find out it happened from a customer."

Level 1 — Inventoried. Every agent that touches your systems is known and named. Not scored, not secured — just visible. This sounds trivial. Given the 82% number above, it is the single most valuable rung most organizations are missing. You cannot govern a population you can't count.

Level 2 — Scored. Now you're watching behavior over time, not just presence. Each agent carries a live trust signal that moves when it does something risky, unreliable, or out of character. The difference between Level 1 and Level 2 is the difference between a guest list and a credit score — one tells you who's in the room, the other tells you who's about to be a problem.

Level 3 — Controlled. Identity is verified, security is enforced, and you can stop an agent on demand and prove the stop happened. This is the rung the EU AI Act's human-oversight language quietly assumes you've reached. A kill switch nobody has tested and nothing has recorded is a Level 1 control wearing a Level 3 costume.

Level 4 — Provable. Every action is recorded in a tamper-evident ledger a regulator or customer can verify independently. You don't assert your logs are trustworthy — you hand someone the math. At Level 4 an audit is a file export, not a fire drill.
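
To make "hand someone the math" concrete, here is a minimal sketch of a hash-chained ledger. It illustrates the general technique only and assumes nothing about any vendor's format: the entry fields, the all-zero genesis value, and the JSON encoding are choices made for this example.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, action: dict) -> str:
    """Hash the previous entry's hash together with a stable encoding of the action."""
    payload = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(ledger: list, action: dict) -> None:
    """Append an action, linking it to the hash of the entry before it."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append(
        {"action": action, "prev": prev_hash, "hash": entry_hash(prev_hash, action)}
    )


def verify(ledger: list) -> bool:
    """Recompute every link; an edited or reordered entry makes this return False."""
    prev_hash = GENESIS
    for entry in ledger:
        expected = entry_hash(prev_hash, entry["action"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


ledger = []
append(ledger, {"agent": "billing-bot", "tool": "refund", "amount": 40})
append(ledger, {"agent": "billing-bot", "tool": "email", "to": "customer"})
intact = verify(ledger)

# Simulate tampering: quietly change a recorded amount after the fact.
ledger[0]["action"]["amount"] = 4000
tampered_ok = verify(ledger)
```

The auditor needs only the export and the `verify` function, not your word. One limit worth knowing: a chain like this catches edits and reordering, but trimming entries off the end goes unnoticed unless the latest hash is also published somewhere the operator can't rewrite.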

Why the questionnaire models stall at Level 2

The maturity-model genre got crowded in 2026 — SANS, Proofpoint, the CSA's own AISMM, Microsoft's agentic maturity guidance. Most are structured as self-assessment: read the rubric, rate yourself, build a roadmap. Useful for board decks. The failure mode is that self-assessment measures the posture you describe, and posture is cheap. The reason 33% of organizations still lack evidence-quality audit trails (Kiteworks again) isn't that they rated themselves low — it's that the instrumentation to climb past Level 2 was never wired in.

Climbing this ladder isn't a planning exercise. It's a plumbing exercise. Each rung corresponds to a capability that is either emitting telemetry right now or it isn't.
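
A rough way to turn "emitting telemetry or it isn't" into a number: treat each rung as a yes/no evidence check and stop counting at the first gap. The capability names below are illustrative, and the rule that rungs are strictly cumulative is this sketch's assumption.

```python
# Capabilities in ladder order. The names are illustrative, not a standard.
RUNGS = ["inventory", "scoring", "control", "provable_ledger"]


def provable_level(evidence: dict) -> int:
    """Return the highest level whose capability, and every one below it, has evidence."""
    level = 0
    for capability in RUNGS:
        if not evidence.get(capability, False):
            break  # a missing rung caps the level, whatever sits above it
        level += 1
    return level


# An org with a tested kill switch but no behavioral scoring.
evidence = {
    "inventory": True,
    "scoring": False,
    "control": True,
    "provable_ledger": False,
}
level = provable_level(evidence)
```

Here `level` comes out as 1, not 3: the control exists, but the rung beneath it doesn't, so the claim can't be backed.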

Mapping the rungs to something you can turn on today

This is where VeriSwarm's stack lines up against the ladder one rung at a time — and the first two rungs are free.

Level 0 → Level 1 is Gate event ingestion. Instrument your agents to send events and they appear in a living inventory automatically. Every agent that emits a single event is now counted, named, and visible. The free tier covers unlimited event ingestion, which is to say the most-missed rung in the entire industry costs nothing to climb.

Level 1 → Level 2 is Gate scoring. Those same events feed a four-dimension trust score (identity, risk, reliability, autonomy) that updates as behavior changes. Policy tiers (allow / review / deny) move automatically as the score does. Also free tier.

Level 2 → Level 3 is Guard and Passport. Guard adds PII tokenization, tool-call scanning, and a kill switch that actually halts the agent. Passport verifies identity with portable credentials. Now you can prove who the agent is and stop it cold.

Level 3 → Level 4 is Vault: a hash-chained, tamper-evident ledger of every recorded action, with chain verification an auditor can run themselves and exports that map to compliance frameworks.
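
For a feel of how a score can drive a policy tier, here is a toy sketch. It is not VeriSwarm's scoring formula, which this article doesn't specify. The 0-100 scale, the unweighted mean, and both thresholds are assumptions made up for the example. Only the four dimension names and the three tiers come from the description above.

```python
from dataclasses import dataclass


@dataclass
class TrustScore:
    # Each dimension runs 0-100, higher meaning more trustworthy (an assumed scale).
    identity: float
    risk: float
    reliability: float
    autonomy: float

    def composite(self) -> float:
        # Assumption: an unweighted mean. A real scorer may weight or gate dimensions.
        return (self.identity + self.risk + self.reliability + self.autonomy) / 4


def policy_tier(score: TrustScore, allow_at: float = 70.0, deny_below: float = 40.0) -> str:
    """Map a composite score to a tier. Both thresholds are placeholders."""
    value = score.composite()
    if value >= allow_at:
        return "allow"
    if value < deny_below:
        return "deny"
    return "review"


# The same agent before and after a run of risky, flaky behavior.
before = TrustScore(identity=90, risk=80, reliability=85, autonomy=75)
after = TrustScore(identity=90, risk=20, reliability=60, autonomy=50)
```

With these placeholder numbers, `policy_tier(before)` is "allow" and `policy_tier(after)` is "review". Nobody edited a policy; the behavior moved the tier.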

Notice the shape. The bottom of the ladder, visibility and behavioral scoring, is the free tier, and it's exactly where the 82% of enterprises with untracked agents are stuck. You can find out exactly where your org stands without a procurement cycle.

Find your rung

Be honest about the gap between the rung you'd like to claim and the one you can prove. If you can't produce a current list of every agent touching your systems in the next five minutes, you're at Level 0, and the fix is one SDK integration away.
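
The five-minute test is easy to picture in code. This sketch folds a stream of agent events into an inventory. The event shape (`agent_id`, `ts`, `type`) is hypothetical and stands in for whatever your telemetry actually emits; it is not any SDK's real schema.

```python
from collections import defaultdict

# Hypothetical event shape: each event names the agent that emitted it.
events = [
    {"agent_id": "support-copilot", "ts": "2026-06-01T09:00:00Z", "type": "tool_call"},
    {"agent_id": "billing-bot", "ts": "2026-06-01T09:05:00Z", "type": "tool_call"},
    {"agent_id": "support-copilot", "ts": "2026-06-01T09:07:00Z", "type": "message"},
]


def build_inventory(events):
    """Fold an event stream into one record per agent: event count and last-seen time."""
    inventory = defaultdict(lambda: {"events": 0, "last_seen": ""})
    for event in events:
        record = inventory[event["agent_id"]]
        record["events"] += 1
        # ISO 8601 UTC timestamps in one fixed format compare correctly as strings.
        record["last_seen"] = max(record["last_seen"], event["ts"])
    return dict(inventory)


inventory = build_inventory(events)
```

Note the blind spot: an agent that never emits an event never shows up. Inventory by telemetry only counts what's instrumented, which is why the instrumentation has to come first.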

Start at the bottom of the ladder where the cost is zero: sign up for the free tier, instrument one agent with the VeriSwarm SDK, and watch it show up scored. That's Level 0 to Level 2 in an afternoon. Levels 3 and 4 are there when you need to prove it to someone who's paid to doubt you.