Technical Guide

Agent Trust Scoring

AI agents aren’t deterministic. The same agent with the same permissions can be reliable on Tuesday and leaking PII on Friday. Agent trust scoring replaces binary access control with behavior-based permissions that update continuously, in production, across five dimensions of agent behavior. This is how VeriSwarm Gate scores agents — and why a score is worth more to an operator than a permission grant.

What agent trust scoring is

Agent trust scoring is a runtime, behavior-weighted rating system that quantifies how much autonomy a specific AI agent should be granted at a given moment, based on its observed behavior across identity, risk, reliability, autonomy, and calibration dimensions. Unlike static access control, the score updates as new events arrive and drives a policy tier — allow, review, or deny — for every decision the agent attempts to make.

Why binary access control isn’t enough

Traditional access control is binary. An agent either has permission or it doesn’t. That model worked when software was deterministic; a function with permission to read a database always read it the same way. Agents aren’t deterministic. The same agent with the same permissions can:

Answer questions accurately for weeks, then start hallucinating.
Follow its boundaries perfectly, then attempt to access data outside its scope.
Handle PII carefully, then start leaking it through tool calls.
Operate within its role, then start trying to escalate its own permissions.

Binary access control captures none of this. An agent that’s been misbehaving for the last 48 hours holds the same access as one that’s been perfect for six months. Trust scoring closes that gap.

The five dimensions Gate scores

A composite trust score that collapses everything into one number is a number. A score broken into five orthogonal dimensions is a diagnosis. Gate scores agents across five dimensions, and the policy tier is derived from all five — not from a single rolled-up value.

Identity confidence

How well-established is this agent’s identity? Is it verified? Does it have an attested human owner? Has it operated long enough to build a track record?

Risk

Has this agent been involved in security incidents, tool misuse, policy violations, or PII exposure? Risk rises when incidents happen and decays slowly over time.
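The "rises on incidents, decays slowly" behavior can be sketched in a few lines. This is a minimal illustration, not Gate's actual formula: the 0-to-1 scale, the additive severity bump, the exponential decay, and the 30-day half-life are all assumptions chosen for the example.

```python
# Hypothetical parameters: Gate's real decay curve and scale are not documented here.
HALF_LIFE_HOURS = 720.0  # assumed 30-day half-life, i.e. "decays slowly"


def apply_incident(risk: float, severity: float) -> float:
    """Raise risk by the incident's severity, capped at 1.0."""
    return min(1.0, risk + severity)


def decayed_risk(risk: float, hours_elapsed: float,
                 half_life_hours: float = HALF_LIFE_HOURS) -> float:
    """Exponentially decay risk toward zero as incident-free time passes."""
    return risk * 0.5 ** (hours_elapsed / half_life_hours)


risk = 0.10                                   # quiet baseline
risk = apply_incident(risk, severity=0.50)    # a PII exposure is recorded
after_one_day = decayed_risk(risk, 24.0)      # barely moved
after_30_days = decayed_risk(risk, 720.0)     # halved
```

The asymmetry is the point: one incident moves the score immediately, while recovery takes weeks of clean behavior.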

Reliability

Does this agent complete tasks successfully? Does it handle errors gracefully? Does it escalate when appropriate? Reliability is earned through consistent good behavior.

Autonomy

How much independence should this agent be granted? Reliable months earn more autonomy. A failed security test trims it.

Calibration

Does this agent know what it doesn’t know? Calibration compares the confidence an agent reports on a task against the outcome it actually got, scored with a rolling Brier metric. A confidently-wrong agent and a well-calibrated one can post the same reliability number — calibration is what tells them apart.
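A rolling Brier metric is straightforward to sketch. The class below is an illustration under stated assumptions: the window size, the class name, and the binary success/failure outcome are hypothetical, and how Gate maps the Brier value onto its calibration dimension is not shown here. The Brier score itself is the mean squared gap between reported confidence and the observed outcome, so lower is better.

```python
from collections import deque
from typing import Optional


class RollingBrier:
    """Mean squared error between reported confidence and outcome over a window."""

    def __init__(self, window: int = 100):  # window size is an assumption
        self._errors = deque(maxlen=window)

    def record(self, confidence: float, succeeded: bool) -> None:
        outcome = 1.0 if succeeded else 0.0
        self._errors.append((confidence - outcome) ** 2)

    def score(self) -> Optional[float]:
        if not self._errors:
            return None  # no paired confidence/outcome events yet
        return sum(self._errors) / len(self._errors)


outcomes = [True, True, True, False]  # both agents succeed 3 times out of 4

overconfident = RollingBrier()
calibrated = RollingBrier()
for ok in outcomes:
    overconfident.record(0.90, ok)  # always reports 90% confidence
    calibrated.record(0.75, ok)     # reports confidence matching its hit rate
```

Both agents have the same 75% success rate, but the overconfident one ends with the worse (higher) Brier score, which is the distinction the reliability number alone cannot show.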

For why one composite score genuinely doesn’t do the job at production scale, the deeper argument lives in Identity, Risk, Reliability, Autonomy: Why One Trust Score Isn’t Enough for Production Agents. Calibration joined the model as a fifth dimension in 2026-Q3 — the reasoning behind the addition is in The Fifth Dimension: Measuring Whether an Agent Knew What It Didn’t Know.
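To see why deriving the tier from all five dimensions differs from thresholding one rolled-up value, consider the sketch below. It is a simplified illustration, not Gate's engine: the 0-to-1 scale, the floor, and the risk cutoffs are hypothetical values picked for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionScores:
    identity: float
    risk: float         # higher means more risk
    reliability: float
    autonomy: float
    calibration: float


# Hypothetical thresholds for illustration only.
EARNED_FLOOR = 0.5   # every earned dimension must clear this for "allow"
RISK_REVIEW = 0.4
RISK_DENY = 0.8


def policy_tier(s: DimensionScores) -> str:
    """Derive the tier from each dimension rather than from an average."""
    if s.risk >= RISK_DENY:
        return "deny"
    earned = (s.identity, s.reliability, s.autonomy, s.calibration)
    if s.risk >= RISK_REVIEW or min(earned) < EARNED_FLOOR:
        return "review"
    return "allow"


def rolled_up(s: DimensionScores) -> float:
    """The single composite number that hides which dimension is weak."""
    return (s.identity + s.reliability + s.autonomy + s.calibration) / 4


steady = DimensionScores(identity=0.7, risk=0.2, reliability=0.7,
                         autonomy=0.7, calibration=0.7)
overconfident = DimensionScores(identity=0.9, risk=0.2, reliability=0.95,
                                autonomy=0.75, calibration=0.2)
```

Both agents average to about 0.70, yet `policy_tier` sends the second one to review because its calibration sits below the floor. The composite alone would have treated them as the same agent.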

The 24-event taxonomy

Score changes are driven by events. Gate uses a standardized 24-event taxonomy that every customer’s ingestion stream maps into, regardless of which framework — LangChain, CrewAI, AutoGen, custom — the agent runs on. The taxonomy is grouped by dimension:

Identity events: verification, owner attestation, manifest changes, delegation grants and revocations.
Risk events: PII exposure, prompt injection detections, tool misuse, policy violations, guard findings.
Reliability events: task completion, error rate spikes, escalation behavior, fallback chain hits.
Autonomy events: boundary tests, scope adherence, privilege escalation attempts, human-review approvals.
Calibration events: an agent’s reported confidence on a task (agent.confidence_reported) paired with the task’s observed outcome (agent.task_outcome).

Custom event types can be added via the API. Legacy event names (including the agentgate_* namespace from the platform’s earlier life) are mapped into the taxonomy automatically.
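The mapping step can be pictured as a small normalizer that runs before scoring. The sketch below is an assumption-heavy illustration: only agent.confidence_reported, agent.task_outcome, and the agentgate_ prefix come from this guide. The other event names, the prefix-rewrite rule, and the "custom" fallback are hypothetical stand-ins, not Gate's real taxonomy or API.

```python
LEGACY_PREFIX = "agentgate_"

# Canonical event type -> dimension family.
# The first two names appear in this guide; the rest are hypothetical examples.
EVENT_FAMILY = {
    "agent.confidence_reported": "calibration",
    "agent.task_outcome": "calibration",
    "agent.pii_exposure": "risk",            # hypothetical name
    "agent.owner_attested": "identity",      # hypothetical name
    "agent.task_completed": "reliability",   # hypothetical name
    "agent.scope_violation": "autonomy",     # hypothetical name
}


def normalize(event_type: str) -> str:
    """Rewrite a legacy agentgate_* name into the assumed canonical form."""
    if event_type.startswith(LEGACY_PREFIX):
        return "agent." + event_type[len(LEGACY_PREFIX):]
    return event_type


def family_of(event_type: str) -> str:
    """Return the dimension family, or 'custom' for unmapped event types."""
    return EVENT_FAMILY.get(normalize(event_type), "custom")


legacy = family_of("agentgate_task_outcome")
unknown = family_of("acme.inventory_synced")
```

Keeping normalization separate from scoring means a framework adapter only has to emit names, not understand how each name moves a dimension.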

A scoring example in production

Consider an AI customer support agent handling email for an e-commerce company.

Day 1. Identity score low (no history), risk neutral, reliability unknown. Policy tier: review. Every customer-facing response is human-approved.
Week 2. 500 successful interactions in. Identity strengthens, reliability climbs. Policy tier upgrades to allow for standard responses, review for refunds over $100.
Month 2. The agent accidentally sends order details to the wrong email address. Guard flags PII in an unauthorized context. Risk spikes. Policy tier drops back to review for all interactions pending investigation.
Month 3. Root cause fixed. The agent resumes normal operation. Risk decays. Trust is rebuilt through behavior, not by an admin clicking “approve.”

The agent’s policy tier changed three times in three months (review to allow, back to review, then back to allow) without anyone editing a permissions matrix. That’s the mechanism.

Scoring profiles per industry

A customer support chatbot and a medical triage agent should not share trust rules. Scoring profiles let a tenant override the engine’s defaults — different dimension weights, different policy-tier thresholds, different event sensitivities — per vertical. Eleven presets ship today, seven of them calibration-aware verticals (healthcare, financial services, legal, software, security, e-commerce, and a calibration-aware general profile); custom profiles are supported on Pro and higher.
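A profile can be pictured as a small configuration object. The sketch below is illustrative only: the field names, weights, and thresholds are hypothetical and do not reflect the shipped presets or Gate's configuration schema. It shows one narrow point, which is that identical dimension scores can land in different tiers under different profiles.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ScoringProfile:
    name: str
    weights: Dict[str, float]  # per-dimension weight (hypothetical field)
    allow_at: float            # weighted score needed for "allow"
    review_at: float           # weighted score needed for "review"


def tier_for(scores: Dict[str, float], profile: ScoringProfile) -> str:
    """Weight the dimensions (risk inverted) and apply the profile thresholds."""
    total = sum(profile.weights.values())
    weighted = 0.0
    for dim, weight in profile.weights.items():
        value = 1.0 - scores[dim] if dim == "risk" else scores[dim]
        weighted += weight * value
    weighted /= total
    if weighted >= profile.allow_at:
        return "allow"
    if weighted >= profile.review_at:
        return "review"
    return "deny"


# Hypothetical numbers, not the real presets.
ecommerce = ScoringProfile(
    "e-commerce",
    {"identity": 0.2, "risk": 0.2, "reliability": 0.3,
     "autonomy": 0.2, "calibration": 0.1},
    allow_at=0.70, review_at=0.40)
healthcare = ScoringProfile(
    "healthcare",
    {"identity": 0.2, "risk": 0.3, "reliability": 0.2,
     "autonomy": 0.1, "calibration": 0.2},
    allow_at=0.85, review_at=0.60)

scores = {"identity": 0.8, "risk": 0.2, "reliability": 0.8,
          "autonomy": 0.6, "calibration": 0.6}
```

With these example numbers the same agent is allowed under the e-commerce profile and held for review under the healthcare one, because the stricter profile demands a higher bar before granting autonomy.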

The case for why one threshold cannot survive both healthcare and e-commerce traffic is detailed in One Size Doesn’t Fit All: Configuring Trust Thresholds for Healthcare vs. E-Commerce Agents.

Trust scoring vs. LLM evaluation

These are often conflated and they shouldn’t be. LLM evaluation grades a model on a benchmark, offline, before deployment. Agent trust scoring grades a deployed agent on its production behavior, continuously, after deployment. The eval suite tells you the model can answer a BoolQ question; the trust score tells you the agent didn’t leak a customer’s SSN on Tuesday. Both have value. They are not substitutes.

For the full delineation, see Agent Scoring Is Not LLM Evaluation. Here’s the Difference.

Trust score vs. identity vs. policy decision

A trust score is not an identity. An identity is not a policy decision. Conflating any pair of these is how AI agent governance stacks fail in production. Identity answers who is this agent. Trust scoring answers how is this agent behaving. A policy decision combines both — plus the action being attempted, the tier the agent currently sits in, and any kill-switch or delegation overrides — and emits a single allow / review / deny per request.
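The separation between identity, trust tier, and the per-request decision can be sketched as follows. This is a simplified illustration, not Gate's decision API: the request fields, the action name, and the rule that an unverified agent is denied are assumptions. The refund threshold borrows the $100 rule from the support-agent example earlier in this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionRequest:
    agent_verified: bool     # identity: who is this agent?
    current_tier: str        # trust: "allow", "review", or "deny"
    action: str              # what the agent is attempting (hypothetical names)
    amount_usd: float = 0.0


def decide(req: DecisionRequest) -> str:
    """Combine identity, trust tier, and the attempted action into one answer."""
    if not req.agent_verified:
        return "deny"                 # assumed rule: no identity, no decision
    if req.current_tier != "allow":
        return req.current_tier       # trust tier caps what is possible
    if req.action == "issue_refund" and req.amount_usd > 100:
        return "review"               # action-specific rule from the example
    return "allow"


small_refund = decide(DecisionRequest(True, "allow", "issue_refund", 40.0))
large_refund = decide(DecisionRequest(True, "allow", "issue_refund", 250.0))
unverified = decide(DecisionRequest(False, "allow", "send_reply"))
```

Neither the identity check nor the tier is sufficient alone: a verified agent in the allow tier still gets a human in the loop when the action itself carries more weight.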

Why traditional IAM, on its own, doesn’t close the loop: Identity Is Not Trust: Why Verified Agents Still Need Scoring.

Frequently asked questions

Is a trust score the same as an accuracy score?

No. Accuracy measures whether a model produced the right output on a benchmark. A trust score measures whether an agent in production is behaving in a way that earns continued access, judged across identity strength, risk exposure, reliability history, the autonomy it should be granted, and how well its confidence is calibrated. The same agent can score 94% on a benchmark and lose autonomy in production the same week, because the two questions are different.

What kinds of events affect an agent's trust score?

Gate's event taxonomy covers 24 standardized event types across five families: identity events (verification, owner attestation, manifest changes), risk events (PII exposure, prompt injection, tool misuse, policy violations), reliability events (task completion, error rates, escalation behavior), autonomy events (boundary tests, scope adherence, privilege requests), and calibration events (an agent's reported confidence on a task paired with the task's observed outcome). Custom event types can be added via the API; legacy event names are mapped automatically into the taxonomy.

Can I customize how the five dimensions are weighted?

Yes — that's what scoring profiles are. A profile is a per-tenant configuration that overrides the engine's default weights for each dimension, defines thresholds for the allow / review / deny policy tiers, and can be scoped per industry vertical. VeriSwarm ships eleven preset profiles — seven of them calibration-aware verticals (healthcare, financial services, legal, software, security, e-commerce, and a calibration-aware general profile) — and supports fully custom profiles.

How does the kill switch interact with trust scoring?

The kill switch is a hard override at the decision layer, not a score modifier. When an agent is killed, every decision check against that agent returns deny with reason_code: "agent_killed" and policy_tier: "tier_x", regardless of what the underlying scores say. Killing an agent doesn't damage its score history — it's a separate, audit-logged operator action that can be reversed.
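The override semantics can be sketched as a check that runs before any score is consulted. The reason_code and policy_tier values below are the ones quoted above. Everything else (the function name, the "decision" key, the in-memory set of killed agents, and the tier label in the example) is a hypothetical stand-in for illustration.

```python
from typing import Dict, Set


def check_decision(agent_id: str, killed: Set[str],
                   scored: Dict[str, str]) -> Dict[str, str]:
    """Return the scored decision unless the agent has been killed."""
    if agent_id in killed:
        # Hard override: the underlying scores are never consulted.
        return {"decision": "deny",
                "reason_code": "agent_killed",
                "policy_tier": "tier_x"}
    return scored


# "tier_2" and "score_ok" are hypothetical labels for a healthy agent.
scored = {"decision": "allow", "reason_code": "score_ok", "policy_tier": "tier_2"}
killed: Set[str] = set()

before = check_decision("support-bot", killed, scored)
killed.add("support-bot")            # operator pulls the kill switch
during = check_decision("support-bot", killed, scored)
killed.discard("support-bot")        # operator reverses it
after = check_decision("support-bot", killed, scored)
```

Because the kill state lives outside the score, reversing it restores the agent to whatever its behavior had earned, with nothing to rebuild.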

What's included in the free tier?

Gate's free tier includes the full five-dimension scoring engine, unlimited event ingestion, 5,000 trust decisions per day, the default scoring profile, and the same hash-chained audit ledger Vault uses for paid plans. Custom scoring profiles, the shared-reputation network, and the broader Guard / Passport / Vault / Cortex pillars are plan-gated, but trust scoring itself starts free.

What is Calibration Trust and why is it a fifth dimension?

Calibration Trust measures whether an agent's confidence matches reality. When an agent reports how confident it is on a task, Gate pairs that prediction against the task's observed outcome and scores the gap with a rolling Brier metric — the same proper scoring rule used to grade weather forecasts. It's a distinct fifth dimension because confidence-accuracy is orthogonal to the other four: an agent can be highly reliable yet chronically overconfident, and that overconfidence is exactly the failure mode that turns an autonomous decision into an incident. Calibration complements the existing four dimensions; it doesn't replace any of them.

Try trust scoring on your own agents

Gate’s free tier includes the full scoring engine, unlimited event ingestion, 5,000 decisions per day, and the same hash-chained ledger every paid plan uses. Ten minutes to wire up; the agent framework you’re already on works.
