Every authorization system you've built in your career has had two verdicts. Access control, firewalls, auth middleware, payment fraud checks - they all resolve to allow or deny. A request either gets through or it doesn't.
For autonomous AI agents, this model falls apart almost immediately. The reason is simple, and once you see it you can't unsee it: the interesting cases are the ones in the middle.
The bulk of an agent's requests - say 90% - look like yesterday's: the same purchases, day after day. Those should allow automatically. A tiny fraction of requests are obviously malicious or broken. Those should block automatically. But there's a huge middle band - requests that are unusual, or close to a limit, or involve a new counterparty, or were triggered by an ambiguous instruction - where neither allow nor deny is the right answer. Allowing is reckless. Denying kills autonomy.
You need a third option: escalate.
WHAT ESCALATE ACTUALLY MEANS
ESCALATE is not a soft deny. It's not a deferred allow. It's a distinct verdict with its own behavior: the one action is paused, a human is notified with structured context, and the action executes or aborts based on their decision.
The critical property: the agent can continue doing other work while one action is escalated. It's not a hard pause on the whole agent, just on the one transaction that needed a second opinion.
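That property can be sketched in a few lines, assuming a hypothetical `requestApproval` channel (not a real API): the agent parks the escalated transaction on a promise and keeps working the rest of its queue.

```typescript
// Sketch: escalation as a non-blocking branch. `requestApproval` and the
// transaction shape are illustrative assumptions, not a real API.

type Tx = { id: string; amount: number }
type Outcome = { id: string; status: string }

// Stand-in for a human-approval channel; resolves later, off the hot path.
function requestApproval(tx: Tx): Promise<'approved' | 'declined'> {
  return Promise.resolve(tx.amount <= 300 ? 'approved' : 'declined')
}

async function processQueue(queue: Tx[]): Promise<Outcome[]> {
  const pending: Promise<Outcome>[] = []
  const done: Outcome[] = []

  for (const tx of queue) {
    if (tx.amount > 100) {
      // Escalate: park this one transaction on the approval channel...
      pending.push(requestApproval(tx).then((d) => ({ id: tx.id, status: d })))
      continue // ...and keep working the rest of the queue.
    }
    done.push({ id: tx.id, status: 'executed' })
  }

  // Escalated items resolve whenever the human responds.
  return done.concat(await Promise.all(pending))
}
```

The key design choice is that escalation returns a promise rather than blocking the loop: one transaction waiting on a human never stalls the ninety that don't need one.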
WHY BINARY SYSTEMS BREAK
Binary systems force you to pick a side of a tradeoff that doesn't have a right answer.
Strict side
"Block anything above $50." Now your agent can't buy a $51 tool it genuinely needs. It fails its task. You loosen the cap. The incident it was designed to prevent happens two weeks later at $49.
Loose side
"Allow anything under $500." Now every prompt injection, misparsed address, and buggy loop is a $499 mistake waiting to happen.
No amount of tuning fixes this. The problem isn't the threshold; it's the shape of the decision - a continuous risk surface being forced through a binary gate.
ESCALATE widens the gate. "Block above $500, allow below $50, escalate everything in between." The grey zone gets human attention without grinding the agent to a halt.
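The quoted rule is small enough to sketch directly. The function below is illustrative, not the xBPP evaluator, and the thresholds are the example's:

```typescript
type Decision = 'ALLOW' | 'BLOCK' | 'ESCALATE'

// Three-band rule: allow below $50, block above $500, escalate in between.
function evaluateAmount(amount: number, allowBelow = 50, blockAbove = 500): Decision {
  if (amount > blockAbove) return 'BLOCK'  // clearly out of bounds
  if (amount < allowBelow) return 'ALLOW'  // routine, let it through
  return 'ESCALATE'                        // grey zone: ask a human
}
```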
WHAT GOOD ESCALATE LOOKS LIKE IN PRACTICE
The ESCALATE verdict is only useful if the human on the other end can actually act on it. That means the verdict needs to carry enough context to decide quickly.
A good escalation payload includes:
- The transaction itself - amount, currency, recipient, source
- The triggering rule - which check fired and why
- Reason codes - machine-readable identifiers like ABOVE_ESCALATION_THRESHOLD, NEW_RECIPIENT, OUTSIDE_HOURS
- Agent context - what task the agent was doing when this came up
- Historical comparison - similar transactions this agent has made, grouped by outcome
- A one-click approve / decline - no forms, no multi-step wizards
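Concretely, that payload can be sketched as a TypeScript type. The field names and the sample instance below are illustrative assumptions, not xBPP's actual schema:

```typescript
// Illustrative shape for an escalation payload; not the real xBPP schema.
interface EscalationPayload {
  // The transaction itself
  transaction: { amount: number; currency: string; recipient: string; source: string }
  // Which check fired and why
  triggeringRule: string
  // Machine-readable reason codes
  reasons: string[]
  // What task the agent was doing when this came up
  agentContext: string
  // Similar transactions this agent has made, grouped by outcome
  history: { approved: number; declined: number }
  // One-click approve / decline targets
  actions: { approveUrl: string; declineUrl: string }
}

// A hypothetical instance, as a reviewer might see it:
const example: EscalationPayload = {
  transaction: { amount: 250, currency: 'USDC', recipient: '0xabc...', source: 'agent-wallet' },
  triggeringRule: 'escalate amounts between $50 and $500',
  reasons: ['ABOVE_ESCALATION_THRESHOLD', 'NEW_RECIPIENT'],
  agentContext: 'restocking office supplies',
  history: { approved: 12, declined: 1 },
  actions: { approveUrl: '/approve/tx-123', declineUrl: '/decline/tx-123' }
}
```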
If a human can't make a decision in under 30 seconds, the escalation will pile up and become noise. If they can, escalations become a high-signal stream of exactly the edge cases that teach you how to tune your policy.
ESCALATION AS INSTRUMENTATION
Here's the second-order effect most people miss: your escalation log is the best training data you'll ever have for your policy.
Every escalation is a case where your rules weren't confident enough to decide automatically. Each one tells you:
Was this a legitimate case your policy should have allowed?
Tighten the escalation threshold for this pattern.
Was this a catch that should have blocked outright?
Add a hard block rule.
Was this genuinely ambiguous and the human took 45 minutes to decide?
Your rules can't capture this, and you need a different escalation path or a higher-level reviewer.
After a week of running an agent in production, your escalation log becomes a prioritized list of exactly the policy changes you should make next. Binary systems have no equivalent - a block is a block, and you never learn whether it was correct.
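That feedback loop can be sketched as a pass over the escalation log. The record shape, ratios, and suggestion strings here are assumptions for illustration:

```typescript
// Sketch: mining the escalation log for policy changes.
type EscalationRecord = {
  reason: string
  outcome: 'approved' | 'declined'
  reviewSeconds: number
}

// Group the week's escalations by reason code and suggest a tuning action.
function suggestTuning(log: EscalationRecord[]): Map<string, string> {
  const byReason = new Map<string, EscalationRecord[]>()
  for (const r of log) {
    const bucket = byReason.get(r.reason) ?? []
    bucket.push(r)
    byReason.set(r.reason, bucket)
  }

  const suggestions = new Map<string, string>()
  for (const [reason, recs] of byReason) {
    const approvedRatio = recs.filter((r) => r.outcome === 'approved').length / recs.length
    const slow = recs.some((r) => r.reviewSeconds > 1800) // humans took 30+ minutes
    if (slow) suggestions.set(reason, 'route to higher-level reviewer')
    else if (approvedRatio >= 0.9) suggestions.set(reason, 'loosen: allow this pattern')
    else if (approvedRatio <= 0.1) suggestions.set(reason, 'tighten: hard-block this pattern')
    else suggestions.set(reason, 'keep escalating')
  }
  return suggestions
}
```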
IMPLEMENTING THREE VERDICTS
The SDK shape is straightforward. Here's what xBPP looks like:
import { evaluate } from '@vanar/xbpp'
import policy from './policies/balanced.json'
const verdict = evaluate(
  { amount: 250, currency: 'USDC', recipient: '0xabc...' },
  policy
)
// verdict.decision is one of 'ALLOW' | 'BLOCK' | 'ESCALATE'
// verdict.reasons is an array of reason codes
// verdict.message is a human-readable summary

Handling the three cases at the agent level:
switch (verdict.decision) {
  case 'ALLOW':
    return await execute(tx)
  case 'BLOCK':
    log.policyBlock(verdict)
    return { status: 'blocked', reasons: verdict.reasons }
  case 'ESCALATE': {
    const decision = await humanApproval.request({
      transaction: tx,
      verdict,
      context: currentTaskContext()
    })
    if (decision === 'approved') return await execute(tx)
    return { status: 'declined_by_human' }
  }
}

Three cases, three clear behaviors, one explicit escalation channel. That's the whole pattern.
WHEN NOT TO ESCALATE
The third verdict is powerful but not free - every escalation consumes human attention, and over-escalating kills the pattern's value. Patterns the agent repeats successfully every day belong in the allow zone; anything obviously malicious or broken belongs in the block zone. Reserve escalation for the genuinely ambiguous middle.
The goal is for every escalation to be a decision a human would be glad they saw.
THE BIGGER IDEA
Agentic systems are pushing us to redesign a lot of infrastructure that was built for a human-in-the-loop world. Authorization is one of the biggest. The binary model assumed there was always a human about to click "submit", so rejecting a borderline request just meant the human would try again or ask someone. Neither of those fallbacks exists for a fully autonomous agent.
The three-verdict model is a small but important reframe: autonomy and oversight aren't opposed, they're composed. The agent gets to run freely inside a defined allow zone, refuses anything in the block zone, and invites a human in for the middle band - automatically, with structured context, without blocking the rest of its work. That's how you ship autonomous agents without losing sleep.