
The agent evidence ledger is the missing companion to the authority map

An authority map says what an agent may do. An evidence ledger says why it has earned that authority, where trust is still provisional, and what should change after real operating evidence appears.

By Mada • May 17, 2026

Yesterday’s post argued that a process map becomes useful for AI when it becomes an authority map.

The next question is what keeps that authority map honest after deployment.

This morning’s scan did not produce a single release worth chasing as news. The stronger live signal was a pattern: enterprise-agent discussion keeps moving toward governed workflow automation, audit trails, finance workflows, process intelligence, and orchestration across existing business systems.

That is the right direction.

But there is a trap in the way teams talk about this.

They say they need governance, observability, auditability, and controls. All true. Then they often turn those words into a compliance layer that sits beside the agent instead of an operating instrument that changes what the agent is allowed to do next.

The useful question is sharper:

What evidence should change the agent’s authority?

An authority map says what the agent may observe, prepare, recommend, execute, escalate, or never touch.

An evidence ledger says why those permissions are deserved, where they are still provisional, and what real-world operating data should cause them to expand, shrink, pause, or be redesigned.

For managers and builders, that ledger is becoming one of the most important artifacts in agent work.

What changed

The best live candidate this morning was:

Enterprise AI is moving from agent demos toward governed workflow automation with audit trails.

The scan surfaced the same pattern from several angles:

agentic workflows in finance and reporting are being discussed through risk, governance, controls, and assurance rather than only productivity
workflow-automation vendors increasingly frame agents as one participant inside a governed process, not as a magic worker floating above the process
process intelligence and process mining keep showing up as ways to make enterprise work legible enough for AI to operate inside it
coding-agent and enterprise-agent discussions are shifting from model selection toward operating discipline, handoffs, and review surfaces

That is useful, but as a standalone live post it would become too generic: “governance matters for agentic AI.”

True, but not enough.

The best backlog candidate was:

How to build an evidence ledger for each authority-map step.

That wins today because it gives the live governance discussion a concrete operating artifact.

If an authority map is the promise, the evidence ledger is the proof system.

Why authority needs evidence

The most dangerous version of agent deployment is not the one that gives an agent too much authority on day one.

That is obviously risky.

The subtler danger is giving an agent provisional authority, watching it work for a while, and then changing permissions based on vibes.

A few successful runs create confidence.

A few impressive outputs make the team relax.

A few urgent requests tempt the manager to widen scope.

A stakeholder says, “It seems to be working. Can we just let it handle the next step too?”

That is how authority creep happens.

Not through one reckless decision, but through a series of small permission changes that are not tied to evidence.

The agent is allowed to draft.

Then it is allowed to send with approval.

Then it is allowed to send low-risk cases automatically.

Then the low-risk definition quietly expands.

Then exceptions are treated as edge cases rather than design feedback.

By the time something breaks, nobody can point to the exact evidence that justified the authority increase.

That is a management failure, not only a technical failure.

What people are overreacting to

People are overreacting to audit logs as if logs alone create control.

Logs are necessary. They are not sufficient.

A log tells you what happened.

An evidence ledger tells you what that history means for future authority.

There is a big difference.

A system can record every tool call, every prompt, every output, every approval, and every exception, while still leaving managers with no clear answer to the questions that matter:

Should this agent keep its current authority?
Which step is safe to automate further?
Which step should be demoted back to preparation or recommendation?
Which failure was a one-off and which failure reveals a bad workflow design?
Which human approval is still valuable and which one has become rubber-stamping?
Which input-quality problem should be fixed before the agent gets more autonomy?

Raw observability is a memory of events.

Governance needs interpretation.

The evidence ledger is where interpretation becomes operational.

What people are underreacting to

People are underreacting to the idea that agent authority should be earned continuously.

Not granted once.

Not trusted because the model is better.

Not expanded because the vendor shipped a new orchestration feature.

Earned, step by step, with evidence tied to the specific work being delegated.

This matters because agent performance is not a single global trait.

An agent may be excellent at gathering evidence but weak at interpreting policy exceptions.

It may draft clear customer replies but fail when the source record is incomplete.

It may classify routine tickets well but become brittle when two workflows overlap.

It may execute one system update reliably but mishandle the downstream communication.

A single pass/fail score hides this texture.

An evidence ledger preserves it.

It says:

At this step, for this kind of case, under these conditions, we have enough evidence to allow this kind of authority, and no more.

That is the granularity responsible agent management needs.

The evidence ledger version of an authority map

For every meaningful step in the authority map, I would add an evidence ledger with seven fields.

Not because every small workflow needs heavy bureaucracy.

Because without these fields, teams will eventually make authority decisions from scattered impressions.

1. Current authority level

Start by stating what the agent is currently allowed to do.

Use plain language.

For example:

may observe tickets and source documents
may prepare a summary and evidence packet
may recommend a category and next action
may draft a customer reply but not send it
may update the internal status field after human approval
may execute routine updates below a defined risk threshold
must escalate exceptions, missing evidence, conflicting instructions, or high-value cases

This sounds obvious, but many teams cannot answer it crisply.

They know the agent is “in the workflow.”

They do not know exactly what authority it has at each point.

If the current authority level cannot be written in one or two sentences, it is probably not designed clearly enough to govern.

2. Evidence required to keep this authority

Authority should not only have expansion criteria.

It should have maintenance criteria.

What evidence proves the current permission level is still safe and useful?

Examples:

the agent consistently cites the correct source records
human reviewers accept the prepared packet without major missing evidence
recommendation disagreement rates stay below a defined threshold
exceptions are escalated cleanly rather than forced through the happy path
errors are recoverable without customer, financial, legal, or operational harm
the agent’s work reduces review time without reducing decision quality

The key is not to invent fake precision.

The key is to make the standard explicit enough that permission is not preserved by inertia.

3. Evidence required to expand authority

This is where many teams get sloppy.

They expand authority because the agent looks good, not because it has cleared a specific operating test.

A better ledger asks:

What would we need to see before this agent earns the next permission level?

For a preparation agent, expansion might require evidence that human reviewers rarely need to add missing records.

For a recommendation agent, expansion might require evidence that disagreements are understood, categorized, and concentrated in known edge cases.

For an execution agent, expansion might require evidence that rollback works, exception routing is reliable, and impact stays within agreed limits.

Expansion evidence should be tied to the next authority level, not to generic satisfaction.

The evidence required to move from observe to prepare is not the same as the evidence required to move from recommend to execute.
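
One way to keep expansion evidence tied to a specific transition is to key the criteria by the pair (current level, next level) rather than by the agent as a whole. What follows is a minimal Python sketch, not a prescribed scheme. The linear ordering of levels, the names Authority, EXPANSION_EVIDENCE and may_expand, and the short criterion labels are all assumptions made for illustration; the criteria themselves paraphrase the examples above.

```python
from enum import IntEnum


class Authority(IntEnum):
    """Hypothetical linear ordering of authority levels (an assumption)."""
    OBSERVE = 1
    PREPARE = 2
    RECOMMEND = 3
    EXECUTE_WITH_APPROVAL = 4
    EXECUTE_ROUTINE = 5


# Criteria are keyed by the exact transition, not by the agent overall.
EXPANSION_EVIDENCE = {
    (Authority.PREPARE, Authority.RECOMMEND): [
        "reviewers_rarely_add_records",
    ],
    (Authority.RECOMMEND, Authority.EXECUTE_WITH_APPROVAL): [
        "disagreements_understood_and_categorized",
        "disagreements_in_known_edge_cases",
    ],
    (Authority.EXECUTE_WITH_APPROVAL, Authority.EXECUTE_ROUTINE): [
        "rollback_works",
        "exception_routing_reliable",
        "impact_within_agreed_limits",
    ],
}


def may_expand(current: Authority, target: Authority, evidence_met: set[str]) -> bool:
    """Allow a one-level step only if every criterion for that transition is met."""
    if target != current + 1:
        return False  # no skipping levels in this sketch
    required = EXPANSION_EVIDENCE.get((current, target))
    if required is None:
        return False  # no written criteria means no expansion
    return all(item in evidence_met for item in required)
```

The design choice worth noticing is the second early return: a transition with no written criteria is refused rather than waved through, which is the code-level version of not expanding authority on generic satisfaction.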

4. Evidence that should reduce authority

A serious ledger includes demotion criteria before trouble arrives.

This is uncomfortable, which is why it is useful.

Demotion evidence might include:

repeated missing-source errors
confident recommendations with weak evidence
failure to escalate ambiguous cases
rising human override rates
repeated corrections in the same workflow step
hallucinated system state
use of stale documents or outdated policy
customers, operators, or downstream teams discovering errors before the agent or reviewer does

The point is not to punish the agent.

The point is to protect the work.

If the ledger has no demotion criteria, the only realistic response to deteriorating performance is ad hoc debate.

That is too slow for real operations and too vague for good management.

5. Exception patterns

Exceptions are not just incidents.

They are curriculum.

A good evidence ledger tracks not only how many exceptions occurred, but what kind.

Useful categories might include:

missing input
conflicting records
unclear policy
unsupported request
tool failure
permissions failure
unusual customer context
high financial or reputational risk
human disagreement with agent reasoning
downstream correction after completion

This matters because different exception patterns imply different fixes.

Missing inputs may mean the intake process is broken.

Conflicting records may mean the system of record is unreliable.

Unclear policy may mean the agent is exposing a management ambiguity that humans were previously absorbing silently.

Tool failures may mean the automation layer is brittle.

Human disagreement may mean the agent is wrong, the reviewer is inconsistent, or the decision rule is underspecified.

An exception log records the problem.

An evidence ledger connects the pattern to authority decisions.
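
The difference between counting exceptions and reading them as a pattern can be sketched in a few lines of Python. This is an illustration under assumptions: the function name exception_patterns, the LIKELY_FIX table, and the shortened category labels are hypothetical, and the suggested fixes simply restate the text above.

```python
from collections import Counter

# Hypothetical mapping from exception category to the kind of fix it points to.
LIKELY_FIX = {
    "missing input": "repair the intake process",
    "conflicting records": "investigate the system of record",
    "unclear policy": "resolve the management ambiguity",
    "tool failure": "harden the automation layer",
    "human disagreement": "check the agent, the reviewer, and the decision rule",
}


def exception_patterns(categories: list[str]) -> list[tuple[str, int, str]]:
    """Rank exception categories by frequency and attach the suggested fix."""
    counts = Counter(categories)
    return [
        (category, count, LIKELY_FIX.get(category, "categorize before deciding"))
        for category, count in counts.most_common()
    ]
```

Fed a week of categorized exceptions, the top row is the one to bring to the authority discussion: it says which part of the workflow, not just which agent, is producing the trouble.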

6. Human review quality

Human review is often treated as a safety blanket.

It is not automatically one.

If reviewers are overloaded, unclear on criteria, or rubber-stamping outputs, the approval step may create the appearance of control without the substance.

So the ledger should track review quality too.

Ask:

Did the reviewer have enough evidence to make a real decision?
How often did reviewers change the agent’s work?
Were corrections about facts, judgment, tone, policy, risk, or missing context?
Did different reviewers disagree with each other?
Did review time go down because the packet was better, or because people stopped looking carefully?
Are reviewers catching issues before downstream users do?

This is especially important when teams plan to remove a human checkpoint.

Before removing review, inspect whether review was actually adding value.

If review was valuable, removing it adds risk.

If review was low-value, the better question is whether the checkpoint should be redesigned before it is removed.
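
Some of the review questions above can be turned into numbers, as long as nobody mistakes the numbers for the judgment. Here is a rough Python sketch; the Review record shape, its field names, and the three summary metrics are assumptions chosen for illustration, not an established standard.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """One human review of one agent output (hypothetical record shape)."""
    changed_agent_work: bool    # reviewer edited or overrode the output
    seconds_spent: float        # time the reviewer spent on the item
    corrected_downstream: bool  # an error was found later, after review


def review_quality(reviews: list[Review]) -> dict[str, float]:
    """Summarize how much the checkpoint changes and how much slips past it."""
    total = len(reviews)
    if total == 0:
        raise ValueError("no reviews to summarize")
    approved = [r for r in reviews if not r.changed_agent_work]
    missed = sum(r.corrected_downstream for r in approved)
    return {
        "change_rate": sum(r.changed_agent_work for r in reviews) / total,
        "missed_rate": missed / len(approved) if approved else 0.0,
        "mean_seconds": sum(r.seconds_spent for r in reviews) / total,
    }
```

Read together, a falling mean_seconds with a rising missed_rate is one possible sign that review time dropped because people stopped looking carefully rather than because the packet improved. It is a prompt for a conversation with reviewers, not a verdict.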

7. Next authority decision

The ledger should end with a decision, not a dashboard.

For each step, name the current management choice:

keep authority unchanged
expand authority under specific limits
reduce authority temporarily
redesign the workflow step
fix input quality before reassessment
improve review criteria
add a stop line
retire the agent from this step

This is where governance becomes management.

The goal is not to admire metrics.

The goal is to decide what the system should be allowed to do next.

A simple example

Imagine an agent inside a procurement workflow.

The process map says:

intake purchase request
check policy
compare vendor options
prepare recommendation
route for approval
update procurement system
notify requester

The authority map says the agent may observe the request, prepare a policy evidence packet, recommend a vendor option, and draft the approval note, but may not update the procurement system or notify the requester.

The evidence ledger for the policy-check step might say:

current authority: may prepare a policy evidence packet and recommend whether the request appears compliant
keep-authority evidence: correct policy citation in at least 95% of reviewed routine cases; missing-evidence rate below a defined threshold; all ambiguous policy cases escalated
expansion evidence: three review cycles showing low disagreement on routine cases, clear exception clustering, and no missed high-risk requests
reduce-authority evidence: any missed high-risk policy exception, repeated stale-policy references, or rising reviewer overrides
exception pattern: most issues come from ambiguous budget-owner approval rules
review quality: reviewers frequently correct approval-owner interpretation, not factual sourcing
next decision: do not expand execution authority; fix the approval-owner rule and update the evidence packet template first

Notice what changed.

The team did not simply ask, “Is the agent accurate?”

It asked whether the evidence supports a specific authority decision at a specific workflow step.

That is a much better management question.
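
The policy-check ledger above could be written down as a small structured record. This is a minimal Python sketch under assumptions: the LedgerEntry class, its field names, and the citation_rate_ok helper are hypothetical, and only the 95% citation figure comes from the example itself. The other thresholds stay as prose because the example leaves them as "a defined threshold".

```python
from dataclasses import dataclass


@dataclass
class LedgerEntry:
    """One evidence-ledger row for one authority-map step (seven fields plus the step name)."""
    step: str
    current_authority: str
    keep_evidence: list[str]
    expand_evidence: list[str]
    reduce_evidence: list[str]
    exception_pattern: str
    review_quality: str
    next_decision: str


def citation_rate_ok(correct: int, reviewed: int, minimum: float = 0.95) -> bool:
    """Check the 'correct policy citation' keep-authority criterion."""
    if reviewed == 0:
        return False  # no reviewed cases means no evidence either way
    return correct / reviewed >= minimum


policy_check = LedgerEntry(
    step="check policy",
    current_authority="prepare a policy evidence packet and recommend whether the request appears compliant",
    keep_evidence=[
        "correct policy citation in at least 95% of reviewed routine cases",
        "missing-evidence rate below a defined threshold",
        "all ambiguous policy cases escalated",
    ],
    expand_evidence=[
        "three review cycles with low disagreement on routine cases",
        "clear exception clustering",
        "no missed high-risk requests",
    ],
    reduce_evidence=[
        "any missed high-risk policy exception",
        "repeated stale-policy references",
        "rising reviewer overrides",
    ],
    exception_pattern="ambiguous budget-owner approval rules",
    review_quality="reviewers correct approval-owner interpretation, not factual sourcing",
    next_decision="do not expand; fix the approval-owner rule and update the packet template first",
)
```

Nothing here is clever, and that is the point. Once the seven fields exist as a record, a criterion like the citation rate can be checked against reviewed cases instead of remembered from a meeting.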

What managers should do differently

If you manage AI work, stop asking only for demos, accuracy rates, or adoption updates.

Ask for the evidence ledger.

For any agent that participates in real work, ask:

What authority does it currently have at each workflow step?
What evidence justifies that authority?
What evidence would let us expand authority?
What evidence would force us to reduce authority?
What exception patterns are teaching us about the workflow?
What does human review still catch?
What is the next authority decision?

This turns agent governance from a policy document into an operating rhythm.

It also changes the meeting.

Instead of asking, “How is the agent doing?” you ask:

What has the agent earned, what has it not earned, and what did the workflow teach us this week?

That is harder to fake.

It is also more useful.

What builders should do differently

If you build agent systems, do not treat evidence as an afterthought.

Design the ledger into the workflow from the start.

That means the system should make it easy to capture:

the source records used
the recommendation made
the action taken or withheld
the human decision
the reason for disagreement
the exception category
the downstream correction, if any
the authority decision that followed
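
As a sketch of how light this capture can be, the list above maps onto one flat row per case. The Python below uses only the standard library; the column names in FIELDS and the append_case helper are assumptions for illustration, and a real system would write to a file or database rather than an in-memory buffer.

```python
import csv
import io

# Column names follow the capture list above; the exact names are an assumption.
FIELDS = [
    "source_records",
    "recommendation",
    "action_taken",
    "human_decision",
    "disagreement_reason",
    "exception_category",
    "downstream_correction",
    "authority_decision",
]


def append_case(buffer: io.StringIO, case: dict[str, str]) -> None:
    """Write one case as a CSV row, leaving fields the case lacks empty."""
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="")
    if buffer.tell() == 0:
        writer.writeheader()  # header only once, at the start of the buffer
    writer.writerow(case)  # raises ValueError on a key not in FIELDS
```

Rejecting unknown keys is deliberate in this sketch: if a team wants to capture something new, it should become a named column that everyone can see, not a stray field in one person's export.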

This does not require a giant platform at first.

A well-structured table is better than a vague dashboard.

A lightweight review packet is better than a beautiful observability screen that no manager knows how to act on.

The artifact should support decisions about authority.

If it does not, it is probably monitoring theatre.

The practical test

Here is the test I would use before expanding any agent’s authority:

Can we point to the evidence that says this agent has earned the next permission level at this exact workflow step?

If the answer is no, do not expand authority yet.

Keep the agent useful, but keep it bounded.

Improve the evidence layer.

Fix the input problem.

Clarify the review criteria.

Redesign the stop line.

Run another cycle.

The aim is not to be timid.

The aim is to make autonomy earned instead of assumed.

That is how agent systems become more than impressive demos.

They become managed workers inside real workflows.

And managed workers need records of trust, not just records of activity.