The integration of generative artificial intelligence models into distributed computing environments introduces a significant architectural mismatch: probabilistic, non-deterministic decision engines must interface with rigid, deterministic state machines. In agentic workflows, this mismatch manifests as a systemic vulnerability at the point of action execution. Historically, distributed systems have treated transport-level or database-level acknowledgments (such as an HTTP 200 OK or a write-ahead log commit) as proof of successful state mutation. In agentic architectures, that assumption no longer holds.
An action is not complete when the model says it is complete, or even when the tool returns success. An action is complete only when the system verifies the resulting state against the intended outcome and reconciles any discrepancy.
Action verification functions as the central truth-management layer of tool-using AI architectures. It bridges the gap between what an agent believes it has accomplished and what has actually changed within authoritative systems. The fundamental question is not “Did the tool return a success response?” but “Did the intended state change actually occur, in the correct system, to the correct object, under the correct tenant and permission scope, with the expected side effects, no hidden duplicates, no unresolved partial failures, and an auditable recovery path if anything diverged?” Without this verification layer, probabilistic agents operate on ungrounded assumptions, leading to cascading failures, duplicate mutations, or silent data corruption. This report establishes the architectural specification for verifying planning, execution, observation, and recovery in enterprise-grade agentic environments.
This report closes Volume 5: Agentic Systems and Tool-Using Architectures, whose primary concern is how static generative models are transformed into autonomous actors, and how to make those actions bounded, contract-governed, verified, and recoverable.
To maintain structural durability within the AI Engineering Systems Canon, this report establishes precise interfaces and clear boundaries with preceding and succeeding modules:
+---------------------------------------------------------------------------------+
| Volume 5 Architectural Boundaries |
+---------------------------------------------------------------------------------+
| AI-ENG-M: Agentic Orchestration |
| - Decision-making loop |
| - Plan generation and sub-goal management |
| - Autonomy boundaries and loop budgets |
+---------------------------------------------------------------------------------+
|
v
+---------------------------------------------------------------------------------+
| AI-ENG-N: Tool Contracts |
| - Parameter validation and deterministic wrappers |
| - Security mapping and credentials injection |
| - Idempotency key generation and confirmation gates |
+---------------------------------------------------------------------------------+
|
v
+---------------------------------------------------------------------------------+
| AI-ENG-O: Action Verification (Truth-Management Layer) |
| - Post-execution state verification and readbacks |
| - Discrepancy classification and state reconciliation |
| - Rollback, compensation sagas, and forward recovery |
+---------------------------------------------------------------------------------+
Furthermore, this report inherits and extends doctrines from across the broader canon:
To establish a uniform vocabulary for platform architects, agent engineers, and site reliability engineers (SREs), the fundamental terms of action verification are defined below.
| Term | Technical Definition | Operational Implication |
|---|---|---|
| Action Verification | The post-execution discipline that confirms whether a tool mutation produced the correct physical state change. | Prevents the agent from updating its task state or claiming success until state changes are proven. |
| Intended State | The desired final state of the target system, derived from the user goal or planning parameters. | Serves as the reference baseline against which actual observations are evaluated. |
| Requested State | The specific payload values generated by the model and validated by deterministic tool wrappers. | Represents the operational translation of intent into formal API parameters. |
| Observed State | The initial data returned within the tool execution payload or Model Context Protocol (MCP) observation object. | Treated as unverified evidence; cannot modify the believed task state. |
| Authoritative State | The ground-truth state retrieved directly from the primary source of record, bypassing replicas and caches. | The final baseline used to confirm execution success. |
| Believed Task State | The internal state representation maintained in the agent’s orchestrator memory or context window. | Read-only; updated only after authoritative state reconciliation is complete. |
| Hallucinated Success | An execution failure where an agent treats an incomplete, failed, or simulated action as fully completed. | Corrupts subsequent planning, audit logs, and user-facing status reports. |
| Assumed Completion | A design flaw where an orchestrator treats a successful request dispatch as proof of a committed mutation. | Leads to cascading failures when downstream steps depend on uncommitted state. |
| Observation Object | A normalized schema returning execution outputs, isolating the model from raw stack traces. | Simplifies syntax handling but still requires verification appropriate to its action class. |
| Reconciliation | The algorithmic evaluation of the verified actual state against the intended state to resolve discrepancies. | Determines whether the action succeeded, failed, or requires recovery. |
| Read-After-Write Check | An active query issued to confirm a mutation, designed to bypass replication lag. | Prevents stale reads from causing false-negative verification errors. |
| Partial Failure | A state where a multi-step or asynchronous action completes only a subset of its intended mutations. | Must be represented explicitly rather than being flattened into a binary success or failure. |
| Rollback | An operation that restores a system to its prior state within a single transactional boundary. | Preferred when available; aborts the active database transaction or restores a local file. |
| Compensation | An independent, idempotent transaction that logically reverses the visible effects of a committed transaction. | Used in distributed sagas when atomic rollbacks are technically impossible. |
| Forward Recovery | An SRE pattern that continues execution from a partial failure state toward the target goal. | Used when mutations are irreversible or when completing the remaining steps is preferred. |
| Action Ledger | An audit-grade, append-only log recording every phase of action execution, verification, and recovery. | Used for compliance, security forensics, and agent regression testing. |
| Verification Depth | The level of verification checks required for an action, scaling with its side-effect class. | Balances system latency against operational and security risks. |
| Recovery Policy | A deterministic set of rules mapping specific state discrepancies directly to defined recovery actions. | Eliminates unsupervised language model decision-making during failure recovery. |
The architectural foundation of action verification is post-execution truth management. Agentic systems must distinguish between what was planned, what was proposed, what was validated, what was attempted, what was observed, what was verified, what was reconciled, and what may safely be reported.
The lifecycle is:
+--------------------------------------------------------------------------------
| POST-EXECUTION TRUTH LIFECYCLE
+--------------------------------------------------------------------------------
|
| [ Planned ]
| |
| v
| [ Proposed ]
| |
| v
| [ Validated ]
| |
| v
| [ Executed / Attempted ]
| |
| v
| [ Observed ]
| |
| v
| [ Verified ]
| |
| v
| [ Reconciled ]
| |
| v
| [ Reported ]
|
+--------------------------------------------------------------------------------
| Rule:
| User-facing success may not outrun reconciliation.
| Agent task state may not outrun verification.
| Model belief may not outrun authoritative state.
+--------------------------------------------------------------------------------
| Transition | Failure Risk | Verification Requirement |
|---|---|---|
| Planned -> Proposed | The plan may be stale, underspecified, unauthorized, or semantically wrong. | Proposal must be derived from current task state and active autonomy boundary. |
| Proposed -> Validated | Model-generated arguments may be malformed, missing fields, unauthorized, or unsafe. | Tool contract validation must check syntax, schema, semantics, permissions, policy, budget, and confirmation state. |
| Validated -> Executed | A valid request can still fail due to timeouts, locks, dependency outages, rate limits, or worker crashes. | Execution attempt must be logged with idempotency key, trace ID, timeout policy, and wrapper version. |
| Executed -> Observed | A tool may return success, partial success, timeout, accepted, pending, or ambiguous result. | Observation must be normalized into a typed object and treated as evidence, not truth. |
| Observed -> Verified | The observation may not reflect durable authoritative state. | Verification must query the correct source of record or approved verification endpoint. |
| Verified -> Reconciled | Verified state may differ from intended outcome, requested payload, tenant scope, or side-effect expectations. | Reconciliation must evaluate predicates over authoritative state and classify discrepancies. |
| Reconciled -> Reported | User-facing status may falsely claim completion, or the agent may continue from a false belief. | Reporting and next-step planning must be grounded only in reconciled state. |
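The lifecycle ordering and its three rules can be expressed as a small guard. The following is a minimal sketch, not a reference implementation: the `Phase` enum and the `advance` and `may_report_success` helpers are hypothetical names introduced here for illustration, and the sketch assumes phases may only move forward one step at a time.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Lifecycle phases, ordered as in the truth lifecycle diagram."""
    PLANNED = 0
    PROPOSED = 1
    VALIDATED = 2
    EXECUTED = 3
    OBSERVED = 4
    VERIFIED = 5
    RECONCILED = 6
    REPORTED = 7


def advance(current: Phase, target: Phase) -> Phase:
    """Allow only a single forward step; skipping a phase raises ValueError."""
    if target != current + 1:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


def may_report_success(phase: Phase) -> bool:
    """User-facing success may not outrun reconciliation."""
    return phase >= Phase.RECONCILED
```

Under these assumptions, an attempt to jump from `OBSERVED` straight to `REPORTED` fails loudly instead of silently producing a success message.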
Tool observations are structured evidence. They are not automatically authoritative truth.
For example:
- A message may be accepted, but delivery may later bounce, quarantine, or fail.
- A payment may be authorized, but capture or settlement may remain pending.
The orchestrator must therefore treat every observation as an intermediate state until verification determines what actually happened.
ObservationObject.status = evidence
AuthoritativeReadback = verification source
ReconciliationResult = task-state update authority
A system may report only the status that its current verification state allows:
| Verification State | Allowed User-Facing Status |
|---|---|
| No execution attempt | “The action has not been executed.” |
| Execution attempted, no observation | “The action was attempted; result is unknown.” |
| Observation received, not verified | “The action response was received; verification is pending.” |
| Verified but not reconciled | “The system verified current state; reconciliation is in progress.” |
| Reconciled success | “The action completed successfully.” |
| Reconciled partial success | “The action partially completed; these parts remain unresolved.” |
| Reconciled failure | “The action failed; no verified completion occurred.” |
| Unknown / unverifiable | “The final state is unknown and requires review.” |
The model must never convert an accepted, pending, timeout, or unverified observation into a completed user-facing success statement.
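One way to enforce the status table is to make the user-facing message a pure lookup on verification state, so that the model's own wording never decides what is claimed. The sketch below is illustrative only; the `VerificationState` enum and function names are hypothetical, and the message strings are copied from the table above.

```python
from enum import Enum


class VerificationState(Enum):
    NOT_EXECUTED = "not_executed"
    ATTEMPTED_NO_OBSERVATION = "attempted_no_observation"
    OBSERVED_UNVERIFIED = "observed_unverified"
    VERIFIED_UNRECONCILED = "verified_unreconciled"
    RECONCILED_SUCCESS = "reconciled_success"
    RECONCILED_PARTIAL = "reconciled_partial"
    RECONCILED_FAILURE = "reconciled_failure"
    UNKNOWN = "unknown"


ALLOWED_STATUS = {
    VerificationState.NOT_EXECUTED: "The action has not been executed.",
    VerificationState.ATTEMPTED_NO_OBSERVATION: "The action was attempted; result is unknown.",
    VerificationState.OBSERVED_UNVERIFIED: "The action response was received; verification is pending.",
    VerificationState.VERIFIED_UNRECONCILED: "The system verified current state; reconciliation is in progress.",
    VerificationState.RECONCILED_SUCCESS: "The action completed successfully.",
    VerificationState.RECONCILED_PARTIAL: "The action partially completed; these parts remain unresolved.",
    VerificationState.RECONCILED_FAILURE: "The action failed; no verified completion occurred.",
    VerificationState.UNKNOWN: "The final state is unknown and requires review.",
}


def user_facing_status(state: VerificationState) -> str:
    """Return the only message permitted for this verification state."""
    return ALLOWED_STATUS[state]


def may_claim_completion(state: VerificationState) -> bool:
    """Only reconciled success permits a completion claim."""
    return state is VerificationState.RECONCILED_SUCCESS
```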
Hallucinated success is the central execution pathology of tool-using agentic systems. It occurs when a model, orchestrator, wrapper, or user-facing interface treats an intended, attempted, pending, failed, partial, or unverifiable action as successfully completed.
This failure does not always originate in model text generation. It is often an architectural failure: the system failed to separate execution from verification.
| Pattern Code | Pathology | Root Cause | Programmatic Mitigation |
|---|---|---|---|
| HS-01 | Phantom Execution | The model claims a tool ran when no physical invocation occurred. | Require orchestrator-owned tool traces for every claimed action. |
| HS-02 | Validation Error Collapse | A validation failure is summarized as a completed action. | Validation failure blocks execution and returns a typed error state. |
| HS-03 | Accepted-as-Completed | An asynchronous accepted or queued status is treated as final. | Require polling, webhook, readback, or verification endpoint before completion. |
| HS-04 | Flattened Partial Success | A multi-step action succeeds in one subsystem and fails in another, but is reported as complete. | Represent sub-action statuses explicitly and reconcile each required side effect. |
| HS-05 | Timeout Blindspot | A timeout is assumed to mean failure or success without checking durable state. | Use idempotency keys and verification readback before retrying or reporting. |
| HS-06 | Uncontrolled Retry Mutation | Retrying a mutating action creates duplicate side effects. | Enforce idempotency keys, duplicate detection, and payload-hash binding. |
| HS-07 | Stale-Replica Confirmation | Verification reads from a lagging replica and confirms old state. | Use writer/primary reads, commit tokens, LSN checks, or consistency-bound read APIs. |
| HS-08 | Stale-Cache Read | Verification hits an uninvalidated cache and sees pre-action state. | Bypass or invalidate caches during post-action verification. |
| HS-09 | Premature Propagation Success | A tool returns success before downstream propagation completes. | Track pending states and poll until terminal authoritative state is reached. |
| HS-10 | Error Summarization Collapse | A model interprets raw error text as successful completion. | Normalize tool errors into typed observation objects with explicit success flags. |
| HS-11 | Outrun Status Reporting | The UI or agent reports completion before reconciliation finishes. | Lock user-facing success until the action ledger records reconciled success. |
| HS-12 | Wrong Target Success | The action succeeds on the wrong account, tenant, file, region, or resource. | Verify target identity, tenant ID, resource ID, and permission scope during reconciliation. |
| HS-13 | Compensation Blindspot | A compensating action is assumed to have reversed an earlier mutation without verification. | Verify compensation results and record compensated terminal state. |
| HS-14 | Unknown State Flattening | An unverifiable or ambiguous result is collapsed into success or failure. | Preserve UNKNOWN / UNVERIFIABLE states and route to review or reconciliation. |
From an operational standpoint, hallucinated success corrupts the agent’s belief state. Once an agent proceeds from false success, subsequent plans can compound the error: sending follow-up messages, making duplicate charges, skipping required remediation, or writing poisoned memory.
A model statement is not completion.
A dispatch attempt is not completion.
An HTTP success code is not completion.
An accepted queue message is not completion.
A normalized observation is not completion.
Completion occurs only when authoritative state satisfies the required
verification predicates and the reconciled result is committed to task state.
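The retry-related mitigations in the pattern table (idempotency keys, duplicate detection, and payload-hash binding for HS-05 and HS-06) can be sketched as follows. This is a simplified in-memory illustration under stated assumptions: `IdempotencyStore`, `payload_hash`, and `IdempotencyConflict` are hypothetical names, and a production store would need durable, atomic writes.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Tuple


class IdempotencyConflict(Exception):
    """Raised when an idempotency key is reused with a different payload."""


def payload_hash(payload: Dict[str, Any]) -> str:
    """Hash a canonical JSON encoding so key order does not change the digest."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IdempotencyStore:
    """In-memory stand-in for a durable idempotency record store."""

    def __init__(self) -> None:
        self._records: Dict[str, Tuple[str, Any]] = {}

    def execute_once(
        self,
        key: str,
        payload: Dict[str, Any],
        action: Callable[[Dict[str, Any]], Any],
    ) -> Any:
        digest = payload_hash(payload)
        if key in self._records:
            stored_digest, stored_result = self._records[key]
            if stored_digest != digest:
                # Same key, different payload: refuse rather than mutate again.
                raise IdempotencyConflict(f"key {key!r} bound to another payload")
            # Same key, same payload: replay the recorded result, no new side effect.
            return stored_result
        result = action(payload)
        self._records[key] = (digest, result)
        return result
```

Note that this sketch records the result only after the action returns. If the process crashed between the action and the record, the outcome would be unknown, which is exactly the case where a verification readback is needed before any retry.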
The State Reconciliation Model compares the intended outcome, requested operation, observed response, authoritative state, and believed task state. Its purpose is not to demand raw object equality. Real systems add IDs, timestamps, status transitions, derived fields, audit metadata, version numbers, provider-specific fields, and asynchronous states.
Therefore, action verification should evaluate predicates over authoritative state, not simplistic equality between intended state and full system state.
+--------------------------------------------------------------------------------
| STATE RECONCILIATION MODEL
+--------------------------------------------------------------------------------
|
| [ Intended Outcome ]
| user goal, plan target, expected business result
| |
| v
| [ Requested Operation ]
| validated tool payload, target resource, tenant scope, idempotency key
| |
| v
| [ Observed Response ]
| normalized tool observation: success, accepted, pending, error, timeout
| |
| v
| [ Authoritative State ]
| source-of-record readback, ledger query, delivery status, deployment health
| |
| v
| [ Reconciliation Result ]
| success, partial, mismatch, pending, duplicate, compensated, unknown
| |
| v
| [ Believed Task State ]
| updated only from reconciliation result
|
+--------------------------------------------------------------------------------
| Rule:
| Authoritative state is evaluated against verification predicates.
| Believed task state is updated only after reconciliation.
+--------------------------------------------------------------------------------
Let:
| Symbol | Meaning |
|---|---|
| SI | Intended outcome. |
| SR | Requested operation. |
| SO | Observed tool response. |
| SA | Authoritative state. |
| P | Verification predicate set. |
| SB | Believed task state. |
| RR | Reconciliation result. |
A successful reconciliation is:
RR = SUCCESS if all p in P evaluate true over SA, SR, SI, and scope metadata.
Examples of verification predicates:
SA.resource_id == SR.resource_id
SA.tenant_id == SR.tenant_id
SA.status in allowed_terminal_success_states
SA.amount == SR.amount
SA.currency == SR.currency
SA.version > pre_action_version
SA.actor_id == expected_actor_or_service
SA.idempotency_key == SR.idempotency_key
SA.created_at >= action_execution_time
SA.audit_log contains trace_id
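A subset of the predicates above can be evaluated programmatically as shown below. This is a hedged sketch: the dictionary-based state representation, the `reconcile` function, and the contents of `ALLOWED_TERMINAL_SUCCESS_STATES` are assumptions made for illustration. The point it demonstrates is that extra fields in the authoritative state (timestamps, versions) do not cause a mismatch, because predicates are evaluated rather than whole objects compared.

```python
from typing import Any, Callable, Dict, List, Mapping, Tuple

State = Mapping[str, Any]
Predicate = Callable[[State, State], bool]  # (SA, SR) -> bool

# Hypothetical terminal success statuses for the target system.
ALLOWED_TERMINAL_SUCCESS_STATES = {"settled", "completed"}


def same(field: str) -> Predicate:
    """Build a predicate requiring SA and SR to agree on one field."""
    return lambda sa, sr: field in sa and field in sr and sa[field] == sr[field]


PREDICATES: Dict[str, Predicate] = {
    "resource_id": same("resource_id"),
    "tenant_id": same("tenant_id"),
    "status": lambda sa, sr: sa.get("status") in ALLOWED_TERMINAL_SUCCESS_STATES,
    "amount": same("amount"),
    "currency": same("currency"),
    "idempotency_key": same("idempotency_key"),
}


def reconcile(sa: State, sr: State) -> Tuple[str, List[str]]:
    """Return SUCCESS only if every predicate holds; otherwise list the failures."""
    failed = [name for name, pred in PREDICATES.items() if not pred(sa, sr)]
    return ("SUCCESS" if not failed else "MISMATCH", failed)
```

Returning the names of failed predicates, rather than a bare boolean, gives the discrepancy classifier and the audit record something concrete to work with.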
| State Layer | Source | Trust Level | Reconciliation Role |
|---|---|---|---|
| Intended Outcome (SI) | User goal, orchestrator plan, acceptance criteria. | Goal authority. | Defines what must become true. |
| Requested Operation (SR) | Validated tool payload. | Execution proposal authority. | Defines what was actually asked of the tool. |
| Observed Response (SO) | Tool response or observation object. | Evidence, not truth. | Triggers verification and provides hints. |
| Authoritative State (SA) | Source-of-record, primary ledger, provider verification endpoint, signed audit record. | Verification authority. | Determines what actually happened. |
| Believed Task State (SB) | Orchestrator state / agent memory. | Derived state. | May update only from reconciliation result. |
+--------------------------------------------------------------------------------
| OBSERVATION TRUST LADDER
+--------------------------------------------------------------------------------
|
| Level 0: Raw Tool Output
| stdout, stack traces, provider blobs, raw API errors
| Status: unsafe for direct model reasoning
|
| Level 1: Normalized Observation
| typed status, structured errors, trace ID, result payload
| Status: safe evidence, not verified truth
|
| Level 2: Verified Observation
| independent readback confirms that an event or resource exists
| Status: execution likely occurred
|
| Level 3: Reconciled State
| authoritative state satisfies required predicates
| Status: safe to update task state
|
| Level 4: Audit-Confirmed State
| reconciled state is bound to append-only ledger, approval records,
| idempotency record, policy version, and trace evidence
| Status: safe for regulated reporting and replay
|
+--------------------------------------------------------------------------------
When authoritative state does not satisfy the predicate set, the reconciliation engine classifies the discrepancy.
| Discrepancy | Definition | Typical Recovery |
|---|---|---|
| No-Op Success | Target was already in desired state before the action. | Mark idempotent success if permitted; record no-op. |
| No-Op Failure | Requested mutation produced no state change when change was required. | Refresh state, retry if safe, or escalate. |
| Value Mismatch | Target changed but fields do not match required predicates. | Repair, compensate, or manual review. |
| Stale State | Target version changed between planning and execution. | Refresh and replan. |
| Partial Application | Some but not all required sub-actions succeeded. | Compensate, forward recover, or escalate. |
| Duplicate Side Effect | More than one side effect exists for one logical operation. | Freeze, reconcile ledger, compensate if possible. |
| Wrong Target Modified | Action affected wrong resource, tenant, region, user, or account. | Freeze and trigger security/incident response. |
| Target Missing | Resource does not exist or was deleted concurrently. | Replan or fail with missing-target state. |
| Propagation Delay | Authoritative commit exists but dependent systems lag. | Keep pending and retry verification. |
| Unverifiable State | Target system exposes no reliable verification path. | Hold, escalate, or require human attestation. |
| Compensation Required | Committed state must be reversed logically. | Execute compensation tool and verify compensation. |
| Unknown State | System cannot determine whether action succeeded. | Preserve unknown state and block duplicate mutation until resolved. |
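A few of the discrepancy classes above can be distinguished with simple checks on a readback. The sketch below is deliberately narrow and makes several assumptions: the `Readback` fields, the `classify` signature, and the check ordering are hypothetical, and it covers only six of the classes in the table. It also treats any tenant difference between request and readback as a wrong-target event, which is a simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Readback:
    """Hypothetical authoritative readback of the target resource."""
    found: bool
    tenant_id: Optional[str] = None
    value: Optional[str] = None
    version: Optional[int] = None
    side_effect_count: int = 0


def classify(
    requested_tenant: str,
    requested_value: str,
    pre_version: int,
    pre_value: str,
    rb: Readback,
) -> str:
    """Map a readback to a discrepancy label; 'NONE' means predicates hold."""
    if not rb.found:
        return "TARGET_MISSING"
    if rb.tenant_id != requested_tenant:
        return "WRONG_TARGET_MODIFIED"
    if rb.side_effect_count > 1:
        return "DUPLICATE_SIDE_EFFECT"
    if rb.version == pre_version:
        # Nothing changed: acceptable only if the target was already as desired.
        return "NO_OP_SUCCESS" if pre_value == requested_value else "NO_OP_FAILURE"
    if rb.value != requested_value:
        return "VALUE_MISMATCH"
    return "NONE"
```

The most severe findings are checked first, so a wrong-tenant write is never masked by a milder label such as a value mismatch.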
Verification depth should scale with side-effect class, data sensitivity, reversibility, cost, and user impact. Low-risk actions should not require heavyweight reconciliation. Critical mutations should never rely on a single transport success response.
| Side-Effect Class | Required Post-Action Check | Authoritative Verification Source | Fallback Behavior | Latency Tolerance | Human-Review Trigger |
|---|---|---|---|---|---|
| Read-Only Observation | Confirm request parameters, tenant scope, source identity, and response integrity. | Approved source endpoint, signed response, or permission-scoped read path. | Retry read, use alternate approved source, or return unavailable. | Low, usually synchronous. | Trigger if sensitive data boundary, tenant mismatch, or policy violation is detected. |
| Ephemeral Write | Verify sandbox path, file existence, size/hash, and no escape from sandbox. | Sandboxed filesystem monitor or workspace manifest. | Roll back local artifact or mark workspace dirty. | Low, synchronous. | Trigger on sandbox escape, unauthorized path, or destructive operation. |
| Low-Risk Internal Write | Read after write from source of record; verify row/object ID, tenant ID, version, actor, and changed fields. | Primary/writer database, authoritative service API, or strongly consistent read endpoint. | Retry verification; do not fall back to stale replica for final success. | Low to moderate. | Trigger after repeated verification mismatch or wrong-tenant detection. |
| Medium-Risk Operational Write | Verify before/after diff, workflow status, audit actor, and downstream queue state if applicable. | Application service API, primary database, workflow engine, or event ledger. | Keep pending, retry, compensate, or escalate based on discrepancy. | Moderate. | Trigger on partial success, stale state, or repeated mismatch. |
| High-Risk External Write | Verify provider transaction ID, delivery status, recipient/target, amount, idempotency key, and externally visible state. | Third-party ledger, provider verification API, gateway ledger, delivery status endpoint. | Hold pending; do not report success until verified. | Moderate to asynchronous. | Trigger on decline, mismatch, duplicate, unreachable provider, or verification timeout. |
| Critical Mutation | Multi-source reconciliation, approval verification, policy version check, audit ledger entry, and post-action verification. | Immutable ledger, security audit trail, consensus system, production controller, or regulated system of record. | Fail closed, hold, freeze affected resources, and require operator review. | Often asynchronous. | Always review before commit or immediately after if emergency break-glass policy was used. |
Read-only does not mean no-risk.
Low-risk write does not mean stale replica success.
External write does not mean provider-accepted success.
Critical mutation does not mean "retry and hope."
Verification strategy must be defined in the tool contract and recorded in the action ledger. If a tool has no verification path, the system must treat its result as unverifiable and constrain what the agent may claim or do next.
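The requirement that verification strategy live in the tool contract could look roughly like the sketch below. All names here (`SideEffectClass`, `VerificationPolicy`, `strongest_claim`, and the fallback labels) are hypothetical; the fallback values paraphrase the verification depth table, and the review handling for critical mutations is simplified to a single flag.

```python
from dataclasses import dataclass
from enum import Enum


class SideEffectClass(Enum):
    READ_ONLY = "read_only_observation"
    EPHEMERAL_WRITE = "ephemeral_write"
    LOW_RISK_INTERNAL_WRITE = "low_risk_internal_write"
    MEDIUM_RISK_OPERATIONAL_WRITE = "medium_risk_operational_write"
    HIGH_RISK_EXTERNAL_WRITE = "high_risk_external_write"
    CRITICAL_MUTATION = "critical_mutation"


@dataclass(frozen=True)
class VerificationPolicy:
    fallback: str          # what to do when verification cannot confirm success
    always_review: bool    # whether human review is mandatory


POLICIES = {
    SideEffectClass.READ_ONLY: VerificationPolicy("retry_read_or_unavailable", False),
    SideEffectClass.EPHEMERAL_WRITE: VerificationPolicy("rollback_local_artifact", False),
    SideEffectClass.LOW_RISK_INTERNAL_WRITE: VerificationPolicy("retry_verification", False),
    SideEffectClass.MEDIUM_RISK_OPERATIONAL_WRITE: VerificationPolicy("keep_pending_or_compensate", False),
    SideEffectClass.HIGH_RISK_EXTERNAL_WRITE: VerificationPolicy("hold_pending", False),
    SideEffectClass.CRITICAL_MUTATION: VerificationPolicy("fail_closed_hold", True),
}


def strongest_claim(
    side_effect_class: SideEffectClass,
    has_verification_path: bool,
    predicates_passed: bool,
    review_recorded: bool = False,
) -> str:
    """Return the strongest status the agent may claim for this action."""
    if not has_verification_path:
        return "UNVERIFIABLE"
    if not predicates_passed:
        return "VERIFICATION_PENDING"
    if POLICIES[side_effect_class].always_review and not review_recorded:
        return "REVIEW_REQUIRED"
    return "RECONCILED_SUCCESS"
```

A tool with no verification path can never yield a success claim in this sketch, regardless of what its response says.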
Post-action verification patterns vary by target system. Each pattern defines how the system verifies that the intended result occurred in the correct resource, tenant, region, and authority scope.
Verification should be specific enough that a downstream auditor can answer: what was changed, where, by whom, under which authority, and how the system proved it.
Distributed actions rarely fit a simple success/failure binary. A robust agentic architecture must represent intermediate, partial, pending, unknown, compensated, and review-required states explicitly.
+--------------------------------------------------------------------------------
| ACTION VERIFICATION STATE MACHINE
+--------------------------------------------------------------------------------
|
| [ Proposed ]
| |
| v
| [ Validated ]
| |
| v
| [ Executing ]
| |
| +--> transport/tool failure before side effect -----> [ Failed ]
| |
| +--> accepted but not terminal ----------------------> [ Accepted ]
| | |
| | v
| | [ Pending ]
| | |
| | +----------------+----------------+
| | | |
| | v v
| | [ Committed ] [ Verification Timeout ]
| | | |
| | v v
| | [ Reconciled Success ] [ Unknown / Review Required ]
| |
| +--> authoritative mismatch -------------------------> [ Reconciliation Failed ]
| | |
| | +-----------------------------+-----------------------------+
| | | |
| | v v
| | [ Compensating ] [ Forward Recovery ]
| | | |
| | +----------+----------+ +----------+----------+
| | | | | |
| | v v v v
| | [ Compensated ] [ Compensation Failed ] [ Reconciled Success ] [ Review Required ]
| |
| +--> multi-step partial commit -----------------------> [ Partially Committed ]
| |
| +-----------------------+-----------------------+
| | |
| v v
| [ Compensating ] [ Forward Recovery ]
|
| Terminal states:
| Reconciled Success
| Failed
| Compensated
| Rolled Back
| Review Required
| Abandoned
|
+--------------------------------------------------------------------------------
| State | Meaning | Allowed Transitions |
|---|---|---|
| Proposed | Model or planner generated an action proposal. | Validated, Failed. |
| Validated | Tool contract checks passed. | Executing, Failed, Review Required. |
| Executing | Tool call is in flight. | Accepted, Pending, Committed, Failed, Unknown. |
| Accepted | Target accepted request but has not completed processing. | Pending, Failed, Unknown. |
| Pending | Verification is polling, waiting for webhook, or awaiting eventual consistency. | Committed, Verification Timeout, Failed, Unknown. |
| Committed | A side effect appears to have committed in authoritative system. | Reconciled Success, Reconciliation Failed, Partially Committed. |
| Partially Committed | Some required sub-actions committed and others did not. | Compensating, Forward Recovery, Review Required. |
| Reconciled Success | Authoritative state satisfies verification predicates. | Terminal. |
| Reconciliation Failed | Authoritative state does not satisfy intended predicates. | Compensating, Forward Recovery, Review Required, Failed. |
| Failed | Execution failed before a committed side effect, or failure is terminal. | Terminal unless retry policy re-enters Proposed/Validated. |
| Verification Timeout | Verification did not complete within SLA. | Unknown, Review Required, Compensating, Forward Recovery. |
| Unknown | System cannot determine whether action committed. | Review Required, Pending, Reconciliation Failed, Reconciled Success. |
| Compensating | System is executing a reversing action. | Compensated, Compensation Failed, Review Required. |
| Compensated | Compensation verified successfully. | Terminal recovery state. |
| Compensation Failed | Reversing action failed or state remains inconsistent. | Review Required. |
| Forward Recovery | System is completing remaining steps after partial/pivot state. | Reconciled Success, Review Required, Failed. |
| Rolled Back | Local transaction was aborted before external visibility. | Terminal recovery state. |
| Review Required | Automated recovery is unsafe or insufficient. | Terminal until human/operator resolves or reopens. |
| Abandoned | Workflow stopped with unresolved discrepancy explicitly recorded. | Terminal failure state. |
Committed is not Partially Committed.
Partially Committed is not Success.
Accepted is not Committed.
Unknown is not Failure.
Compensated is not the same as Rolled Back.
The state machine must preserve these distinctions because each state implies different user messaging, retry permissions, compensation options, and audit obligations.
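The transition table above can be encoded directly as data, which keeps the state machine auditable and prevents the orchestrator from inventing shortcuts. The sketch below transcribes the "Allowed Transitions" column; the `transition` and `is_terminal` helpers are hypothetical names, and retry re-entry from Failed is intentionally left out as a separate policy decision.

```python
from typing import Dict, FrozenSet

ALLOWED: Dict[str, FrozenSet[str]] = {
    "Proposed": frozenset({"Validated", "Failed"}),
    "Validated": frozenset({"Executing", "Failed", "Review Required"}),
    "Executing": frozenset({"Accepted", "Pending", "Committed", "Failed", "Unknown"}),
    "Accepted": frozenset({"Pending", "Failed", "Unknown"}),
    "Pending": frozenset({"Committed", "Verification Timeout", "Failed", "Unknown"}),
    "Committed": frozenset({"Reconciled Success", "Reconciliation Failed", "Partially Committed"}),
    "Partially Committed": frozenset({"Compensating", "Forward Recovery", "Review Required"}),
    "Reconciliation Failed": frozenset({"Compensating", "Forward Recovery", "Review Required", "Failed"}),
    "Verification Timeout": frozenset({"Unknown", "Review Required", "Compensating", "Forward Recovery"}),
    "Unknown": frozenset({"Review Required", "Pending", "Reconciliation Failed", "Reconciled Success"}),
    "Compensating": frozenset({"Compensated", "Compensation Failed", "Review Required"}),
    "Compensation Failed": frozenset({"Review Required"}),
    "Forward Recovery": frozenset({"Reconciled Success", "Review Required", "Failed"}),
    # Terminal states have no outgoing transitions in this sketch.
    "Reconciled Success": frozenset(),
    "Failed": frozenset(),
    "Compensated": frozenset(),
    "Rolled Back": frozenset(),
    "Review Required": frozenset(),
    "Abandoned": frozenset(),
}


def is_terminal(state: str) -> bool:
    return not ALLOWED[state]


def transition(current: str, target: str) -> str:
    """Return the target state if the move is allowed, else raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```

With this encoding, moving from Accepted directly to Reconciled Success is rejected, which is the "Accepted is not Committed" distinction made executable.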
When verification detects a discrepancy, the system must choose a recovery path based on the transaction boundary, side-effect class, reversibility, idempotency posture, and business risk.
+--------------------------------------------------------------------------------
| RECOVERY STRATEGY MODEL
+--------------------------------------------------------------------------------
|
| [ Verification Discrepancy Detected ]
| |
| v
| [ Classify Transaction Boundary ]
| |
| +--> local atomic transaction open
| | |
| | v
| | [ Rollback ]
| | abort transaction | verify no visible side effect | ledger rolled_back
| |
| +--> compensable pre-pivot action
| | |
| | v
| | [ Backward Compensation ]
| | run compensation actions in reverse order | verify compensated state
| |
| +--> pivot committed / irreversible action
| | |
| | v
| | [ Forward Recovery ]
| | complete remaining idempotent steps | reconcile final state
| |
| +--> unknown or unverifiable high-impact state
| | |
| | v
| | [ Hold and Escalate ]
| | freeze workflow | preserve trace | route to operator
|
+--------------------------------------------------------------------------------
Rollback is preferred when the system still controls a local transaction boundary. It restores the prior state before intermediate changes become externally visible.
| Property | Rollback Behavior |
|---|---|
| Scope | Single database transaction, local filesystem transaction, isolated workspace mutation. |
| Visibility | Intermediate state should not be externally visible. |
| Trigger | Validation failure before commit, local write failure, transaction conflict. |
| Verification | Confirm transaction aborted and target state remains unchanged. |
| Terminal State | ROLLED_BACK. |
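For the single-database case, rollback with verification might look like the following sketch using Python's standard `sqlite3` module. The `accounts` table, the negative-balance validation rule, and the function names are hypothetical and exist only to trigger a failure before commit.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one hypothetical account row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
    conn.commit()
    return conn


def read_balance(conn: sqlite3.Connection, account_id: int):
    row = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    return None if row is None else row[0]


def update_with_rollback(conn: sqlite3.Connection, account_id: int, new_balance: int) -> str:
    before = read_balance(conn, account_id)
    try:
        # The connection context manager commits on success and
        # rolls back the open transaction if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = ? WHERE id = ?",
                (new_balance, account_id),
            )
            if new_balance < 0:
                raise ValueError("validation failed before commit")
    except ValueError:
        # Verify the rollback instead of assuming it: state must be unchanged.
        after = read_balance(conn, account_id)
        return "ROLLED_BACK" if after == before else "REVIEW_REQUIRED"
    return "COMMITTED"
```

The readback after the exception is the important part: the terminal state is recorded as `ROLLED_BACK` only once the prior value is confirmed.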
Compensation is used when a committed side effect cannot be rolled back atomically but can be logically reversed through another action.
| Property | Compensation Behavior |
|---|---|
| Scope | Distributed services, third-party APIs, microservice sagas, external records. |
| Visibility | Original action may have been visible to other systems. |
| Trigger | Partial failure before pivot, wrong value, canceled workflow, failed downstream step. |
| Verification | Confirm compensation action completed and resulting state is consistent. |
| Terminal State | COMPENSATED or COMPENSATION_FAILED. |
Compensation is not time travel. It creates a new event that reverses or offsets the prior event. Audits must show both.
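The point that compensation adds a new event rather than erasing the old one can be illustrated with a tiny append-only ledger. This is a sketch under assumptions: `LedgerEntry`, `ActionLedger`, and the integer-cents representation are hypothetical and stand in for whatever audit store a real system uses.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LedgerEntry:
    entry_id: int
    kind: str                      # "charge" or "compensation"
    amount_cents: int
    reverses: Optional[int] = None  # entry_id of the event being offset


class ActionLedger:
    """Append-only: entries are added, never edited or deleted."""

    def __init__(self) -> None:
        self._entries: List[LedgerEntry] = []

    def append(self, kind: str, amount_cents: int, reverses: Optional[int] = None) -> LedgerEntry:
        entry = LedgerEntry(len(self._entries) + 1, kind, amount_cents, reverses)
        self._entries.append(entry)
        return entry

    def compensate(self, original: LedgerEntry) -> LedgerEntry:
        """Record an offsetting event that points back at the original."""
        return self.append("compensation", -original.amount_cents, reverses=original.entry_id)

    def net_cents(self) -> int:
        return sum(e.amount_cents for e in self._entries)

    def entries(self) -> Tuple[LedgerEntry, ...]:
        return tuple(self._entries)
```

After a charge and its compensation, the net effect is zero but the ledger still holds two entries, so an auditor can see both the original event and its reversal.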
Forward recovery is used when rollback or compensation is impossible, unsafe, legally inappropriate, or more harmful than completing the workflow.
| Property | Forward Recovery Behavior |
|---|---|
| Scope | Post-pivot workflows, settlement systems, deployment pipelines, message queues. |
| Visibility | Prior side effect is committed or externally visible. |
| Trigger | Pivot committed, pending settlement, partial post-pivot failure, eventual consistency delay. |
| Verification | Continue idempotent retryable steps until final predicates hold or review is required. |
| Terminal State | RECONCILED_SUCCESS or REVIEW_REQUIRED. |
The pivot transaction is the point of no return in a saga.
[ Compensable Step 1 ]
|
v
[ Compensable Step 2 ]
|
v
[ Pivot Transaction ]
|
v
[ Retryable Step 3 ]
|
v
[ Retryable Step 4 ]
|
v
[ Reconciled Success ]
| Saga Region | Recovery Strategy |
|---|---|
| Before pivot | Backward compensation is usually available. |
| At pivot | Approval, idempotency, and verification must be strongest. |
| After pivot | Forward recovery is preferred; remaining steps should be retryable and idempotent. |
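The three saga regions can be sketched in a few lines. This is a minimal illustration, not a production saga engine: it assumes a pivot that raises has not committed, that undo functions succeed, and that post-pivot steps are idempotent. `run_saga` and the demo steps are hypothetical.

```python
def run_saga(compensable, pivot, retryable, max_retries=3):
    """compensable: list of (do, undo) pairs; pivot: callable; retryable: idempotent callables."""
    undo_stack = []
    try:
        for do, undo in compensable:
            do()
            undo_stack.append(undo)
        pivot()  # point of no return
    except Exception:
        for undo in reversed(undo_stack):  # backward compensation, newest first
            undo()
        return "COMPENSATED"
    for step in retryable:  # after the pivot: forward recovery only
        for _ in range(max_retries):
            try:
                step()
                break
            except Exception:
                continue
        else:
            return "REVIEW_REQUIRED"  # retries exhausted; never undo past the pivot
    return "RECONCILED_SUCCESS"

events = []

def failing_pivot():
    raise RuntimeError("pivot rejected")

outcome_failed = run_saga(
    [(lambda: events.append("reserve"), lambda: events.append("release"))],
    failing_pivot,
    [],
)
outcome_ok = run_saga(
    [(lambda: events.append("reserve"), lambda: events.append("release"))],
    lambda: events.append("charge"),
    [lambda: events.append("email")],
)
```

A real implementation would also verify each compensation and record COMPENSATION_FAILED when an undo step does not produce the expected state; that path is omitted here for brevity.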
| Recovery Strategy | State Isolation | Timing | Cost Overhead | Transaction Bounds | Best Fit |
|---|---|---|---|---|---|
| Atomic Rollback | Strong. Intermediate state not visible. | Synchronous. | Low. | Single transaction boundary. | Database transaction, local sandbox write. |
| Saga Compensation | Weak. Intermediate state may be visible. | Usually asynchronous. | Medium. | Distributed services and third-party APIs. | Refund, delete uploaded object, reverse workflow update. |
| Forward Recovery | Weak but goal-directed. | Asynchronous or retry-driven. | Low to medium. | Eventually consistent systems. | Settlement, deployment rollout, queue delivery. |
| Manual Review | Human-contained. | Asynchronous. | High. | Complex or ambiguous transactions. | Unknown state, failed compensation, high-risk mismatch. |
| Fail-Closed Hold | Freezes action path. | Immediate hold, asynchronous resolution. | High throughput impact. | Critical financial/security/infrastructure mutations. | Possible tenant leak, duplicate charge, unauthorized access. |
| Incident Creation | Operational containment. | After severe or repeated failure. | High. | Production-critical failures. | Compensation failure, wrong-target mutation, systemic verifier outage. |
Recovery decisions must be deterministic. The model may explain a discrepancy, but recovery policy should map discrepancy type, side-effect class, transaction boundary, and operational condition to a predefined action.
| Discrepancy Type | Verification Finding | Active Operational Condition | Target Recovery Action | Technical Rationale |
|---|---|---|---|---|
| Observation / Authority Mismatch | Tool returned success, but authoritative readback shows old state. | Recent write and consistency delay within threshold. | Retry Verification | The state may still be propagating; do not report success yet. |
| Observation / Authority Mismatch | Tool returned success, but authoritative readback shows old state. | Read may have hit replica, cache, or stale endpoint. | Force Primary / Strongly Consistent Read | Verification must use source-of-record or consistency-bound endpoint. |
| Observation / Authority Mismatch | Tool returned success, but authoritative state still mismatches after strong read. | Mutation is reversible or compensable. | Compensate or Roll Back | The action did not produce required state. |
| Observation / Authority Mismatch | Tool returned success, but authoritative state mismatches after strong read. | Mutation is irreversible or high impact. | Hold and Escalate | Automated recovery is unsafe. |
| Verification Timeout | Polling or readback timed out. | Action is reversible and no pivot committed. | Compensate and Stop | Avoid indefinite pending state. |
| Verification Timeout | Polling or readback timed out. | Pivot committed or action irreversible. | Hold and Escalate | Unknown final state must not be retried blindly. |
| Pending Timeout | Execution timed out; idempotency record remains PENDING. | Durable idempotency record exists. | Poll Durable Idempotency Record | Prevent duplicate mutation while original attempt may still complete. |
| Pending Timeout | Execution timed out; no durable idempotency record exists. | Side-effecting operation. | Freeze and Reconcile Manually | Duplicate retry could create additional side effects. |
| Partial Success | Some sub-actions succeeded; others failed. | Workflow has not passed pivot. | Backward Compensation | Revert completed compensable steps. |
| Partial Success | Some sub-actions succeeded; others failed. | Workflow has passed pivot. | Forward Recovery | Complete remaining idempotent steps to reach consistency. |
| Stale State Error | Target version or ETag does not match requested version. | Optimistic locking failed. | Refresh and Replan | Another actor modified state; planner must use current state. |
| Duplicate Side Effect | More than one transaction exists for one logical operation. | Financial, security, external communication, or critical mutation. | Freeze and Alert | Duplicate execution may require incident handling and compensation. |
| Duplicate Side Effect | Duplicate found but operation is idempotent no-op. | Resulting state is equivalent and allowed by contract. | Record Idempotent Replay | No additional recovery needed, but trace must record duplicate. |
| Wrong Target Modified | Resource tenant, user, region, account, or object ID differs from authorized scope. | Any side-effecting action. | Freeze and Alarm | Potential security incident or data boundary violation. |
| Unverifiable Result | No reliable verification endpoint exists. | Low-risk or read-only operation. | Report Unverified / Degraded | Do not claim verified completion. |
| Unverifiable Result | No reliable verification endpoint exists. | High-risk write or critical mutation. | Hold and Escalate | High-impact actions require independent verification. |
| Compensation Failure | Compensation action failed or produced mismatch. | Any externally visible mutation. | Manual Review / Incident | State may remain inconsistent. |
| Provider Outage | Authoritative provider unavailable. | Action not yet executed. | Defer or Fail Cleanly | Avoid executing into unverifiable outage. |
| Provider Outage | Authoritative provider unavailable. | Action already executed or pending. | Hold Pending and Retry Verification | Avoid duplicate execution; preserve unknown state. |
| Policy Version Mismatch | Action was approved under old policy version. | Policy changed before execution. | Revalidate Approval | Approval may no longer authorize the action. |
| Approval Token Mismatch | Approval token payload hash does not match execution payload. | Any approval-gated action. | Fail Closed | Prevent approval replay or payload substitution. |
Prefer rollback when state is local and uncommitted.
Prefer compensation when prior side effects are reversible.
Prefer forward recovery after pivot or irreversible commit.
Prefer hold-and-escalate when state is unknown, high-impact, or security-sensitive.
Never retry a mutating operation blindly.
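One way to keep recovery deterministic is to encode the decision table as a pure lookup. The sketch below covers only a subset of the rows above; the key names, the `Recovery` enum, and the `decide_recovery` helper are hypothetical, and the fail-closed default for unmapped pairs is an assumption rather than something the table specifies.

```python
from enum import Enum

class Recovery(Enum):
    COMPENSATE_OR_ROLL_BACK = "Compensate or Roll Back"
    HOLD_AND_ESCALATE = "Hold and Escalate"
    BACKWARD_COMPENSATION = "Backward Compensation"
    FORWARD_RECOVERY = "Forward Recovery"
    FREEZE_AND_ALARM = "Freeze and Alarm"

# Keys are (discrepancy type, operational condition); "ANY" matches every condition.
POLICY = {
    ("PARTIAL_SUCCESS", "BEFORE_PIVOT"): Recovery.BACKWARD_COMPENSATION,
    ("PARTIAL_SUCCESS", "AFTER_PIVOT"): Recovery.FORWARD_RECOVERY,
    ("VERIFICATION_TIMEOUT", "PIVOT_COMMITTED"): Recovery.HOLD_AND_ESCALATE,
    ("AUTHORITY_MISMATCH", "REVERSIBLE"): Recovery.COMPENSATE_OR_ROLL_BACK,
    ("AUTHORITY_MISMATCH", "IRREVERSIBLE"): Recovery.HOLD_AND_ESCALATE,
    ("WRONG_TARGET", "ANY"): Recovery.FREEZE_AND_ALARM,
}

def decide_recovery(discrepancy: str, condition: str) -> Recovery:
    """Pure lookup: the same inputs always yield the same action."""
    for key in ((discrepancy, condition), (discrepancy, "ANY")):
        if key in POLICY:
            return POLICY[key]
    # Assumption: pairs missing from the policy fail closed instead of guessing.
    return Recovery.HOLD_AND_ESCALATE

decision = decide_recovery("PARTIAL_SUCCESS", "AFTER_PIVOT")
```

Because the function has no model call and no hidden state, the policy can be versioned, diffed, and replayed alongside the ledger.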
The Action Ledger is an append-only record of action intent, execution, verification, reconciliation, and recovery. Runtime traces describe service hops and latency. The Action Ledger describes what changed, what was proven, and how the system decided what to do next.
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://canon.ai-eng.org/v5/action-ledger-entry.schema.json",
"title": "ActionLedgerEntry",
"type": "object",
"required": [
"action_id",
"workflow_run_id",
"tenant_id",
"principal_id",
"tool_contract",
"policy_context",
"side_effect_class",
"intended_outcome",
"requested_operation",
"execution",
"verification",
"reconciliation",
"timestamps",
"trace"
],
"additionalProperties": false,
"properties": {
"action_id": {
"type": "string"
},
"workflow_run_id": {
"type": "string"
},
"tenant_id": {
"type": "string"
},
"principal_id": {
"type": "string"
},
"tool_contract": {
"type": "object",
"required": [
"name",
"version",
"schema_version",
"wrapper_version"
],
"additionalProperties": false,
"properties": {
"name": {
"type": "string"
},
"version": {
"type": "string"
},
"schema_version": {
"type": "string"
},
"wrapper_version": {
"type": "string"
}
}
},
"policy_context": {
"type": "object",
"required": [
"autonomy_boundary_version",
"approval_policy_version",
"verification_policy_version",
"recovery_policy_version"
],
"additionalProperties": false,
"properties": {
"autonomy_boundary_version": {
"type": "string"
},
"approval_policy_version": {
"type": "string"
},
"verification_policy_version": {
"type": "string"
},
"recovery_policy_version": {
"type": "string"
}
}
},
"side_effect_class": {
"type": "string",
"enum": [
"READ_ONLY",
"EPHEMERAL_WRITE",
"LOW_RISK_INTERNAL",
"MEDIUM_RISK_WRITE",
"HIGH_RISK_EXTERNAL",
"CRITICAL_MUTATION"
]
},
"idempotency": {
"type": "object",
"required": [
"required",
"key_hash",
"request_hash",
"status"
],
"additionalProperties": false,
"properties": {
"required": {
"type": "boolean"
},
"key_hash": {
"type": [
"string",
"null"
]
},
"request_hash": {
"type": [
"string",
"null"
]
},
"status": {
"type": [
"string",
"null"
],
"enum": [
"PENDING",
"COMPLETED",
"FAILED_RETRYABLE",
"FAILED_FINAL",
"COMPENSATED",
"EXPIRED",
null
]
}
}
},
"approval": {
"type": "object",
"required": [
"required",
"approval_id",
"approver_id",
"payload_hash",
"expires_at"
],
"additionalProperties": false,
"properties": {
"required": {
"type": "boolean"
},
"approval_id": {
"type": [
"string",
"null"
]
},
"approver_id": {
"type": [
"string",
"null"
]
},
"payload_hash": {
"type": [
"string",
"null"
]
},
"expires_at": {
"type": [
"string",
"null"
],
"format": "date-time"
}
}
},
"intended_outcome": {
"type": "object",
"required": [
"target_resource",
"expected_predicates"
],
"additionalProperties": false,
"properties": {
"target_resource": {
"type": "string"
},
"expected_predicates": {
"type": "array",
"items": {
"type": "string"
}
}
}
},
"requested_operation": {
"type": "object",
"required": [
"validated_payload_hash",
"target_resource",
"operation_kind"
],
"additionalProperties": false,
"properties": {
"validated_payload_hash": {
"type": "string"
},
"target_resource": {
"type": "string"
},
"operation_kind": {
"type": "string"
}
}
},
"execution": {
"type": "object",
"required": [
"status",
"observation_pointer",
"attempt_count"
],
"additionalProperties": false,
"properties": {
"status": {
"type": "string",
"enum": [
"NOT_EXECUTED",
"EXECUTING",
"ACCEPTED",
"PENDING",
"COMMITTED",
"FAILED",
"UNKNOWN"
]
},
"observation_pointer": {
"type": [
"string",
"null"
]
},
"attempt_count": {
"type": "integer",
"minimum": 0
}
}
},
"verification": {
"type": "object",
"required": [
"status",
"source",
"query_pointer",
"verified_state_pointer"
],
"additionalProperties": false,
"properties": {
"status": {
"type": "string",
"enum": [
"NOT_REQUIRED",
"NOT_STARTED",
"PENDING",
"VERIFIED",
"FAILED",
"TIMEOUT",
"UNVERIFIABLE"
]
},
"source": {
"type": [
"string",
"null"
]
},
"query_pointer": {
"type": [
"string",
"null"
]
},
"verified_state_pointer": {
"type": [
"string",
"null"
]
}
}
},
"reconciliation": {
"type": "object",
"required": [
"status",
"discrepancy_class",
"recovery_decision"
],
"additionalProperties": false,
"properties": {
"status": {
"type": "string",
"enum": [
"NOT_STARTED",
"RECONCILED_SUCCESS",
"RECONCILED_PARTIAL",
"RECONCILED_FAILURE",
"UNKNOWN",
"COMPENSATED",
"ROLLED_BACK",
"REVIEW_REQUIRED"
]
},
"discrepancy_class": {
"type": [
"string",
"null"
]
},
"recovery_decision": {
"type": [
"string",
"null"
]
}
}
},
"recovery": {
"type": "object",
"required": [
"recovery_action_id",
"compensation_action_id",
"rollback_action_id",
"incident_id",
"review_id"
],
"additionalProperties": false,
"properties": {
"recovery_action_id": {
"type": [
"string",
"null"
]
},
"compensation_action_id": {
"type": [
"string",
"null"
]
},
"rollback_action_id": {
"type": [
"string",
"null"
]
},
"incident_id": {
"type": [
"string",
"null"
]
},
"review_id": {
"type": [
"string",
"null"
]
}
}
},
"timestamps": {
"type": "object",
"required": [
"proposed_at",
"validated_at",
"executed_at",
"verified_at",
"reconciled_at"
],
"additionalProperties": false,
"properties": {
"proposed_at": {
"type": "string",
"format": "date-time"
},
"validated_at": {
"type": [
"string",
"null"
],
"format": "date-time"
},
"executed_at": {
"type": [
"string",
"null"
],
"format": "date-time"
},
"verified_at": {
"type": [
"string",
"null"
],
"format": "date-time"
},
"reconciled_at": {
"type": [
"string",
"null"
],
"format": "date-time"
}
}
},
"trace": {
"type": "object",
"required": [
"trace_id",
"parent_span_id",
"replay_bundle_id"
],
"additionalProperties": false,
"properties": {
"trace_id": {
"type": "string"
},
"parent_span_id": {
"type": [
"string",
"null"
]
},
"replay_bundle_id": {
"type": [
"string",
"null"
]
}
}
}
}
}
| Artifact | Purpose | Storage Posture | Typical Consumer |
|---|---|---|---|
| Runtime Trace | Shows service hops, latency, errors, retries, spans, and dependencies. | High-volume telemetry store. | SRE, performance debugging, incident triage. |
| Action Ledger | Shows logical action state, verification proof, reconciliation, approval, and recovery. | Append-only or tamper-evident store. | Audit, compliance, replay, incident review, billing, governance. |
A trace answers: “What services did the request pass through?”
An action ledger answers: “What did the system intend to change, what actually changed, who approved it, how was it verified, and what recovery path executed?”
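The "append-only or tamper-evident" storage posture could be approximated with hash chaining, where each entry commits to the hash of the previous one. This is one possible technique, not something the ledger schema prescribes; the `ActionLedger` class and its field names are hypothetical, and a real store would persist entries durably.

```python
import hashlib
import json

class ActionLedger:
    """In-memory, hash-chained ledger sketch; entries are only ever appended."""

    GENESIS = "0" * 64

    def __init__(self):
        self._entries = []

    def append(self, payload: dict) -> str:
        prev_hash = self._entries[-1]["entry_hash"] if self._entries else self.GENESIS
        body = json.dumps(payload, sort_keys=True)  # canonical form for hashing
        entry_hash = hashlib.sha256((prev_hash + body).encode()).hexdigest()
        self._entries.append(
            {"prev_hash": prev_hash, "body": body, "entry_hash": entry_hash}
        )
        return entry_hash

    def verify_chain(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            expected = hashlib.sha256((prev_hash + entry["body"]).encode()).hexdigest()
            if entry["prev_hash"] != prev_hash or entry["entry_hash"] != expected:
                return False
            prev_hash = entry["entry_hash"]
        return True

ledger = ActionLedger()
ledger.append({"action_id": "act-1", "verification": "PENDING"})
ledger.append({"action_id": "act-1", "verification": "VERIFIED"})
chain_ok = ledger.verify_chain()
```

State transitions are recorded as new entries rather than updates, so the ledger shows the action moving from PENDING to VERIFIED instead of overwriting the earlier record.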
To prevent model belief drift, the orchestrator must separate the model’s generation context from verified task state. The model may remember that it proposed or attempted an action. It may not treat that action as complete until reconciliation commits the result.
+--------------------------------------------------------------------------------
| TOOL-RESULT GROUNDING MODEL
+--------------------------------------------------------------------------------
|
| [ Model Execution Context ]
| "I should charge the card."
| |
| v
| [ Tool Proposal ]
| proposed payload, not state truth
| |
| v
| [ Deterministic Wrapper ]
| validates, authorizes, executes, observes
| |
| v
| [ Observation Object ]
| evidence: success, error, accepted, pending, timeout
| |
| v
| [ Verification Layer ]
| source-of-record readback, provider status, ledger query
| |
| v
| [ Reconciliation Result ]
| success, partial, pending, failed, compensated, unknown
| |
| v
| [ Structured Task State ]
| updated only from reconciliation result
| |
| v
| [ Next Model Context ]
| injected with verified state, not raw self-belief
|
+--------------------------------------------------------------------------------
| Reconciliation Result | Task State Update | Model Context Injection | User-Facing Status |
|---|---|---|---|
| Success | Commit verified state and terminal success. | “Verified: action completed; authoritative state is…” | Completion may be reported. |
| Partial Success | Commit completed sub-actions and unresolved items. | “Partially verified: completed X; unresolved Y.” | Report partial completion with next steps. |
| Pending | Record pending state and verification schedule. | “Action accepted; verification pending.” | Do not claim completion. |
| Failure | Record failed state and reason. | “Action failed; no verified mutation occurred.” | Report failure or repair path. |
| Compensated | Record original action and verified compensation. | “Action was reversed through compensation.” | Report compensated recovery. |
| Rolled Back | Record rollback and unchanged target state. | “Transaction rolled back; no durable mutation.” | Report rollback. |
| Unknown | Record unknown state and block unsafe continuation. | “Final state unknown; duplicate action prohibited until reconciliation.” | Report pending review or unknown state. |
| Review Required | Freeze autonomous continuation. | “Human/operator review required.” | Report review status. |
If verification is pending, report pending.
If state is unknown, report unknown.
If action partially completed, report partial.
If compensation occurred, report compensated.
If success is reconciled, only then report complete.
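These reporting rules can be enforced mechanically by deriving the user-facing status from the reconciliation status alone. The sketch below reuses the reconciliation status names from the ledger schema; the `user_facing_status` helper, the status wording, and the `model_claims_done` parameter are hypothetical.

```python
USER_STATUS = {
    "RECONCILED_SUCCESS": "complete",
    "RECONCILED_PARTIAL": "partial",
    "RECONCILED_FAILURE": "failed",
    "COMPENSATED": "compensated",
    "ROLLED_BACK": "rolled back",
    "UNKNOWN": "unknown",
    "REVIEW_REQUIRED": "under review",
}

def user_facing_status(reconciliation_status: str, model_claims_done: bool = False) -> str:
    """Map reconciled state to a user-facing status.

    model_claims_done is accepted only to show that it is ignored: the model's
    own belief never upgrades the reported status.
    """
    # Anything not yet reconciled (for example NOT_STARTED) is reported as pending.
    return USER_STATUS.get(reconciliation_status, "pending")

status_before = user_facing_status("NOT_STARTED", model_claims_done=True)
status_after = user_facing_status("RECONCILED_SUCCESS")
```

Here `status_before` is "pending" even though the model asserts completion, and only the reconciled record yields "complete".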
After each verification result, the next planning step must be grounded in the reconciled task state.
| Verified State | Required Planning Behavior |
|---|---|
| Success | Continue to next dependent step. |
| Partial | Decide whether to compensate, forward recover, ask user, or escalate. |
| Pending | Poll, wait, subscribe to webhook, or defer. |
| Failure | Repair, retry if safe, or terminate. |
| Unknown | Do not retry mutating action blindly; reconcile first. |
| Wrong Target / Tenant Drift | Freeze and trigger security/incident path. |
| Compensation Failed | Escalate to manual recovery. |
The model’s context should contain verified state summaries, trace pointers, and unresolved discrepancies. It should not contain raw logs that invite the model to invent a happy ending. The audit department has enough hobbies already.
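A minimal sketch of such context construction, assuming task state is held as a dictionary keyed by action ID. The field names (`verified_summary`, `trace_pointer`, `unresolved_discrepancies`, `raw_log`) and the `build_model_context` helper are hypothetical.

```python
import json

# Only these fields are allowed to reach the model's next context.
ALLOWED_FIELDS = ("verified_summary", "trace_pointer", "unresolved_discrepancies")

def build_model_context(task_state: dict) -> str:
    """Render verified summaries, trace pointers, and open discrepancies; drop the rest."""
    lines = []
    for action_id, record in sorted(task_state.items()):
        view = {k: record[k] for k in ALLOWED_FIELDS if k in record}
        lines.append(f"{action_id}: {json.dumps(view, sort_keys=True)}")
    return "\n".join(lines)

task_state = {
    "act-7": {
        "verified_summary": "Verified: invoice inv-42 is PAID",
        "trace_pointer": "trace://run-1/act-7",
        "unresolved_discrepancies": [],
        "raw_log": "HTTP 200 ... retrying ... HTTP 200",  # never forwarded to the model
    }
}
context = build_model_context(task_state)
```

An allowlist is used instead of a blocklist so that newly added raw fields stay out of the model context by default.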
Action verification must be observable, measurable, replayable, and regression-tested. A verification layer that silently fails is worse than no verification layer because it creates false confidence.
| Metric | Formula / Measurement | What It Detects |
|---|---|---|
| Verification Success Rate | verified_success_actions / executed_actions | How often executed actions reconcile successfully. |
| Verification Failure Rate | verification_failed_actions / executed_actions | Frequency of authoritative-state mismatch. |
| Verification Latency | Time from execution observation to verification result. | Slow readbacks, provider lag, polling delay. |
| Time to Reconcile | Time from execution attempt to terminal reconciliation state. | End-to-end truth latency. |
| Read-After-Write Failure Rate | read_after_write_failures / write_actions | Replica lag, cache issues, writer/read routing bugs. |
| Discrepancy Rate | reconciliation_mismatches / verified_actions | Mismatch between intended/requested and authoritative state. |
| Stale-State Rate | stale_state_conflicts / write_actions | Optimistic-lock or concurrent update conflict. |
| Partial Failure Rate | partial_actions / executed_actions | Distributed transaction fragility. |
| Pending-Too-Long Rate | pending_actions_over_sla / pending_actions | Stuck async workflows or provider delay. |
| Unknown-State Rate | unknown_state_actions / executed_actions | Verification blind spots and risky uncertainty. |
| Duplicate Action Rate | duplicate_side_effects / mutating_actions | Idempotency failure or unsafe retry. |
| Idempotency Recovery Rate | idempotency_replays_resolved / duplicate_attempts | How well duplicate attempts are handled. |
| Compensation Rate | compensations_started / mutating_actions | Frequency of compensating recovery. |
| Compensation Success Rate | compensations_verified / compensations_started | Reliability of reverse actions. |
| Failed Compensation Rate | compensation_failures / compensations_started | Manual recovery and incident risk. |
| False Success Escape Rate | false_success_reports / executed_actions | Verification bypass reaching user or downstream planner. |
| Human Escalation Rate | review_required_actions / executed_actions | Manual review load and automation uncertainty. |
| Action-Ledger Completeness | complete_ledger_entries / actions_requiring_ledger | Auditability and replay coverage. |
| Verifier Regression Rate | Increase in mismatch/failure after verifier release. | Bad verification policy, schema drift, or release bug. |
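A few of the ratio metrics above can be computed from raw counters as follows. This is a sketch under assumptions: the counter names mirror the formulas in the table, the sample values are invented for illustration, and returning `None` for a zero denominator is a design choice made here, not something the table mandates.

```python
def rate(numerator: int, denominator: int):
    """Return a ratio, or None when the denominator is zero (no data is not 0%)."""
    return None if denominator == 0 else numerator / denominator

def verification_metrics(c: dict) -> dict:
    """Compute a subset of the ratio metrics from raw counters."""
    return {
        "verification_success_rate": rate(c["verified_success_actions"], c["executed_actions"]),
        "unknown_state_rate": rate(c["unknown_state_actions"], c["executed_actions"]),
        "duplicate_action_rate": rate(c["duplicate_side_effects"], c["mutating_actions"]),
        "compensation_success_rate": rate(c["compensations_verified"], c["compensations_started"]),
    }

# Illustrative counters only; these are not measurements from any real system.
counters = {
    "executed_actions": 200,
    "verified_success_actions": 190,
    "unknown_state_actions": 4,
    "mutating_actions": 120,
    "duplicate_side_effects": 1,
    "compensations_started": 0,
    "compensations_verified": 0,
}
metrics = verification_metrics(counters)
```

Distinguishing "no compensations started" (`None`) from "all compensations failed" (`0.0`) keeps dashboards from showing a misleading zero.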
Every action should link:
model inference span
-> tool validation span
-> deterministic wrapper span
-> tool execution span
-> verification span
-> reconciliation span
-> recovery span, if needed
Required trace attributes include:
| Attribute | Purpose |
|---|---|
| action.id | Stable action identifier. |
| workflow.run_id | Parent agent run. |
| tool.name / tool.version | Tool contract identity. |
| verification.policy_version | Verification logic version. |
| recovery.policy_version | Recovery decision table version. |
| side_effect_class | Verification-depth selector. |
| idempotency.key_hash | Duplicate-action correlation. |
| approval.id | Approval linkage for gated actions. |
| reconciliation.status | Terminal or pending reconciliation state. |
| discrepancy.class | Mismatch classification. |
| ledger.entry_id | Audit ledger pointer. |
Verification logic is production logic. Changes to predicates, polling rules, source selection, recovery policy, observation schema, or ledger schema can break agent behavior.
Release gates should include:
| Gate | Purpose |
|---|---|
| Trace Replay | Replay historical actions against new verification logic and compare reconciliation outcomes. |
| Predicate Diff Review | Show what verification predicates changed and which actions are affected. |
| Verification Canary | Roll out verifier changes to a small route or tenant subset first. |
| Failure Injection | Simulate stale reads, timeouts, duplicate actions, wrong-tenant results, and partial commits. |
| Recovery Drill | Confirm rollback, compensation, forward recovery, and review escalation paths execute. |
| Ledger Schema Compatibility Check | Ensure new ledger entries remain replayable and audit-compatible. |
| False-Success Guard Test | Verify user-facing completion cannot occur before reconciliation. |
| Unknown-State Test | Confirm unknown states block unsafe duplicate mutation. |
No verifier release should promote unless historical trace replay,
canary metrics, failure injection, and ledger compatibility checks pass.
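The trace replay gate could look roughly like the sketch below, which re-runs a candidate verifier over recorded actions and blocks promotion when outcomes change. The record fields, the `replay_gate` helper, and the two example verifiers are hypothetical; a real gate would also separate intended outcome changes from regressions.

```python
def replay_gate(historical, new_verifier, max_changed=0):
    """Replay recorded actions through a candidate verifier and diff the outcomes."""
    changed = []
    for rec in historical:
        outcome = new_verifier(rec["observed_state"], rec["expected"])
        if outcome != rec["recorded_outcome"]:
            changed.append((rec["action_id"], rec["recorded_outcome"], outcome))
    return {"promote": len(changed) <= max_changed, "changed": changed}

def strict_verifier(observed, expected):
    """Every expected field must match the observed authoritative state."""
    ok = all(observed.get(k) == v for k, v in expected.items())
    return "VERIFIED" if ok else "FAILED"

def loose_verifier(observed, expected):
    """Regressed candidate: checks status only and silently ignores tenant scope."""
    return "VERIFIED" if observed.get("status") == expected.get("status") else "FAILED"

historical = [
    {"action_id": "act-1", "recorded_outcome": "VERIFIED",
     "observed_state": {"status": "paid", "tenant": "t1"},
     "expected": {"status": "paid", "tenant": "t1"}},
    {"action_id": "act-2", "recorded_outcome": "FAILED",
     "observed_state": {"status": "paid", "tenant": "t2"},
     "expected": {"status": "paid", "tenant": "t1"}},
]

strict_result = replay_gate(historical, strict_verifier)
loose_result = replay_gate(historical, loose_verifier)
```

In this example the loose verifier flips a recorded wrong-tenant failure into a success, which is exactly the kind of change the gate is meant to catch before promotion.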
Action verification is not a passive logging layer. It is the truth-maintenance system that protects the agent from believing its own press release.
The truth-management patterns established in AI-ENG-O provide critical interfaces across the AI Engineering Systems Canon. AI-ENG-O receives proposed and executed actions from orchestration and tool-contract layers, then returns verified, reconciled state to downstream planning, audit, governance, telemetry, evaluation, fallback, and incident systems.
| Target Document | Directed Asset / Metric | Programmatic Consumer | Downstream System Dependency |
|---|---|---|---|
| AI-ENG-M — Agentic Orchestration | Reconciled task state, pending states, unknown states, terminal action statuses. | Agent loop controller and planner. | Determines whether the agent may continue, retry, compensate, ask user, or terminate. |
| AI-ENG-N — Tool Contracts | Verification requirements, post-action hints, idempotency status, observation quality feedback. | Tool wrapper and contract registry. | Improves contract schemas, output objects, idempotency posture, and execution wrappers. |
| AI-ENG-S — Production Pathologies | Verification failure rates, discrepancy classes, false-success escapes, stuck pending actions. | Diagnostic monitors and anomaly detection engines. | Detects production pathologies in action execution and recovery. |
| AI-ENG-T — Permission Boundaries | Tenant verification tags, wrong-target mutations, permission-scope mismatches. | Security gateways and containment systems. | Enforces tenant boundaries and detects data leaks or unauthorized mutations. |
| AI-ENG-U — Dependency Risk | External provider verification latency, outage state, unreachable verification endpoints. | Dependency monitors and availability engines. | Drives fail-closed holds, degraded modes, and backup gateway policy. |
| AI-ENG-V — Resource Abuse | Duplicate-action metrics, polling loops, retry pressure, compensation rates. | Cost trackers and abuse-control systems. | Prevents runaway verification loops, duplicate side effects, and denial-of-wallet patterns. |
| AI-ENG-W — Fallback & Degraded Modes | Pending, unknown, unverifiable, and dependency-failure states. | Fallback routers and degraded-mode controllers. | Determines whether to degrade, defer, hold, or fail closed. |
| AI-ENG-X — Transparent Status | Verified status, partial status, unknown status, compensation status. | User messaging and status interfaces. | Prevents false user-facing success claims. |
| AI-ENG-Y — Human Review | Review-required states, high-risk mismatch packets, compensation failures. | Human review queues and approval systems. | Routes unresolved or risky states to human operators. |
| AI-ENG-Z — Telemetry | Verification spans, reconciliation status, discrepancy metrics, ledger completeness. | Observability stack and SRE dashboards. | Powers reliability monitoring and action truth dashboards. |
| AI-ENG-AA — Agent Evaluations | Trace replay packages, false-success tests, partial-failure scenarios. | Evaluation harnesses and CI/CD tests. | Tests whether agents behave correctly across real action outcomes. |
| AI-ENG-AB — Audit & Replay | Action ledger entries, verification evidence, recovery decisions, policy versions. | Replay engines and audit systems. | Reconstructs execution, verification, and recovery decisions. |
| AI-ENG-AC — Incident Response | Compensation failure codes, wrong-target mutations, duplicate side effects, unknown high-risk states. | Incident management and paging systems. | Triggers containment, rollback, escalation, and postmortem workflows. |
| AI-ENG-AD — Governance, Policy, Compliance & Accountability | Verification policy versions, approval evidence, audit boundaries, accountability records. | Governance, risk, and compliance owners. | Confirms that action truth, approval, recovery, and reporting comply with organizational policy. |
| AI-ENG-AJ — Reference Architecture | End-to-end verification lifecycle, state machine, ledger schema, recovery tables. | Systems development toolkits and SDKs. | Provides implementation templates for production agent platforms. |
The durable handoff is this:
AI-ENG-O exports post-action truth:
what actually happened, how it was verified, whether it reconciled,
what recovery path executed, and what the agent may safely believe next.
To ensure reliability across tool-using agent implementations, systems must adhere to the following principles.
An execution response is an acknowledgment of an attempt, not proof of completed state change. The system must treat the observed tool response as evidence. Task state remains locked until verification confirms the resulting authoritative state.
Authoritative state rarely equals intended state byte-for-byte. Verification should evaluate explicit predicates over resource ID, tenant scope, status, amount, version, actor, idempotency key, audit trail, and business-specific success conditions.
Verification reads must use source-of-record, writer/primary, strongly consistent, or approved verification endpoints. Stale replicas and caches can produce false failures or false confirmations.
Accepted, pending, partial, committed, compensated, rolled back, unknown, and review-required states must be represented explicitly. Flattening these states into success or failure corrupts planning and reporting.
Timeouts and retries must not create duplicate mutations. Side-effecting actions require idempotency posture proportional to risk, and unknown mutating states must block blind retries until reconciled.
Failure recovery must be driven by policy-defined decision tables mapped to discrepancy classes, transaction boundaries, and side-effect classes. The model may assist with explanation, but it should not improvise rollback, compensation, or forward-recovery policy.
Unknown is not success. Unknown is not failure. Unknown means the system cannot safely claim completion or repeat the mutation until reconciliation, review, or recovery establishes truth.
The model’s next planning step must receive verified task state, unresolved discrepancy state, and trace pointers. It must not proceed from historical generation logs or assumed completion.
A user-facing success message may be emitted only after reconciliation. Pending, partial, compensated, failed, and unknown states must be reported honestly.
Action verification must produce ledger entries, trace spans, policy versions, verification evidence, recovery decisions, and terminal states sufficient for audit, replay, incident response, and regression testing.
Changes to verification predicates, recovery tables, ledger schemas, or polling policies must pass replay, canary, failure-injection, and compatibility gates before production promotion.
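The principle that verification evaluates explicit predicates, not byte-for-byte equality, can be sketched as follows. The readback fields, predicate names, and the `verify_predicates` helper are hypothetical; the readback is assumed to come from a source-of-record or strongly consistent endpoint.

```python
from typing import Callable, Dict

Predicate = Callable[[dict], bool]

def verify_predicates(authoritative_state: dict, predicates: Dict[str, Predicate]) -> dict:
    """Evaluate named predicates against a source-of-record readback."""
    failed = [name for name, check in predicates.items() if not check(authoritative_state)]
    return {"status": "FAILED" if failed else "VERIFIED", "failed_predicates": failed}

# The readback carries fields the intent never mentioned (updated_at), so
# whole-object equality with the request payload would fail spuriously.
readback = {
    "resource_id": "inv-42",
    "tenant_id": "tenant-a",
    "status": "PAID",
    "amount_cents": 2500,
    "updated_at": "2024-01-01T00:00:00Z",
}

predicates = {
    "right_resource": lambda s: s.get("resource_id") == "inv-42",
    "right_tenant": lambda s: s.get("tenant_id") == "tenant-a",
    "status_paid": lambda s: s.get("status") == "PAID",
    "amount_matches": lambda s: s.get("amount_cents") == 2500,
}

result = verify_predicates(readback, predicates)
wrong_tenant = verify_predicates({**readback, "tenant_id": "tenant-b"}, predicates)
```

Returning the names of failed predicates, not just a boolean, gives reconciliation a discrepancy class to act on: a failed `right_tenant` check maps to the wrong-target path, not to a simple retry.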
The final invariant:
The agent may act only through contracts.
The system may believe only verified state.
The user may be told only reconciled truth.