AI-ENG-AJ — AI System Design Patterns - Reference Architectures & Failure-Aware Blueprints

Conceptual Glossary

AI System Design Pattern: A formalized, reusable template addressing a recurring coordination and execution challenge in non-deterministic computing environments. Unlike traditional software patterns that assume static control flows, an AI design pattern maps state transitions while managing the variability and latent risks of foundation model completions.1
Reference Architecture: A highly structured blueprint defining the canonical components, data flows, integration surfaces, and interface boundaries required to instantiate a specific class of AI system.3 It details both functional blocks and the exact contracts that isolate failures and enforce deterministic behavior at probabilistic boundaries.4
Failure-Aware Blueprint: An engineering specification that treats operational failure as a statistical certainty rather than an exceptional anomaly. It embeds detection, mitigation, containment, and recovery mechanisms directly into the system topology to ensure graceful degradation under failure states.4
Pattern Card: A standardized, multi-dimensional specification document used to evaluate, select, implement, and audit a specific AI system design pattern. It operates as the definitive schema of record for an architecture, establishing boundary conditions, testing obligations, and operational controls.
No-Use Condition: A set of technical, operational, or business constraints under which a specific AI design pattern must not be deployed. It defines the boundaries of the pattern’s viability, routing architects to deterministic or non-AI alternatives when those boundaries are breached.
Anti-Pattern: A common, superficially attractive design choice or implementation shortcut that consistently yields systemic failures, uncontrollable costs, security vulnerabilities, or operational degradation in production environments.1
Degraded Mode: A predefined, safe, lower-capability operational state that a system autonomously or semi-autonomously downshifts into when its primary probabilistic pathways fail.4 It prioritizes structural safety, data integrity, and predictability over complete functionality.
Contract Surface Map: A multi-dimensional cross-reference index defining which structural contracts (such as schema, permission, prompt, and tool contracts) are mandatory, conditional, or optional across a portfolio of reference architectures.
Eval-by-Pattern: The systematic mapping of specialized, quantitative evaluation metrics (e.g., faithfulness, context precision, and tool-call validity) to specific system design patterns to validate execution safety and performance.7
Telemetry-by-Pattern: The prescriptive specification of the exact trace elements, telemetry attributes, and state variables that must be captured and logged for a given pattern to ensure comprehensive observability, auditing, and debugging.4
Human Review Map: A governance matrix specifying the precise interfaces, roles, and levels of authority reserved for human operators within agentic and semi-autonomous systems, calibrated against operational risk tiers.1
Security Boundary Map: A structural specification of the isolation zones, data egress controls, credentials access parameters, and sandboxing requirements across different patterns to defend against prompt injection, privilege escalation, and data exfiltration.9
Pattern Maturity: A six-tier progression framework used to assess the readiness of an AI pattern implementation from initial local prototype (Level 0) to a highly platformized, automated golden path (Level 5).

AI Reference Architecture Doctrine

The foundational thesis of high-dimensional AI engineering asserts that a reference architecture is not merely an illustrative collection of boxes and arrows; it is a decision artifact. A pattern is reusable only when its boundaries, assumptions, and breach behaviors are explicit. Probabilistic cores, such as large language and vision models, must be tightly surrounded by deterministic, typed, versioned, and observable edges. Consequently, an effective reference architecture must encode its contract surfaces, evaluation gates, failure modes, anti-patterns, operating controls, degraded modes, and no-use conditions directly into the structural design.
Traditional systems engineering optimizes for deterministic reliability: inputs are transformed into outputs via known, verifiable code paths. In contrast, AI engineering systems govern probabilistic engines where identical inputs can yield semantically divergent completions.1 This non-determinism introduces systemic failure modes (including silent drift, tool-use parameter hallucination, cascade failures, and context attention dilution) that cannot be resolved by standard debugging paradigms.5
To build production-grade systems, architects must apply the principles of contract-driven architecture. Every transition point between a deterministic service and a probabilistic model must be governed by an explicit contract stack, as established in the systemic canon. This stack spans user expectations, workflows, policies, prompts, retrievals, routes, schemas, tools, evaluations, deployments, and sourcing models. A design pattern represents the repeatable composition of these contracts to resolve a specific workload class. By structuring patterns as failure-aware blueprints, engineering teams can ensure that when a probabilistic core fails to meet a contract, the surrounding deterministic architecture isolates the failure, records the complete trajectory, triggers appropriate recovery mechanisms, and degrades the service in a controlled manner.4
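
The contract boundary described above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: `guarded_call`, `GuardedResult`, the retry count, and the idea of checking only for required JSON keys are all assumptions made for the example. The `model` argument stands in for any probabilistic completion call.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GuardedResult:
    value: dict
    degraded: bool
    trace: list = field(default_factory=list)


def guarded_call(model: Callable[[str], str], prompt: str, required_keys: set,
                 fallback: dict, max_attempts: int = 2) -> GuardedResult:
    """Run a probabilistic call behind a deterministic schema contract.

    Hypothetical helper: every attempt is recorded in the trace, and the
    caller receives the fallback (degraded mode) if the contract is never met.
    """
    trace = []
    for attempt in range(1, max_attempts + 1):
        raw = model(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            trace.append({"attempt": attempt, "outcome": "invalid_json"})
            continue
        if not isinstance(parsed, dict):
            trace.append({"attempt": attempt, "outcome": "not_an_object"})
            continue
        missing = required_keys - set(parsed)
        if missing:
            trace.append({"attempt": attempt, "outcome": "missing_keys",
                          "missing": sorted(missing)})
            continue
        trace.append({"attempt": attempt, "outcome": "ok"})
        return GuardedResult(parsed, degraded=False, trace=trace)
    # Contract breached on every attempt: isolate the failure and degrade.
    return GuardedResult(dict(fallback), degraded=True, trace=trace)
```

The caller never sees raw model text. It receives either a contract-conforming object or an explicitly flagged degraded result, together with the trajectory needed for later review.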

Pattern Card Template

Every reference architecture in this doctrine is documented as a Pattern Card. A Pattern Card is not a decorative summary. It is the reusable architecture record for a workload class: when to use the pattern, when not to use it, what contracts are required, how it fails, how it degrades, how it is evaluated, and who remains accountable.

Canonical Pattern Card Schema

Field	Required Content
Pattern Name	Canonical architecture name.
Problem Class	The recurring system problem this pattern solves.
Best-Fit Use Cases	Workloads where the pattern is structurally appropriate.
No-Use Conditions	Conditions that route the team to deterministic software, another pattern, or no-AI.
User Surface	Chat, inline UI, queue, cockpit, API, background service, review panel, etc.
Architecture Shape	Primary components and dataflow.
Required Contracts	Mandatory AI-ENG-AI contract surfaces.
Human Authority	Human role, approval boundary, veto power, and review burden.
Model Route Strategy	Capability profile and route class, not brittle provider names.
Retrieval / Context Strategy	What context enters the system and how it is governed.
Tool / Action Strategy	Whether tools are read-only, write-capable, sandboxed, or human-approved.
Core Evals	Pattern-specific quality, safety, latency, cost, and regression checks.
Telemetry	Operational metrics needed to debug and improve the pattern.
Audit / Evidence Boundary	Minimal evidence required for compliance, incident review, or replay.
Security Boundary	Isolation, permission, data egress, credential, and sandbox requirements.
Primary Cost Drivers	Tokens, retrieval, GPU, storage, review labor, tool calls, or platform operations.
Failure Modes	Known ways the pattern fails in production.
Anti-Patterns	Common tempting but unsafe implementations.
Degraded Mode	Safe reduced-capability behavior.
Adoption and Support	Training, workflow change, support, and user enablement needs.
Sourcing / Exit	Portability, vendor dependency, and migration considerations.
Maturity Target	Expected implementation maturity level for production use.

Pattern Card Markdown Form

Each card should be written as real Markdown, not fenced pseudo-documentation. This makes the pattern searchable, linkable, diffable, and indexable.

### **Pattern Name**

| Field | Specification |
| :---- | :---- |
| **Problem Class** |  |
| **Best-Fit Use Cases** |  |
| **No-Use Conditions** |  |
| **User Surface** |  |
| **Architecture Shape** |  |
| **Required Contracts** |  |
| **Human Authority** |  |
| **Model Route Strategy** |  |
| **Retrieval / Context Strategy** |  |
| **Tool / Action Strategy** |  |
| **Core Evals** |  |
| **Telemetry** |  |
| **Audit / Evidence Boundary** |  |
| **Security Boundary** |  |
| **Primary Cost Drivers** |  |
| **Failure Modes** |  |
| **Anti-Patterns** |  |
| **Degraded Mode** |  |
| **Adoption and Support** |  |
| **Sourcing / Exit** |  |
| **Maturity Target** |  |

The Pattern Card is the handoff object between product architecture, engineering implementation, governance, evaluation, operations, adoption, and sourcing.
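
Because the card is plain Markdown, completeness can be checked mechanically before a card is accepted as a handoff object. The following is a rough sketch under stated assumptions: `parse_card` and `missing_fields` are hypothetical helpers, the field list is copied from the schema above, and the parser assumes two-column rows whose values contain no `|` character.

```python
REQUIRED_FIELDS = [
    "Problem Class", "Best-Fit Use Cases", "No-Use Conditions", "User Surface",
    "Architecture Shape", "Required Contracts", "Human Authority",
    "Model Route Strategy", "Retrieval / Context Strategy",
    "Tool / Action Strategy", "Core Evals", "Telemetry",
    "Audit / Evidence Boundary", "Security Boundary", "Primary Cost Drivers",
    "Failure Modes", "Anti-Patterns", "Degraded Mode", "Adoption and Support",
    "Sourcing / Exit", "Maturity Target",
]


def parse_card(markdown: str) -> dict:
    """Collect `| **Field** | value |` rows from a Pattern Card table."""
    fields = {}
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 2:
            continue  # not a two-column table row
        name = cells[0].strip("*").strip()
        if name in REQUIRED_FIELDS:
            fields[name] = cells[1]
    return fields


def missing_fields(markdown: str) -> list:
    """Return required fields that are absent or left empty."""
    fields = parse_card(markdown)
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]
```

A check like this could run in review tooling so that a card with an empty Degraded Mode or No-Use Conditions row is flagged before the pattern is reused.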

Architecture Pattern Taxonomy

AI system patterns should be organized by workflow archetype, authority level, state mutation, evidence burden, and user-review structure. A taxonomy is useful only if it helps teams route a real requirement to the correct architecture and reject bad fits early.

AI SYSTEM DESIGN PATTERN TAXONOMY

1. Interactive & Embedded
   ├── Copilot / Embedded Assistant
   └── Personal or Team Productivity Assistant

2. Retrieval & Synthesis
   ├── Research Agent
   └── Enterprise Knowledge System

3. Extraction & Classification
   ├── Document Intelligence Pipeline
   ├── Multimodal Review System
   └── Background Classifier / Router

4. Analytics & Decision Intelligence
   ├── Analytics Assistant
   └── Decision-Support Cockpit

5. Support & Service Operations
   └── Support Assistant

6. Action-Oriented / Agentic
   ├── Workflow Automation Agent
   ├── Coding Agent
   └── Governed Agentic Workflow

7. Human Review & Governance
   └── Human Review and Escalation Queue

8. Platform Infrastructure
   ├── AI Gateway / Control Plane
   └── Evaluation and Shadow-Mode Pattern

Taxonomy by Control Property

Pattern Family	Primary User Surface	Runtime Authority	Primary Risk	Core Contract Emphasis
Interactive & Embedded	Inline assistant, sidebar, editor surface.	Suggests; user accepts or edits.	Overtrust, context leakage, poor fit.	Prompt, context, route, observability, user expectation.
Retrieval & Synthesis	Search portal, research console, cited answer.	Synthesizes from evidence.	Citation theater, stale/unauthorized evidence.	Retrieval, grounding, freshness, permission, eval.
Extraction & Classification	Queue, parser, background service, review panel.	Produces structured output or route label.	Silent field errors, misrouting, reviewer fatigue.	Schema, evidence, confidence, exception queue, eval.
Analytics & Decision Intelligence	BI workspace, risk cockpit, scenario panel.	Explains, computes, or recommends; human decides.	Metric hallucination, biased framing, wrong calculation.	Semantic metric, SQL/query, grounding, human review, audit.
Support & Service Operations	Customer chat, agent assist, support console.	Drafts, triages, or resolves bounded issues.	Deflection theater, policy hallucination, poor handoff.	Retrieval, escalation, user expectation, audit, telemetry.
Action-Oriented / Agentic	Task dashboard, PR interface, process portal.	Plans or executes bounded steps under policy.	Unauthorized side effects, loops, partial execution.	Tool, permission, idempotency, resource, action verification.
Human Review & Governance	Review queue, approval panel, audit cockpit.	Human validates and authorizes.	Rubber-stamping, queue overload, weak evidence.	Human review, evidence, override, audit, telemetry.
Platform Infrastructure	Internal API, gateway, release dashboard.	Controls routes, policy, eval, and observability.	Central failure, bypass, weak evidence, cost blowout.	Deployment, route, policy, resource, observability, sourcing.

Pattern selection is not model selection. It is authority design.

Pattern Selection Tree

The selection tree routes a workload to the safest viable architecture pattern. It includes deterministic and no-AI paths because not every valuable workflow deserves a model-shaped hole punched through it.

PATTERN SELECTION TREE

[ Candidate Workflow ]
        |
        v
Q1. Is the task fully deterministic, exact, or better solved by rules/database/forms?
        |
        +-- yes --> [ Deterministic Software / No-AI Path ]
        |
        v
Q2. Does the system need to mutate external state or execute side effects?
        |
        +-- yes --> Q3
        |
        +-- no  --> Q7

Q3. Is the task software codebase modification, build/test, or PR generation?
        |
        +-- yes --> [ Coding Agent ]
        |
        +-- no  --> Q4

Q4. Is the action path mostly fixed, schema-bound, and workflow-driven?
        |
        +-- yes --> [ Workflow Automation Agent ]
        |
        +-- no  --> Q5

Q5. Does the task require dynamic planning, multi-step reasoning, or multi-agent checks?
        |
        +-- yes --> [ Governed Agentic Workflow ]
        |
        +-- no  --> Q6

Q6. Is the action high-risk, irreversible, regulated, or approval-sensitive?
        |
        +-- yes --> [ Human Review and Escalation Queue + Deterministic Execution ]
        |
        +-- no  --> [ Workflow Automation Agent ]

Q7. Is the primary task factual retrieval, synthesis, or evidence search?
        |
        +-- yes --> Q8
        |
        +-- no  --> Q11

Q8. Is the corpus enterprise-owned, multi-repository, permissioned, and lifecycle-managed?
        |
        +-- yes --> [ Enterprise Knowledge System ]
        |
        +-- no  --> Q9

Q9. Is the search open-ended, multi-hop, external, or research-oriented?
        |
        +-- yes --> [ Research Agent ]
        |
        +-- no  --> Q10

Q10. Is the output a high-stakes recommendation with alternatives and rationale?
        |
        +-- yes --> [ Decision-Support Cockpit ]
        |
        +-- no  --> [ Enterprise Knowledge System or Deterministic Search ]

Q11. Is the primary task structured extraction from documents or media?
        |
        +-- yes --> Q12
        |
        +-- no  --> Q14

Q12. Does the input include image, audio, video, scanned media, or spatial evidence?
        |
        +-- yes --> [ Multimodal Review System ]
        |
        +-- no  --> Q13

Q13. Is the target output typed fields from documents?
        |
        +-- yes --> [ Document Intelligence Pipeline ]
        |
        +-- no  --> [ Deterministic Parser / No-AI Path ]

Q14. Is the primary task governed analytics, SQL, metrics, or dashboard explanation?
        |
        +-- yes --> [ Analytics Assistant ]
        |
        +-- no  --> Q15

Q15. Is the primary task customer or internal support?
        |
        +-- yes --> [ Support Assistant ]
        |
        +-- no  --> Q16

Q16. Is the workload high-throughput background classification or routing?
        |
        +-- yes --> [ Background Classifier / Router ]
        |
        +-- no  --> Q17

Q17. Is the AI embedded inside an active workspace as inline assistance?
        |
        +-- yes --> [ Copilot / Embedded Assistant ]
        |
        +-- no  --> Q18

Q18. Is the system a personal/team assistant with local context and low action authority?
        |
        +-- yes --> [ Personal or Team Productivity Assistant ]
        |
        +-- no  --> Q19

Q19. Is the requirement infrastructure for model access, routing, policy, or cost control?
        |
        +-- yes --> [ AI Gateway / Control Plane ]
        |
        +-- no  --> Q20

Q20. Is the requirement validating candidate models/prompts/routes without affecting users?
        |
        +-- yes --> [ Evaluation and Shadow-Mode Pattern ]
        |
        +-- no  --> [ Return to Product Discovery / Pattern Not Selected ]

Selection Rule

If two patterns appear plausible, choose the one with less autonomy, clearer evidence, lower integration burden, and safer degraded mode. If the task can be solved cleanly without AI, that is not a failure of imagination. It is architecture doing its job.
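
The tree above is a fixed sequence of yes/no questions, so it can be encoded as data and walked deterministically. The sketch below transcribes the branches Q1 through Q20 as written; the `TREE` table and `select_pattern` function are illustrative names, and the answers are assumed to come from a human architect, not from a model.

```python
# Each node maps to (target if yes, target if no). A target is either
# another question ID or a terminal pattern name.
TREE = {
    "Q1": ("Deterministic Software / No-AI Path", "Q2"),
    "Q2": ("Q3", "Q7"),
    "Q3": ("Coding Agent", "Q4"),
    "Q4": ("Workflow Automation Agent", "Q5"),
    "Q5": ("Governed Agentic Workflow", "Q6"),
    "Q6": ("Human Review and Escalation Queue + Deterministic Execution",
           "Workflow Automation Agent"),
    "Q7": ("Q8", "Q11"),
    "Q8": ("Enterprise Knowledge System", "Q9"),
    "Q9": ("Research Agent", "Q10"),
    "Q10": ("Decision-Support Cockpit",
            "Enterprise Knowledge System or Deterministic Search"),
    "Q11": ("Q12", "Q14"),
    "Q12": ("Multimodal Review System", "Q13"),
    "Q13": ("Document Intelligence Pipeline", "Deterministic Parser / No-AI Path"),
    "Q14": ("Analytics Assistant", "Q15"),
    "Q15": ("Support Assistant", "Q16"),
    "Q16": ("Background Classifier / Router", "Q17"),
    "Q17": ("Copilot / Embedded Assistant", "Q18"),
    "Q18": ("Personal or Team Productivity Assistant", "Q19"),
    "Q19": ("AI Gateway / Control Plane", "Q20"),
    "Q20": ("Evaluation and Shadow-Mode Pattern",
            "Return to Product Discovery / Pattern Not Selected"),
}


def select_pattern(answers: dict) -> tuple:
    """Walk the tree; return (selected pattern, list of questions visited).

    Raises KeyError if a question on the path has no explicit answer, so an
    unanswered question is never silently treated as yes or no.
    """
    node, path = "Q1", []
    while node in TREE:
        path.append(node)
        yes_target, no_target = TREE[node]
        node = yes_target if answers[node] else no_target
    return node, path
```

Returning the visited path alongside the result keeps the selection auditable: the questions that led to a pattern can be stored with the Pattern Card.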

Reference Blueprint Set

The reference blueprint set defines reusable AI system patterns. Each pattern is expressed as a real Markdown card so it can be searched, indexed, diffed, linked, and reused by architecture teams.

1. Copilot / Embedded Assistant Pattern

Field	Specification
Problem Class	Interactive, context-aware assistance inside an active workspace.
Best-Fit Use Cases	Inline code suggestions, prose completion, spreadsheet assistance, structured form guidance, drafting aids.
No-Use Conditions	High-liability transactions, exact calculations, background batch processing, or actions requiring autonomous write authority.
User Surface	Inline suggestions, ghost text, side panel, contextual menu.
Architecture Shape	Workspace event → context builder → policy/data filter → fast model route → suggestion renderer → user accept/edit/reject → telemetry/eval loop.
Required Contracts	Prompt, context, schema where structured, resource, model route, observability, user expectation.
Human Authority	Human remains active controller; AI suggests only.
Model Route Strategy	Low-latency route optimized for short completions; escalation only for explicitly requested deeper help.
Retrieval / Context Strategy	Local workspace state, selected document/code region, nearby context, active user intent.
Tool / Action Strategy	No external side effects; local UI modifications only.
Core Evals	Acceptance quality, edit distance, compile/test result where applicable, latency, user correction patterns.
Telemetry	Suggestion hash, route ID, prompt/schema version, accept/reject/edit event, latency, cost bucket.
Audit / Evidence Boundary	Usually operational telemetry only; sensitive content should be redacted or referenced securely.
Security Boundary	Workspace context must be scoped; cloud routes require data filtering and tenant policy.
Primary Cost Drivers	High-frequency small completions, context assembly, streaming latency.
Failure Modes	Context distraction, stale suggestions, overtrust, low-quality accepted code/text.
Anti-Patterns	Chatbox on everything; acceptance rate treated as correctness.
Degraded Mode	Static templates, local autocomplete, deterministic snippets.
Adoption and Support	Low training burden, but users need calibration on review and acceptance.
Sourcing / Exit	Keep completion interface provider-neutral.
Maturity Target	Level 3–4 for production; Level 5 when platformized across teams.
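
As an illustration of the telemetry row above, a suggestion event might record a hash of the suggestion instead of its text, plus how far the user's final text diverged from it. This is a sketch only: `suggestion_event` and its field names are hypothetical, and the similarity ratio from the standard-library `difflib` is used as a stand-in for whatever edit-distance measure a team actually adopts.

```python
import difflib
import hashlib
from typing import Optional


def suggestion_event(suggestion: str, final_text: Optional[str],
                     route_id: str, latency_ms: int) -> dict:
    """Build one telemetry record for a copilot suggestion.

    final_text is None when the user rejected the suggestion outright.
    Only a hash of the suggestion is stored, not the content itself.
    """
    digest = hashlib.sha256(suggestion.encode("utf-8")).hexdigest()[:16]
    if final_text is None:
        action, similarity = "reject", None
    elif final_text == suggestion:
        action, similarity = "accept", 1.0
    else:
        action = "edit"
        similarity = difflib.SequenceMatcher(None, suggestion, final_text).ratio()
    return {
        "suggestion_hash": digest,
        "route_id": route_id,
        "action": action,
        "similarity_to_final": similarity,
        "latency_ms": latency_ms,
    }
```

Recording edits separately from accepts matters for the anti-pattern listed above: an accepted suggestion that is heavily rewritten, or that later fails a compile or test check, should not count toward correctness.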

2. Research Agent Pattern

Field	Specification
Problem Class	Open-ended multi-source discovery, analysis, and factual synthesis.
Best-Fit Use Cases	Market research, literature review, policy analysis, competitor research, legal or technical source aggregation.
No-Use Conditions	Exact database lookup, simple FAQ retrieval, high-speed customer answer, or unsupported evidence domains.
User Surface	Research console, plan editor, citation panel, source browser, draft workspace.
Architecture Shape	Research question → plan/query decomposition → bounded search → source authority filter → evidence clustering → synthesis → citation verifier → human audit.
Required Contracts	Prompt, retrieval, grounding, source authority, freshness, resource, eval, observability, user expectation.
Human Authority	Human sets objective, approves plan, reviews sources, and accepts final synthesis.
Model Route Strategy	Planning/synthesis route for reasoning; cheaper extraction/summarization routes for source processing.
Retrieval / Context Strategy	Multi-hop search with source authority, freshness, dedupe, and conflict handling.
Tool / Action Strategy	Read-only search/document tools; sandboxed browsing or parsers.
Core Evals	Citation fidelity, claim support, source quality, context recall, contradiction handling, synthesis usefulness.
Telemetry	Query plan, search calls, source IDs, citation verifier status, loop count, cost, user edits.
Audit / Evidence Boundary	Source manifest, claim/evidence map, verifier result, final draft version.
Security Boundary	Prevent leakage of confidential queries to external search where prohibited.
Primary Cost Drivers	Search loops, long-context synthesis, citation verification, human review time.
Failure Modes	Citation theater, source laundering, endless search, weak source authority, stale evidence.
Anti-Patterns	Searching until confident; citing documents the system did not verify.
Degraded Mode	Present source directory and extracted notes without synthesis.
Adoption and Support	Users need training on source audit and uncertainty handling.
Sourcing / Exit	Preserve outputs and source maps in open formats.
Maturity Target	Level 3–4.
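
The citation verifier step and the degraded mode in this card can be connected with a simple gate. The sketch below is deliberately narrow: it checks only that every claim cites at least one source present in the verified source manifest, which addresses the "citing documents the system did not verify" anti-pattern. It does not check that a source actually supports the claim. `verify_citations` and the claim dictionary layout are assumptions for the example.

```python
def verify_citations(claims: list, verified_sources: set) -> dict:
    """Split claims by whether all of their citations are in the manifest.

    Each claim is assumed to look like {"text": str, "source_ids": [str, ...]}.
    A claim with no citations is treated as unsupported.
    """
    supported, unsupported = [], []
    for claim in claims:
        cited = set(claim.get("source_ids", []))
        if cited and cited <= verified_sources:
            supported.append(claim["text"])
        else:
            unsupported.append(claim["text"])
    return {
        "supported": supported,
        "unsupported": unsupported,
        # If anything is unsupported, fall back to the degraded mode:
        # present the source directory and notes without synthesis.
        "release_synthesis": not unsupported,
    }
```

A stricter verifier would also compare each claim against the cited passage; that step is out of scope for this sketch.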

3. Support Assistant Pattern

Field	Specification
Problem Class	Customer or internal support automation and agent-assist.
Best-Fit Use Cases	Routine support answers, policy explanations, ticket summarization, suggested replies, triage.
No-Use Conditions	Emergency services, high-emotion disputes without human path, legal/medical advice, unresolved policy exceptions.
User Surface	Chat, support widget, CRM agent-assist panel, ticket console.
Architecture Shape	Message/ticket → intent classifier → account/context retrieval → policy/KB retrieval → answer/draft generation → escalation gate → resolution telemetry.
Required Contracts	Intent schema, retrieval, grounding, permission, escalation, user expectation, observability, eval.
Human Authority	Human handles exceptions, low-confidence cases, sensitive accounts, and transactional approvals.
Model Route Strategy	Fast classifier plus governed generation route; fallback to human queue.
Retrieval / Context Strategy	Customer/account data through permission filters plus versioned support knowledge.
Tool / Action Strategy	Read-only by default; write actions require deterministic API and approval where material.
Core Evals	True resolution, repeat-contact rate, escalation quality, policy compliance, CSAT/sentiment, hallucination rate.
Telemetry	Intent, source IDs, route ID, escalation reason, resolution status, repeat contact, agent edits.
Audit / Evidence Boundary	Ticket ID, source policy version, final sent message, escalation packet.
Security Boundary	Tenant isolation, PII redaction, support-role access control.
Primary Cost Drivers	Multi-turn context, support volume, human escalation, KB maintenance.
Failure Modes	Deflection theater, policy hallucination, customer frustration, hidden escalation suppression.
Anti-Patterns	Trapping users in bot loops; optimizing containment over resolution.
Degraded Mode	Route to human queue with context handoff and static help links.
Adoption and Support	Requires support-team training and escalation playbooks.
Sourcing / Exit	Preserve KB, intent taxonomy, ticket metadata, and correction logs.
Maturity Target	Level 4–5.
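
The escalation gate in this card can be expressed as a deterministic function that runs after generation and before anything is sent. The sketch below is an assumption-laden example: the blocked intent names, the confidence threshold of 0.8, and the three-turn limit are placeholders chosen for illustration, not values from this doctrine.

```python
# Hypothetical intent labels that must always reach a human.
BLOCKED_INTENTS = {"legal_advice", "medical_advice", "emergency"}


def escalation_gate(intent: str, confidence: float, sensitive_account: bool,
                    bot_turns: int, *, min_confidence: float = 0.8,
                    max_bot_turns: int = 3) -> dict:
    """Decide whether a conversation must be handed to the human queue."""
    reasons = []
    if intent in BLOCKED_INTENTS:
        reasons.append("blocked_intent")
    if confidence < min_confidence:
        reasons.append("low_confidence")
    if sensitive_account:
        reasons.append("sensitive_account")
    if bot_turns >= max_bot_turns:
        # Guards against trapping the user in a bot loop.
        reasons.append("turn_limit_reached")
    return {"escalate": bool(reasons), "reasons": reasons}
```

Logging the returned reasons as the escalation reason in telemetry makes suppressed or missing escalations visible during review.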

4. Document Intelligence Pipeline Pattern

Field	Specification
Problem Class	Structured extraction from unstructured or semi-structured documents.
Best-Fit Use Cases	Invoices, claims, leases, forms, contracts, financial statements, intake packets.
No-Use Conditions	Clean API/JSON input, documents an exact deterministic parser already handles, or transactions that need sub-second latency.
User Surface	Review queue with document preview, field table, coordinates, correction UI.
Architecture Shape	Document ingest → OCR/layout parser → document classifier → extraction model → schema/semantic validator → exception queue → staging write.
Required Contracts	Input asset, OCR/layout, schema, semantic, evidence coordinate, human review, write-back, eval.
Human Authority	Auditors correct low-confidence or high-impact fields before commit.
Model Route Strategy	Extraction route matched to document class and modality; deterministic validators at edge.
Retrieval / Context Strategy	Layout, OCR, metadata, document class, field schema; no general RAG unless needed.
Tool / Action Strategy	No direct production write without validator and review/threshold gate.
Core Evals	Field-level precision/recall, table extraction, coordinate grounding, schema validity, correction rate.
Telemetry	Document hash, class, field confidence, validation errors, reviewer corrections, processing time.
Audit / Evidence Boundary	Source document reference, extracted fields, coordinate/evidence refs, reviewer decision.
Security Boundary	Ephemeral parsing, malware scanning, PII controls, storage permissions.
Primary Cost Drivers	OCR/layout, visual tokens, storage, review labor.
Failure Modes	OCR errors, table misalignment, missing pages, wrong document class, reviewer fatigue.
Anti-Patterns	Direct ERP writes from unverified extraction; ignoring visual layout.
Degraded Mode	Manual indexing/review queue.
Adoption and Support	Requires reviewer training and document-class governance.
Sourcing / Exit	Preserve schemas and extraction records in portable formats.
Maturity Target	Level 4.
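
The validator and exception queue stages of this pipeline can be sketched for a single document class. The example below assumes a hypothetical invoice schema (`invoice_number`, `total`, `line_items`), one semantic rule (line items must sum to the total), and a confidence threshold of 0.9. All of these are illustrative choices, not part of the pattern definition.

```python
from decimal import Decimal, InvalidOperation

REQUIRED = ("invoice_number", "total", "line_items")  # hypothetical schema


def route_extraction(fields: dict, confidences: dict,
                     min_confidence: float = 0.9) -> dict:
    """Apply schema and semantic checks, then route to staging or review."""
    errors = [f"missing:{name}" for name in REQUIRED if name not in fields]
    if not errors:
        try:
            total = Decimal(str(fields["total"]))
            items = sum((Decimal(str(x)) for x in fields["line_items"]),
                        Decimal("0"))
            if items != total:
                errors.append("semantic:line_items_do_not_sum_to_total")
        except (InvalidOperation, TypeError):
            errors.append("schema:non_numeric_amount")
    low_confidence = sorted(name for name, score in confidences.items()
                            if score < min_confidence)
    needs_review = bool(errors or low_confidence)
    return {
        "destination": "exception_queue" if needs_review else "staging",
        "errors": errors,
        "low_confidence_fields": low_confidence,
    }
```

Nothing in this function writes to a production system. Even the passing path ends at staging, in line with the Tool / Action Strategy row.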

5. Workflow Automation Agent Pattern

Field	Specification
Problem Class	Multi-step workflow execution across systems with bounded side effects.
Best-Fit Use Cases	Employee onboarding, ticket synchronization, procurement coordination, IT operations runbooks.
No-Use Conditions	Irreversible actions without approval, ambiguous goals, APIs without idempotency, missing source-of-record verification.
User Surface	Task dashboard, execution plan, approval checkpoints, state timeline.
Architecture Shape	Goal → plan graph → policy/permission check → tool execution → source-of-record verification → approval gate → ledger.
Required Contracts	Plan schema, tool, permission, idempotency, resource, action verification, audit, observability.
Human Authority	Human approves plan and high-impact steps; can halt, edit, or roll back.
Model Route Strategy	Reasoning route for planning; deterministic execution harness for actions.
Retrieval / Context Strategy	API specs, workflow state, policy rules, system-of-record data.
Tool / Action Strategy	Write-capable only through typed tools with idempotency and postcondition checks.
Core Evals	Task completion, tool validity, permission denial correctness, postcondition success, loop budget.
Telemetry	Plan graph, tool call IDs, idempotency keys, verification results, approvals, breaches.
Audit / Evidence Boundary	Execution ledger, payload hashes, approval records, source-of-record confirmation.
Security Boundary	Least-privilege tools, sandboxing, no broad admin agent identity.
Primary Cost Drivers	Planning loops, tool retries, approval latency, integration maintenance.
Failure Modes	Partial transaction, loop, state divergence, unauthorized action.
Anti-Patterns	Agent with admin rights; missing idempotency; no postcondition checks.
Degraded Mode	Freeze state, serialize plan, route to manual operator.
Adoption and Support	Requires operators trained on execution graphs and exception handling.
Sourcing / Exit	Keep tool schemas and workflow state portable.
Maturity Target	Level 4.
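
Two rows in this card, idempotency and postcondition checks, are easiest to see in code. The sketch below assumes an in-memory dictionary as the execution ledger and derives the idempotency key from the step ID and a canonical JSON form of the payload. `execute_step`, the ledger layout, and the status names are hypothetical.

```python
import hashlib
import json


def idempotency_key(step_id: str, payload: dict) -> str:
    """Derive a stable key from the step and its canonicalized payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{step_id}:{canonical}".encode("utf-8")).hexdigest()


def execute_step(ledger: dict, step_id: str, payload: dict, tool, postcondition) -> dict:
    """Execute a write-capable tool at most once per (step, payload).

    tool(payload) performs the side effect; postcondition(payload, result)
    re-checks the source of record and returns True if the effect is visible.
    """
    key = idempotency_key(step_id, payload)
    if key in ledger:
        return ledger[key]  # replayed request: no second side effect
    result = tool(payload)
    verified = bool(postcondition(payload, result))
    entry = {
        "step_id": step_id,
        "idempotency_key": key,
        "result": result,
        "verified": verified,
        # An unverified step freezes the run for a manual operator.
        "status": "committed" if verified else "frozen_for_operator",
    }
    ledger[key] = entry
    return entry
```

A real harness would persist the ledger durably and pass the key to the downstream API where that API supports it. The in-memory version only illustrates the control flow.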

6. Coding Agent Pattern

Field	Specification
Problem Class	Codebase analysis, patch generation, testing, and pull-request preparation.
Best-Fit Use Cases	Dependency updates, scaffolding, test generation, routine bug fixes, migrations.
No-Use Conditions	Untestable systems, critical infrastructure without review, production auto-merge, missing sandbox.
User Surface	IDE, CLI, issue tracker, pull request, CI dashboard.
Architecture Shape	Issue/request → repo context selector → plan/diff generation → sandbox build/test → security scan → PR → human review.
Required Contracts	Repo access, context, patch format, sandbox, test execution, security scan, human review, deployment.
Human Authority	Developer reviews, edits, approves, and merges.
Model Route Strategy	Reasoning route for patch planning; cheaper route for log analysis.
Retrieval / Context Strategy	AST, dependency graph, related files, issue text, tests, build logs.
Tool / Action Strategy	File edits and commands only in sandbox; merge requires human/CI gate.
Core Evals	Compile success, test pass, security scan, diff minimality, reviewer acceptance, regression rate.
Telemetry	Diff hash, build logs, test results, scan findings, reviewer edits, merge status.
Audit / Evidence Boundary	PR record, commit hashes, test reports, reviewer approval.
Security Boundary	Network-restricted sandbox; secrets masked; no unreviewed production writes.
Primary Cost Drivers	Repo context, build/test loops, reasoning tokens.
Failure Modes	Plausible broken code, test overfitting, vulnerability injection, context poisoning.
Anti-Patterns	Auto-merge coding agent; writing tests to satisfy broken code.
Degraded Mode	Suggestion-only mode; disable write/PR creation.
Adoption and Support	Requires issue-writing discipline and reviewer workflow integration.
Sourcing / Exit	Store patches and scripts in standard Git workflows.
Maturity Target	Level 3–4.
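
The rule that merge requires a human/CI gate can be written as a small decision function. This is a sketch with assumed names (`pr_gate`, the decision labels) and an assumed policy that any scan finding blocks the PR. Note that no branch of the function merges anything. The most permissive outcome only marks the PR as mergeable by a developer.

```python
def pr_gate(build_ok: bool, tests_ok: bool, scan_findings: list,
            human_approved: bool) -> dict:
    """Combine sandbox CI results and human review into one decision."""
    blockers = []
    if not build_ok:
        blockers.append("build_failed")
    if not tests_ok:
        blockers.append("tests_failed")
    if scan_findings:
        blockers.append(f"security_findings:{len(scan_findings)}")
    if blockers:
        return {"decision": "blocked", "blockers": blockers}
    if not human_approved:
        # Automated checks are necessary but never sufficient.
        return {"decision": "awaiting_human_review", "blockers": []}
    return {"decision": "mergeable_by_developer", "blockers": []}
```

In degraded mode the agent would not reach this gate at all, since PR creation is disabled and only suggestions are shown.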

7. Analytics Assistant Pattern

Field	Specification
Problem Class	Governed natural-language analytics, metric explanation, SQL/query generation, and dashboard assistance.
Best-Fit Use Cases	Ad-hoc business questions, operational reporting, metric exploration, dashboard drafting.
No-Use Conditions	Regulated filings without deterministic controls, missing semantic metric layer, unrestricted database access.
User Surface	BI assistant, metric builder, SQL preview, chart explanation panel.
Architecture Shape	User question → metric/schema selector → permission check → query generator → deterministic validator → read-only execution → visualization/explanation.
Required Contracts	Semantic metric, schema, permission, SQL/query, resource, provenance, eval, observability.
Human Authority	Analyst validates metric meaning, query, and chart interpretation.
Model Route Strategy	Reasoning route for query generation; deterministic validator and metric layer are authoritative.
Retrieval / Context Strategy	Metric definitions, schema metadata, join rules, row-level policy.
Tool / Action Strategy	Read-only queries with timeouts, row limits, and query validation.
Core Evals	SQL validity, metric correctness, row-level security, chart-data consistency, explanation fidelity.
Telemetry	Question, generated query hash, metric IDs, validation result, query cost, user corrections.
Audit / Evidence Boundary	Query, metric definition, result reference, chart spec, user approval where needed.
Security Boundary	Read-only credentials, row-level security, query sandbox/resource limits.
Primary Cost Drivers	Warehouse compute, complex joins, metadata retrieval, explanation generation.
Failure Modes	Metric hallucination, wrong joins, unauthorized data exposure, misleading charts.
Anti-Patterns	Querying raw tables without metric layer; model-generated audit numbers.
Degraded Mode	Disable ad-hoc query; show verified dashboards and metric glossary.
Adoption and Support	Requires data literacy and metric-governance alignment.
Sourcing / Exit	Use portable semantic-layer definitions and SQL artifacts.
Maturity Target	Level 4.
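
The deterministic validator that sits between the query generator and read-only execution might look like the sketch below. It is intentionally conservative and regex-based, which is an assumption made to keep the example short; a production validator would more likely parse the SQL properly. The allowed table name `orders_metrics` and the helper `validate_query` are hypothetical.

```python
import re

ALLOWED_TABLES = {"orders_metrics"}  # hypothetical governed metric view
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|attach|pragma)\b", re.I)
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)", re.I)


def validate_query(sql: str, max_rows: int = 1000) -> str:
    """Reject anything but a single SELECT over allowed tables.

    Returns the query wrapped in an outer LIMIT so the row cap holds
    regardless of what the generated statement contains.
    """
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"select\b", statement, re.I):
        raise ValueError("only SELECT statements are allowed")
    if FORBIDDEN.search(statement):
        raise ValueError("statement contains a forbidden keyword")
    tables = {name.lower() for name in TABLE_REF.findall(statement)}
    if not tables or not tables <= ALLOWED_TABLES:
        raise ValueError("query references tables outside the metric layer")
    return f"SELECT * FROM ({statement}) LIMIT {int(max_rows)}"
```

The validator complements, and does not replace, read-only credentials and row-level security: it narrows what reaches the warehouse, while the database enforces what the credentials may touch.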

8. Multimodal Review System Pattern

Field	Specification
Problem Class	Review and audit of images, audio, video, scanned documents, or spatial/temporal evidence.
Best-Fit Use Cases	Site inspections, media policy review, brand compliance, real-estate review, document-image validation.
No-Use Conditions	Clinical/safety-critical diagnosis without regulated validation, poor media quality, no expert review path.
User Surface	Timeline, media canvas, bounding boxes, transcript, annotation review panel.
Architecture Shape	Media ingest → preprocessing/sampling → modality route → multimodal analysis → coordinate/timecode mapping → expert review → record.
Required Contracts	Asset schema, modality, sampling, coordinate/timecode, detection label, human review, audit, eval.
Human Authority	Domain expert confirms or corrects detections and final judgment.
Model Route Strategy	Multimodal route selected by asset type and evidence requirements.
Retrieval / Context Strategy	Metadata, checklists, prior reference images, transcripts where relevant.
Tool / Action Strategy	Media parsers/samplers in sandbox; no autonomous final decision in high-risk cases.
Core Evals	Detection precision/recall, coordinate grounding, transcription accuracy, reviewer agreement, false-negative rate.
Telemetry	Media hash, sampling settings, detection labels, coordinates/timecodes, reviewer adjustments.
Audit / Evidence Boundary	Media reference, annotation IDs, reviewer decision, coordinate/timecode evidence.
Security Boundary	Media sandbox, codec exploit protection, PII redaction/blurring where required.
Primary Cost Drivers	Video/audio processing, visual tokens, storage, expert review.
Failure Modes	Missed anomalies, timecode drift, bounding errors, transcript hallucination, reviewer fatigue.
Anti-Patterns	Generic visual descriptions with no coordinate evidence.
Degraded Mode	Disable overlays; present raw media and manual checklist.
Adoption and Support	Requires expert review training and annotation standards.
Sourcing / Exit	Store annotations in open schema with media references.
Maturity Target	Level 3–4.
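
The coordinate/timecode mapping step in the architecture shape can be illustrated as follows. This is a sketch under stated assumptions: detections arrive as a sampled frame index plus a normalized bounding box, and the function and class names are hypothetical.

```python
from dataclasses import dataclass


def frame_to_timecode(frame_index: int, fps: float) -> str:
    """Convert a sampled frame index to an HH:MM:SS.mmm timecode."""
    total_ms = round(frame_index * 1000 / fps)
    seconds, ms = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{ms:03d}"


@dataclass(frozen=True)
class PixelBox:
    left: int
    top: int
    right: int
    bottom: int


def to_pixel_box(norm_box, width: int, height: int) -> PixelBox:
    """Map a normalized (x0, y0, x1, y1) box in [0, 1] to pixel coordinates."""
    x0, y0, x1, y1 = norm_box
    if not (0 <= x0 < x1 <= 1 and 0 <= y0 < y1 <= 1):
        # Reject boxes that cannot be grounded in the frame instead of guessing.
        raise ValueError("box is outside the frame or degenerate")
    return PixelBox(
        round(x0 * width), round(y0 * height), round(x1 * width), round(y1 * height)
    )
```

Storing the timecode and pixel box next to the media hash gives the expert reviewer evidence that can be checked against the asset, which is what the anti-pattern "generic visual descriptions with no coordinate evidence" lacks.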

9. Enterprise Knowledge System Pattern

Field	Specification
Problem Class	Governed enterprise knowledge retrieval and synthesis across permissioned repositories.
Best-Fit Use Cases	Policy search, internal documentation, compliance research, support grounding, product knowledge.
No-Use Conditions	Exact transactional lookup, unmanaged corpora, missing ACLs, no content lifecycle.
User Surface	Search portal, chat/search UI, document preview, citation panel.
Architecture Shape	Repository sync → ACL processing → chunk/index → hybrid search/rerank → grounding verifier → answer synthesis → feedback loop.
Required Contracts	Ingestion, permission, retrieval, freshness, grounding, citation, lifecycle, eval, observability.
Human Authority	Content owners manage source quality; users verify cited answers.
Model Route Strategy	Retrieval-first; synthesis route only after evidence is permissioned and sufficient.
Retrieval / Context Strategy	Hybrid retrieval with ACL/RLS, source authority, freshness, dedupe, conflict handling.
Tool / Action Strategy	Read-only retrieval; no document mutation unless separately governed.
Core Evals	Context precision/recall, answer faithfulness, citation support, permission safety, freshness.
Telemetry	Query, user/role scope reference, source IDs, retrieval rank, citation verification, feedback.
Audit / Evidence Boundary	Source refs, answer version, citation verifier status, access-decision record.
Security Boundary	Chunk-level permissions, tenant isolation, restricted corpus ingestion.
Primary Cost Drivers	Ingestion, embeddings/indexes, reranking, storage, source lifecycle.
Failure Modes	Permission leakage, stale summaries, duplicate/conflicting docs, ungoverned bulk indexing (vector dump).
Anti-Patterns	Vector Dump Knowledge System; RAG-as-database.
Degraded Mode	Keyword/file search with document links; no synthesis.
Adoption and Support	Requires knowledge management and content-owner workflows.
Sourcing / Exit	Preserve source documents, metadata, and index rebuild path.
Maturity Target	Level 4.
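
The retrieval-first route in this card (synthesize only after evidence is permissioned and sufficient) can be sketched as below. The `Chunk` type, the two thresholds, and the `synthesize` callback are illustrative assumptions, not part of any specific retrieval library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source_id: str
    text: str
    allowed_groups: frozenset  # groups permitted to read this chunk
    score: float               # retrieval/rerank relevance score


MIN_EVIDENCE = 2  # hypothetical sufficiency threshold
MIN_SCORE = 0.5   # hypothetical relevance cutoff


def permitted(chunks, user_groups):
    """Drop every chunk the user may not read before anything reaches the model."""
    return [c for c in chunks if c.allowed_groups & user_groups]


def answer(chunks, user_groups, synthesize):
    """Synthesize only from permitted, sufficient evidence; otherwise degrade."""
    evidence = [c for c in permitted(chunks, user_groups) if c.score >= MIN_SCORE]
    sources = [c.source_id for c in evidence]
    if len(evidence) < MIN_EVIDENCE:
        # Degraded mode from the card: document links only, no synthesis.
        return {"mode": "links_only", "sources": sources}
    return {"mode": "synthesized", "text": synthesize(evidence), "sources": sources}
```

The permission filter runs before the sufficiency check on purpose: a chunk the user cannot read must never count as evidence, even when it is the most relevant result.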

10. Decision-Support Cockpit Pattern

Field	Specification
Problem Class	High-stakes evidence review, scenario analysis, and human decision support.
Best-Fit Use Cases	Underwriting, legal strategy, healthcare support, risk review, supply-chain contingency planning.
No-Use Conditions	Autonomous final decision, high-volume low-review tasks, no expert validation path.
User Surface	Evidence cockpit, scenario matrix, risk panel, rationale capture.
Architecture Shape	Case file → evidence gathering → policy/guideline retrieval → scenario generation → risk/uncertainty analysis → human decision record.
Required Contracts	Case assembly, retrieval, grounding, risk, human review, rationale, audit, user expectation.
Human Authority	Human is final decision-maker; AI frames options and evidence only.
Model Route Strategy	High-reasoning route for scenario framing; deterministic calculators for numbers.
Retrieval / Context Strategy	Case record, policies, historical precedents, external evidence where approved.
Tool / Action Strategy	Simulations/read-only analysis; external writes require human approval and deterministic execution.
Core Evals	Evidence completeness, option relevance, risk coverage, calibration, rationale quality, bias review.
Telemetry	Case ID, evidence refs, scenarios generated, user choice, rationale, override/correction.
Audit / Evidence Boundary	Decision record, evidence refs, human rationale, versioned policy sources.
Security Boundary	Regulated workspace, role access, strict retention and evidence policy.
Primary Cost Drivers	Expert review, long-case context, reasoning, audit requirements.
Failure Modes	Automation bias, biased framing, missing risk, information overload.
Anti-Patterns	Human as rubber-stamp for automated decision.
Degraded Mode	Static checklist and raw evidence package.
Adoption and Support	Requires training on automation bias and evidence inspection.
Sourcing / Exit	Preserve case/rationale records and scenario schemas.
Maturity Target	Level 4.

11. Background Classifier / Router Pattern

Field	Specification
Problem Class	High-throughput classification, triage, and routing.
Best-Fit Use Cases	Ticket routing, alert triage, anomaly tagging, document category routing, moderation queues.
No-Use Conditions	High-liability decisions without review, routing that deterministic metadata rules already handle, low-volume tasks.
User Surface	Mostly headless; exception and audit dashboard.
Architecture Shape	Event ingest → feature extraction → classifier → confidence/threshold gate → deterministic router → exception queue.
Required Contracts	Event schema, class taxonomy, confidence threshold, exception queue, route/action schema, eval, observability.
Human Authority	Queue managers review exceptions and taxonomy drift.
Model Route Strategy	Fast classifier route; fallback to deterministic rules or manual queue.
Retrieval / Context Strategy	Taxonomy, route definitions, metadata, short event body.
Tool / Action Strategy	Queue writes only; high-impact outcomes require human review.
Core Evals	Precision/recall, confusion matrix, false-negative cost, drift, exception rate.
Telemetry	Event hash, class, confidence bucket, route target, exception reason, correction.
Audit / Evidence Boundary	Event reference, class label, route decision, taxonomy version.
Security Boundary	Input sanitization, queue permission, no arbitrary payload execution.
Primary Cost Drivers	Event volume, classification calls, exception labor.
Failure Modes	Silent misrouting, class drift, queue overload, payload manipulation.
Anti-Patterns	No exception queue; automating high-liability routing.
Degraded Mode	Route all events to manual triage or deterministic rules.
Adoption and Support	Requires taxonomy ownership and queue SLA review.
Sourcing / Exit	Preserve class taxonomy and labeled examples.
Maturity Target	Level 4–5.
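
The confidence/threshold gate, deterministic router, and exception queue in this card can be sketched as a single function. The route table, queue names, and the 0.8 threshold are assumptions chosen for illustration; real thresholds would come from the eval contract.

```python
# Hypothetical taxonomy-to-queue mapping and confidence threshold.
ROUTES = {"billing": "billing_queue", "outage": "oncall_queue"}
THRESHOLD = 0.8
EXCEPTION_QUEUE = "exception_queue"


def route_event(label: str, confidence: float):
    """Return (target_queue, reason) for a classified event."""
    if label not in ROUTES:
        # Labels outside the taxonomy are never routed automatically.
        return (EXCEPTION_QUEUE, "unknown_class")
    if confidence < THRESHOLD:
        return (EXCEPTION_QUEUE, "low_confidence")
    return (ROUTES[label], "auto")
```

The model only proposes a label and a confidence; the routing decision itself is a plain lookup, so the exception reason recorded in telemetry is always one of a small, auditable set.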

12. Personal or Team Productivity Assistant Pattern

Field	Specification
Problem Class	Local or team-scoped drafting, summarization, note search, and lightweight assistance.
Best-Fit Use Cases	Meeting notes, email drafts, personal knowledge search, team document drafting.
No-Use Conditions	System-of-record writes, regulated workflows, customer-facing automation, enterprise-wide authority.
User Surface	Sidebar, desktop widget, team chat assistant, document pane.
Architecture Shape	User prompt → local/team context assembly → safety/policy filter → optional memory/retrieval → model route → user review/copy.
Required Contracts	Prompt, context, memory where active, resource, user expectation, observability.
Human Authority	User owns final copy/send/paste/action.
Model Route Strategy	Low/medium capability route; local/private route for sensitive contexts where needed.
Retrieval / Context Strategy	Local notes, team docs, current workspace, permissioned memory.
Tool / Action Strategy	No direct external execution unless separately governed.
Core Evals	User utility, memory relevance, formatting, safety policy, latency.
Telemetry	Minimal usage/correction signals; avoid surveillance-style individual monitoring.
Audit / Evidence Boundary	Usually none beyond operational telemetry unless enterprise data/risk requires.
Security Boundary	Protect notes, credentials, local files, and team permissions.
Primary Cost Drivers	Frequent low-value calls, local indexes, memory storage.
Failure Modes	Memory drift, hallucinated summaries, context leakage, overcollection.
Anti-Patterns	Mixing team databases without permission boundaries.
Degraded Mode	Standard editor/search without AI.
Adoption and Support	Basic training on data boundaries and review.
Sourcing / Exit	Export notes/memory/context indexes where appropriate.
Maturity Target	Level 2–3, higher if enterprise-governed.

13. Governed Agentic Workflow Pattern

Field	Specification
Problem Class	Complex, stateful, multi-step agentic workflow with governance and checkpoints.
Best-Fit Use Cases	Compliance auditing, multi-step underwriting support, controlled security review, complex operational routing.
No-Use Conditions	Linear deterministic workflow, subsecond latency requirements, missing rollback, missing schema/state model.
User Surface	Process portal, graph visualization, checkpoint ledger, approval queue.
Architecture Shape	Goal → graph/state machine → planner/executor/auditor nodes → checkpoint ledger → deterministic gate → human escalation.
Required Contracts	Graph, prompt, context, retrieval, tool, permission, resource, memory, checkpoint, eval, observability, audit.
Human Authority	Process owner approves plan, handles escalations, can halt/rollback.
Model Route Strategy	Reasoning route for planning/auditing; deterministic state machine controls execution.
Retrieval / Context Strategy	Node-specific context, policies, APIs, state, prior checkpoints.
Tool / Action Strategy	Tool calls gated by node contract, idempotency, permission, and postcondition checks.
Core Evals	Node transition correctness, loop/budget compliance, task completion, rollback, checkpoint replay.
Telemetry	Graph path, node states, tool calls, approvals, resource use, breaches.
Audit / Evidence Boundary	Checkpoints, payload hashes, approval records, state transitions, final confirmation.
Security Boundary	Isolated node execution, scoped credentials, no cross-agent privilege leakage.
Primary Cost Drivers	Multi-agent loops, long contexts, checkpoints, human escalation.
Failure Modes	Consensus deadlock, runaway graph, state divergence, tool abuse.
Anti-Patterns	Multi-agent framework for a simple linear workflow.
Degraded Mode	Pause graph, serialize state, route to process supervisor.
Adoption and Support	Requires operator training on graph debugging and escalation.
Sourcing / Exit	Keep graph definitions, state, tools, and checkpoints portable.
Maturity Target	Level 4.
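
The phrase "deterministic state machine controls execution" can be made concrete with a small sketch. The state names, the transition table, and the `Workflow` and `GraphHalted` names are hypothetical; the point is that the model may propose a next node, while the graph, the step budget, and the checkpoint ledger are enforced in ordinary code.

```python
import hashlib
import json

# Hypothetical graph: which node may follow which.
TRANSITIONS = {
    "plan": {"execute"},
    "execute": {"audit"},
    "audit": {"execute", "done"},
}


class GraphHalted(Exception):
    """Raised when the graph must pause and escalate to a human."""


class Workflow:
    def __init__(self, max_steps: int):
        self.state = "plan"
        self.steps = 0
        self.max_steps = max_steps
        self.ledger = []  # checkpoint ledger: one entry per accepted transition

    def advance(self, next_state: str, payload: dict) -> None:
        if next_state not in TRANSITIONS.get(self.state, set()):
            raise GraphHalted(f"illegal transition {self.state} -> {next_state}")
        if self.steps >= self.max_steps:
            raise GraphHalted("step budget exhausted")
        # Hash a canonical JSON form so the checkpoint can be verified later.
        digest = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode("utf-8")
        ).hexdigest()
        self.ledger.append(
            {"from": self.state, "to": next_state, "payload_sha256": digest}
        )
        self.state = next_state
        self.steps += 1
```

When `GraphHalted` is raised the state and ledger are left untouched, which matches the degraded mode in the card: pause the graph, keep the serialized state, and route to the process supervisor.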

14. AI Gateway / Control Plane Pattern

Field	Specification
Problem Class	Central model access, routing, policy, observability, quota, budget, and provider abstraction.
Best-Fit Use Cases	Enterprise model access, multi-provider routing, cost control, credentials management, policy enforcement.
No-Use Conditions	Local throwaway prototype with no enterprise data, no shared users, and no production route.
User Surface	Developer API, platform dashboard, admin console.
Architecture Shape	App request → identity/policy/quota → cache where safe → route selection → provider/self-hosted model → validation/telemetry.
Required Contracts	Route, permission, resource, vendor, observability, deployment, eval, sourcing.
Human Authority	Platform team controls routes, keys, budgets, policy, and emergency shutdown.
Model Route Strategy	Provider-neutral route aliases tied to task/risk/eval contracts.
Retrieval / Context Strategy	Optional cache/retrieval only with tenant, freshness, and permission scope.
Tool / Action Strategy	Usually no direct business action; may broker tool calls through separate contracts.
Core Evals	Gateway latency, route correctness, policy block correctness, cost control, provider failover.
Telemetry	Route ID, contract versions, tokens, cost, latency, policy decisions, errors, breaches.
Audit / Evidence Boundary	Route manifest, vendor contract refs, policy decisions, incident events.
Security Boundary	Key vault, egress controls, tenant isolation, DLP/policy before provider call.
Primary Cost Drivers	High-volume proxying, logs/traces, caching, provider calls.
Failure Modes	Single point of failure, gateway bypass, cache leakage, policy misroute.
Anti-Patterns	Direct provider SDK sprawl; gateway bypass.
Degraded Mode	Fail closed or use approved fallback routes preserving contracts.
Adoption and Support	Requires developer onboarding and platform support.
Sourcing / Exit	Central mechanism for provider exit and route migration.
Maturity Target	Level 5.
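
Provider-neutral route aliases and the fail-closed degraded mode can be sketched as a lookup over an ordered route table. The alias, provider names, model names, and data classes below are invented for illustration and do not refer to real vendors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    provider: str
    model: str
    approved_for: frozenset  # data classes this route may receive


# Hypothetical route manifest: list order is the fallback order.
ROUTE_TABLE = {
    "summarize-low-risk": [
        Route("provider_a", "model_small", frozenset({"public", "internal"})),
        Route("self_hosted", "model_local",
              frozenset({"public", "internal", "restricted"})),
    ],
}


class RouteUnavailable(Exception):
    """No approved, healthy route exists; the gateway fails closed."""


def select_route(alias: str, data_class: str, healthy_providers: set) -> Route:
    """Pick the first route that is both approved for the data and healthy."""
    for route in ROUTE_TABLE.get(alias, []):
        if data_class in route.approved_for and route.provider in healthy_providers:
            return route
    raise RouteUnavailable(f"no approved route for {alias!r} / {data_class!r}")
```

Applications call the alias, never the provider, so a provider exit becomes an edit to the route manifest. A fallback is used only if it satisfies the same data-class approval as the primary route; otherwise the request is refused.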

15. Human Review and Escalation Queue Pattern

Field	Specification
Problem Class	Exception handling, human validation, approval, and correction capture.
Best-Fit Use Cases	Low-confidence extraction, high-risk actions, support escalations, moderation, regulated review.
No-Use Conditions	Very low-risk high-volume tasks with reliable deterministic validation, or no reviewer capacity.
User Surface	Review queue, evidence panel, correction editor, approval/deny controls.
Architecture Shape	AI proposal → escalation gate → priority queue → review canvas → human decision → deterministic validator → downstream commit or rejection.
Required Contracts	Escalation, human review, evidence, schema, permission, audit, observability.
Human Authority	Reviewer can approve, reject, correct, escalate, or block.
Model Route Strategy	Use AI to pre-process and prioritize; do not rely on high-cost retries to avoid review.
Retrieval / Context Strategy	Evidence packet, source refs, policy, prior corrections.
Tool / Action Strategy	Human-approved payload goes through deterministic validator and action contract.
Core Evals	Reviewer agreement, false negatives, correction categories, queue latency, fatigue indicators.
Telemetry	Queue age, reviewer action, correction delta, approval/rejection, downstream result.
Audit / Evidence Boundary	Proposal, evidence refs, reviewer ID/role, payload hash, decision reason.
Security Boundary	Reviewer RBAC, sensitive data minimization, queue access logging.
Primary Cost Drivers	Skilled review labor, queue tooling, evidence prep.
Failure Modes	Rubber-stamping, backlog, reviewer fatigue, feedback loss.
Anti-Patterns	Fake human-in-the-loop with no evidence or veto.
Degraded Mode	Pause automated writes; hold items in staging queue.
Adoption and Support	Requires reviewer training and SLA ownership.
Sourcing / Exit	Keep review records exportable.
Maturity Target	Level 4.
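
The path from human decision to deterministic validator to downstream commit can be sketched with a payload hash that binds the reviewer's decision to the exact payload that is written. The record layout, decision names, and required-field check are assumptions for illustration.

```python
import hashlib
import json


def payload_hash(payload: dict) -> str:
    """Hash a canonical JSON form so equal payloads always hash equally."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def review(proposal: dict, decision: str, reviewer_id: str, corrected=None) -> dict:
    """Record a reviewer decision: 'approve', 'correct', or 'reject'."""
    if decision == "correct" and corrected is None:
        raise ValueError("a correction requires the corrected payload")
    final = corrected if decision == "correct" else proposal
    return {
        "decision": decision,
        "reviewer_id": reviewer_id,
        "payload_hash": payload_hash(final),
    }


def can_commit(record: dict, payload: dict, required_fields) -> bool:
    """Deterministic validator that runs after review and before any write."""
    if record["decision"] not in {"approve", "correct"}:
        return False
    if payload_hash(payload) != record["payload_hash"]:
        return False  # payload changed after the reviewer saw it
    return all(field in payload for field in required_fields)
```

Because the hash covers the final payload, a downstream component cannot swap in a different payload after approval without the commit check failing.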

16. Evaluation and Shadow-Mode Pattern

Field	Specification
Problem Class	Candidate model, prompt, route, retrieval, or tool validation before production exposure.
Best-Fit Use Cases	Model upgrades, prompt migrations, retrieval changes, route comparison, regression detection.
No-Use Conditions	Production data cannot be mirrored safely, no validation metrics exist, or candidate path may create side effects.
User Surface	Release dashboard, eval report, regression matrix.
Architecture Shape	Production sample or mirrored request → candidate route in isolated mode → evaluator → regression detector → release decision.
Required Contracts	Eval, route, deployment, observability, data handling, evidence, security.
Human Authority	Release owner approves rollout based on evidence.
Model Route Strategy	Candidate route isolated from production side effects.
Retrieval / Context Strategy	Mirrors retrieval where allowed; otherwise uses representative offline dataset.
Tool / Action Strategy	Read-only or mocked writes; no candidate side effects.
Core Evals	Quality delta, regression rate, safety, latency, cost, grounding, tool-call validity.
Telemetry	Candidate output, eval score, cost/latency delta, data class, route IDs.
Audit / Evidence Boundary	Eval report, dataset/sample version, manifest, approval decision.
Security Boundary	Shadow data must match production policy; candidate path isolated.
Primary Cost Drivers	Duplicate calls, evaluator cost, trace storage, review time.
Failure Modes	Evaluator bias, shadow leakage, candidate side effects, false confidence.
Anti-Patterns	Production upgrade without shadow/eval evidence.
Degraded Mode	Disable candidate path; production path unaffected.
Adoption and Support	Requires release discipline and eval ownership.
Sourcing / Exit	Enables provider/model migration decisions.
Maturity Target	Level 4–5.
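
The shadow-mode flow (mirrored request, isolated candidate, evaluator, regression detector, release decision) can be sketched as a comparison loop. The `production`, `candidate`, and `score` callables and the 5% regression threshold are assumptions; a real evaluator would be far richer than a single score.

```python
def shadow_compare(samples, production, candidate, score):
    """Run both routes on each sample and count candidate regressions."""
    regressions = 0
    for sample in samples:
        prod_score = score(sample, production(sample))
        try:
            cand_score = score(sample, candidate(sample))
        except Exception:
            # A candidate crash is recorded as a regression and goes no further;
            # the production output above is unaffected.
            cand_score = float("-inf")
        if cand_score < prod_score:
            regressions += 1
    total = len(samples)
    rate = regressions / total if total else 0.0
    return {"n": total, "regressions": regressions, "regression_rate": rate}


def release_decision(report, max_regression_rate=0.05):
    """Recommend promotion only with evidence and a low regression rate."""
    if report["n"] == 0:
        return "hold"  # no evidence is not a pass
    return "promote" if report["regression_rate"] <= max_regression_rate else "hold"
```

The function only recommends; under the card's human-authority row the release owner still approves the rollout. The candidate must also be invoked with read-only or mocked tools so that its calls have no side effects.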

Systemic Integration and Control Surfaces

Patterns become production architectures only when their controls are explicit. The matrices below define required contract packs, evaluation classes, telemetry/evidence boundaries, human authority, security boundaries, and cost drivers.

1. Pattern Control Pack Matrix

Pattern	Mandatory Control Pack
Copilot / Embedded Assistant	Prompt, context, route, resource, user expectation, observability.
Research Agent	Prompt, retrieval, grounding, source authority, freshness, resource, eval, observability.
Support Assistant	Intent, retrieval, grounding, escalation, permission, user expectation, support telemetry.
Document Intelligence	Asset, OCR/layout, schema, evidence coordinate, validation, review queue, audit.
Workflow Automation Agent	Plan, tool, permission, idempotency, resource, action verification, ledger.
Coding Agent	Repo access, sandbox, patch, test, security scan, human review, CI gate.
Analytics Assistant	Semantic metric, schema, permission, query validation, read-only execution, provenance.
Multimodal Review	Asset, modality, coordinate/timecode, detection label, expert review, audit.
Enterprise Knowledge System	Ingestion, ACL, retrieval, freshness, grounding, citation, lifecycle, eval.
Decision-Support Cockpit	Case assembly, evidence, scenario, risk, human decision, rationale, audit.
Background Classifier / Router	Event schema, taxonomy, confidence threshold, exception queue, route validation.
Productivity Assistant	Prompt, local/team context, memory if active, resource, user expectation.
Governed Agentic Workflow	Graph, tool, permission, memory, resource, checkpoint, action verification, audit.
AI Gateway / Control Plane	Route, vendor, permission, resource, deployment, observability, eval, policy.
Human Review Queue	Escalation, evidence, reviewer role, override, audit, downstream validation.
Evaluation / Shadow Mode	Eval, route, sample/data policy, candidate isolation, deployment, evidence.

2. Eval-by-Pattern Matrix

Pattern	Core Evaluation Classes
Copilot	Suggestion usefulness, edit distance, latency, acceptance quality, downstream correctness.
Research Agent	Citation fidelity, claim support, source authority, contradiction handling, synthesis usefulness.
Support Assistant	True resolution, repeat contact, policy compliance, escalation quality, CSAT/sentiment.
Document Intelligence	Field precision/recall, schema validity, table extraction, coordinate grounding, correction rate.
Workflow Automation Agent	Plan validity, tool-call validity, postcondition success, rollback/compensation, loop budget.
Coding Agent	Compile/test pass, static scan, diff minimality, reviewer approval, regression rate.
Analytics Assistant	SQL/query validity, metric correctness, row-level security, chart consistency, explanation fidelity.
Multimodal Review	Detection precision/recall, coordinate/timecode grounding, transcript accuracy, reviewer agreement.
Enterprise Knowledge System	Context precision/recall, faithfulness, citation support, permission safety, freshness.
Decision-Support Cockpit	Evidence completeness, risk coverage, option relevance, calibration, rationale quality.
Background Classifier / Router	Precision/recall, confusion matrix, false-negative cost, drift, exception rate.
Productivity Assistant	Utility, memory relevance, safety policy, latency, user correction.
Governed Agentic Workflow	Node transition correctness, graph safety, checkpoint replay, tool verification, budget compliance.
AI Gateway	Route correctness, policy enforcement, latency overhead, cost control, failover safety.
Human Review Queue	Reviewer agreement, queue latency, fatigue, false negatives, downstream correction.
Evaluation / Shadow Mode	Candidate delta, regression, cost/latency delta, evaluator reliability, release decision quality.

3. Telemetry and Evidence Boundary Matrix

Pattern	Operational Telemetry	Evidence / Audit Boundary
Copilot	Accept/reject/edit, latency, route, cost bucket.	Usually none beyond hashes/versions unless regulated.
Research Agent	Plan, source IDs, verifier status, loop count, cost.	Source manifest, claim-evidence map, final report version.
Support Assistant	Intent, source IDs, escalation reason, resolution, repeat contact.	Ticket record, final message, policy version, handoff packet.
Document Intelligence	Field confidence, validation errors, correction, runtime.	Document ref, extracted fields, coordinate refs, reviewer decision.
Workflow Automation Agent	Plan graph, tool calls, postconditions, approvals, breaches.	Execution ledger, payload hashes, approvals, state confirmation.
Coding Agent	Diff, build/test, scan, reviewer edits.	PR, commit hash, test report, reviewer approval.
Analytics Assistant	Query hash, metric IDs, validation, cost, corrections.	Query, metric definition, result ref, chart spec.
Multimodal Review	Media hash, sampling, detections, reviewer edits.	Media ref, annotation IDs, coordinates/timecodes, final decision.
Enterprise Knowledge System	Query, retrieval rank, source IDs, citation verification.	Answer version, source refs, access decision, verifier status.
Decision-Support Cockpit	Evidence refs, options, choice, rationale, overrides.	Decision record, evidence packet, rationale, policy version.
Background Classifier	Event hash, class, confidence, route, exception.	Event ref, taxonomy version, routing decision.
Productivity Assistant	Minimal utility/correction/latency signals.	Usually none unless enterprise risk requires.
Governed Agentic Workflow	Graph state, node events, tool calls, resource use, breaches.	Checkpoints, payload hashes, approvals, final confirmation.
AI Gateway	Route, tokens, cost, latency, policy decision, errors.	Route manifest, policy version, vendor/SLA event, incident record.
Human Review Queue	Queue age, reviewer action, correction, downstream result.	Proposal, evidence refs, reviewer decision, payload hash.
Evaluation / Shadow Mode	Candidate output, eval score, cost/latency delta.	Eval report, sample version, manifest, release approval.

Telemetry supports operation. Evidence supports proof. Do not turn raw telemetry into an uncontrolled audit landfill.

4. Human Authority Matrix

Pattern	Human Authority Level
Copilot	Human accepts, edits, or rejects suggestions.
Research Agent	Human directs research, audits sources, approves synthesis.
Support Assistant	Human handles escalations and material account actions.
Document Intelligence	Human reviews low-confidence/high-impact fields.
Workflow Automation Agent	Human approves plans and high-impact steps.
Coding Agent	Human reviews PR and controls merge.
Analytics Assistant	Human validates metric interpretation and query output.
Multimodal Review	Expert confirms detections and final assessment.
Enterprise Knowledge System	Human verifies cited answers and content owners maintain sources.
Decision-Support Cockpit	Human is final decision-maker.
Background Classifier	Human audits exceptions and taxonomy drift.
Productivity Assistant	User owns final use of generated text.
Governed Agentic Workflow	Human can halt, approve, rollback, or escalate graph execution.
AI Gateway	Platform owner controls route, policy, budget, and shutdown.
Human Review Queue	Reviewer has veto/correction authority.
Evaluation / Shadow Mode	Release owner approves production promotion.

5. Security Boundary Matrix

Pattern	Boundary Requirement
Copilot	Workspace context scoping, secret masking, local/cloud data policy.
Research Agent	Read-only tools, external-query controls, sandboxed browsing/parsing.
Support Assistant	Tenant isolation, PII redaction, CRM permission scope, escalation path.
Document Intelligence	File sandboxing, malware/PDF safety, PII controls, staging writes.
Workflow Automation Agent	Least-privilege tools, idempotency, postcondition checks, no admin identity.
Coding Agent	Network-restricted sandbox, secret masking, CI gate, human merge.
Analytics Assistant	Read-only credentials, row-level security, query limits, semantic layer.
Multimodal Review	Media sandbox, codec protection, PII blur/redaction where required.
Enterprise Knowledge System	Chunk-level ACL/RLS, tenant isolation, source lifecycle governance.
Decision-Support Cockpit	Regulated workspace, strict access, evidence retention policy.
Background Classifier	Input sanitization, queue permission, exception path.
Productivity Assistant	Local file and memory controls, no silent enterprise data upload.
Governed Agentic Workflow	Isolated node execution, scoped credentials, checkpointed state.
AI Gateway	Key vault, route policy, DLP/egress, tenant separation, gateway HA.
Human Review Queue	Reviewer RBAC, sensitive-data minimization, audit access controls.
Evaluation / Shadow Mode	Candidate isolation, no side effects, mirrored-data policy.

6. Cost Driver and Mitigation Matrix

Pattern	Primary Cost Driver	Structural Mitigation
Copilot	High-frequency completions.	Debounce, local cache, short context, small route.
Research Agent	Search/synthesis loops.	Step budget, plan approval, source cache, bounded search.
Support Assistant	Long conversations and escalations.	Summarization, intent routing, KB quality, escalation thresholds.
Document Intelligence	OCR/visual processing and review labor.	Page filtering, document classification, field-confidence gates.
Workflow Automation Agent	Planning/tool loops and retries.	Static workflow templates, step caps, idempotency, approval gates.
Coding Agent	Build/test/retry loops.	Changed-file builds, sandbox reuse, issue scoping.
Analytics Assistant	Warehouse compute and bad joins.	Metric layer, query limits, dry-run cost estimates.
Multimodal Review	Media processing and expert review.	Sampling policy, preprocessing, exception thresholds.
Enterprise Knowledge System	Ingestion/index/reranking.	Deduplication, lifecycle cleanup, hybrid retrieval tuning.
Decision-Support Cockpit	Long reasoning and expert review.	Evidence templates, deterministic calculators, scenario caps.
Background Classifier	Event volume and exceptions.	Fast classifier, batching, taxonomy quality.
Productivity Assistant	Frequent low-value calls.	Local route, caching, memory limits.
Governed Agentic Workflow	Multi-agent loops and checkpoints.	Graph constraints, budget caps, early human intervention.
AI Gateway	Proxy scale, telemetry, cache, provider calls.	Sampling, budget enforcement, route optimization.
Human Review Queue	Reviewer labor.	Better thresholds, prioritization, evidence UI, workflow redesign.
Evaluation / Shadow Mode	Duplicate inference and eval cost.	Sampling, offline evals, candidate gating.

Degraded Mode Pattern Library

Degraded modes are safe lower-capability states. They must preserve the system’s safety contracts even when capability, provider availability, retrieval, tools, or latency degrade.

Pattern	Trigger	Safe Degraded Behavior	Disabled Capability	Disclosure / Recovery
Copilot / Embedded Assistant	Model latency, provider outage, context unsafe.	Static snippets, deterministic autocomplete, local templates.	Generative suggestions.	Show reduced-assist status; restore after route health and policy pass.
Research Agent	Search loop budget exhausted, source verifier failure, provider outage.	Present source list, notes, and unresolved questions without synthesis.	Final synthesized answer.	Show unsupported/partial status; recover after evidence verification.
Support Assistant	Low confidence, policy retrieval failure, customer distress, outage.	Route to human queue with conversation summary and known context.	Bot resolution and transactional actions.	Tell user escalation occurred; recover after the support route is healthy.
Document Intelligence	OCR/parser failure, schema failure, low confidence, malware flag.	Manual document review queue.	Automated extraction/write-back.	Mark document needs review; recover after parser/eval pass.
Workflow Automation Agent	Permission failure, loop budget exhausted, unknown final state, API outage.	Freeze workflow, serialize state, notify operator.	Further side effects.	Show paused state and required human action.
Coding Agent	Sandbox failure, test failure, security scan failure.	Suggestion-only analysis with no PR/write authority.	Automated patch/PR creation.	Show failed gate; recover after tests/sandbox pass.
Analytics Assistant	Query validator failure, warehouse limit, metric ambiguity.	Static dashboards, metric glossary, or dry-run query only.	Ad-hoc execution.	Explain query blocked; recover after metric/query validation.
Multimodal Review System	Media parser failure, detection uncertainty, coordinate verifier failure.	Raw media review with manual checklist.	Automated annotations/final assessment.	Mark automation unavailable; recover after media/eval pass.
Enterprise Knowledge System	Retrieval permission uncertainty, index outage, citation verifier failure.	Keyword search/document list only.	Generated answers/summaries.	Show synthesis disabled; recover after retrieval and grounding are healthy.
Decision-Support Cockpit	Evidence conflict, missing source, high uncertainty, route failure.	Static checklist and raw evidence packet.	Scenario synthesis/recommendation.	Require human decision; recover after evidence sufficiency.
Background Classifier / Router	Classifier drift, low confidence, queue route failure.	Manual triage queue or deterministic rules.	Automated routing.	Alert queue owner; recover after taxonomy/eval pass.
Personal / Team Productivity Assistant	Memory unsafe, context too sensitive, local route unavailable.	Standard editor/search without AI.	Memory use and generation.	Show AI unavailable/restricted; recover after policy/route pass.
Governed Agentic Workflow	Graph deadlock, contract breach, tool failure, resource budget exceeded.	Pause graph and checkpoint state.	Node advancement and side effects.	Notify process supervisor; recover by manual resume or rollback.
AI Gateway / Control Plane	Provider outage, policy failure, key compromise, routing anomaly.	Fail closed or use only approved fallback routes preserving contracts.	Unsafe provider routes and direct bypass.	Alert platform; recover after route/policy health check.
Human Review Queue	Reviewer overload, queue tooling failure, evidence missing.	Hold items in staging; pause downstream writes.	Approval/commit.	Notify operations; recover after queue capacity/evidence restored.
Evaluation / Shadow Mode	Shadow cost spike, candidate failure, data policy violation.	Disable candidate path; production path remains isolated.	Candidate evaluation traffic.	Alert release owner; recover after shadow policy/eval fix.

A degraded mode is not “use a weaker model and hope.” It is a controlled reduction in capability that preserves safety, authority, and evidence boundaries.
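
The library above can be read as a small state machine per pattern: triggers switch the system into a degraded state, a fixed set of capabilities is disabled while any trigger is active, and recovery requires a passing health check. A minimal sketch follows, with the class, trigger, and capability names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DegradedModeController:
    disabled_capabilities: frozenset          # what is switched off while degraded
    active_triggers: set = field(default_factory=set)

    def report(self, trigger: str) -> None:
        """Record a trigger such as a verifier failure or provider outage."""
        self.active_triggers.add(trigger)

    def clear(self, trigger: str, health_check_passed: bool) -> None:
        """Recover from a trigger only after its health check has passed."""
        if health_check_passed:
            self.active_triggers.discard(trigger)

    @property
    def degraded(self) -> bool:
        return bool(self.active_triggers)

    def allows(self, capability: str) -> bool:
        """Capabilities outside the disabled set stay available when degraded."""
        return not (self.degraded and capability in self.disabled_capabilities)

    def status(self) -> str:
        """Disclosure string for the user surface."""
        if not self.degraded:
            return "normal"
        return "degraded: " + ", ".join(sorted(self.active_triggers))
```

For the Enterprise Knowledge System row, for example, the disabled set would contain synthesis while keyword search stays available, so the safe behavior is a property of the configuration and not of a weaker model.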

AI Architecture Anti-Pattern Catalog

Chatbox on Everything

Symptoms: A single open-ended chat input is deployed as the primary user interface, replacing structured web forms, nested menus, action tables, and application buttons.
Why Tempting: Fast to build and deploy; simplifies UI design; shifts the burden of navigating process complexity directly to the user’s phrasing.
Why It Fails: Increases cognitive load on users, who must guess which inputs are valid; leads to high task friction; and yields highly inconsistent completions where deterministic data writes are required.1
Detection Signals: Low user engagement; frequent multi-turn exchanges in which users must re-explain their request; user feedback complaining about lost dashboard features.
Safer Alternatives: Use the Copilot pattern, embedding task-specific context suggestions inline within structured user interfaces with explicit, validated operation buttons.13

RAG-as-Database

Symptoms: A vector database and Retrieval-Augmented Generation (RAG) pipeline are utilized to perform precise, single-value transactional lookups (e.g., account balances, order status tracking, inventory quantities).
Why Tempting: Avoids the engineering effort of building relational database indexes or structured REST APIs, allowing developers to query unparsed files directly via semantic similarity.
Why It Fails: Vector similarity search does not guarantee exact retrieval; the model frequently misses the target number or synthesizes an answer from adjacent data fields.7
Detection Signals: Numeric errors in customer responses; low context precision metrics; database retrieval mismatches.
Safer Alternatives: Use deterministic lookup paths (SQL indices, REST endpoints) for structured data queries, and reserve RAG patterns strictly for unstructured document search.
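A minimal sketch of the deterministic lookup path, assuming an in-memory SQLite table and a request that already carries a classified `intent` field. The table layout, the `order_status` intent label, and the function names are hypothetical; the retrieval branch is left as a stub.

```python
import sqlite3


def build_orders_db() -> sqlite3.Connection:
    # Illustrative system of record; a real deployment would query the existing store.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("A-100", "shipped"), ("A-101", "processing")],
    )
    return conn


def lookup_order_status(conn: sqlite3.Connection, order_id: str):
    # Exact-match lookup on the primary key: no similarity search involved.
    row = conn.execute(
        "SELECT status FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    return row[0] if row else None


def answer(conn: sqlite3.Connection, request: dict) -> dict:
    # Transactional intents go to SQL; everything else is handed to document retrieval.
    if request.get("intent") == "order_status":
        status = lookup_order_status(conn, request["order_id"])
        note = None if status is not None else "order not found"
        return {"route": "deterministic", "answer": status, "note": note}
    return {"route": "rag", "answer": None, "note": "hand off to document retrieval"}


ORDERS = build_orders_db()
```

A missing record returns an explicit "not found" rather than a nearest-neighbor guess, which is the behavior a similarity index cannot provide.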

Agent as Intern with Admin Rights

Symptoms: An autonomous, model-driven agent is deployed to write directly to enterprise production systems of record (e.g., modifying ERPs, updating cloud server states) without sandbox boundaries or human approval gates.1
Why Tempting: Promises high automation speeds and creates the appearance of a fully automated operational process.
Why It Fails: Non-deterministic planner behaviors, tool call hallucinations, or prompt injections can trigger unauthorized database deletions, infinite transaction loops, and data corruption.1
Detection Signals: Corrupted production databases; billing anomalies; security alerts indicating unauthorized administrative API executions.
Safer Alternatives: Deploy the Governed Agentic Workflow pattern with sandboxed transaction runtimes and strict human verification gates.1
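The verification gate can be illustrated with a small routing function. This is a sketch under stated assumptions: the tool names, the `ProposedAction` type, and the three outcome labels are hypothetical and are not part of the Governed Agentic Workflow pattern definition.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tool names; anything not listed here is treated as a mutation.
READ_ONLY_TOOLS = frozenset({"read_record", "search_tickets"})


@dataclass(frozen=True)
class ProposedAction:
    tool: str
    target: str


def gate(action: ProposedAction, approved_by: Optional[str] = None) -> str:
    """Decide what happens to an action proposed by the agent planner."""
    if action.tool in READ_ONLY_TOOLS:
        return "execute"
    if approved_by is None:
        # Mutations never run on the planner's authority alone.
        return "hold_for_approval"
    # Approved mutations still run in a sandboxed runtime before commit.
    return "execute_in_sandbox"
```

Treating unknown tools as mutations by default means a hallucinated or newly added tool name is held for approval rather than executed.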

Demo Architecture

Symptoms: A prototype pipeline using complex multi-agent frameworks is deployed to production with the same configurations used during local development, lacking telemetry logging, error routing, and tracing configurations.2
Why Tempting: Accelerates initial deployment schedules; minimizes system setup and platform management workloads.
Why It Fails: Production data exposes systemic anomalies, rate-limiting failures, token budget overruns, and security exploits that local testing environments hide.2
Detection Signals: Uncontrollable API spend; system timeout errors; untraceable transactional failures; lack of structured logging.
Safer Alternatives: Implement the AI Gateway pattern to manage credentials, rate-limiting parameters, and OpenTelemetry collection across providers.

Citation Theater

Symptoms: Generated documents feature academic-style footnotes and hyperlink annotations that point to irrelevant sources, hallucinated files, or non-existent document subsections.1
Why Tempting: Footnotes look professional and create the appearance of factual grounding and research reliability.
Why It Fails: Undermines system trust; users can be misled by plausible-sounding synthetic text backed by hallucinated references.1
Detection Signals: User feedback flagging broken links; low citation fidelity evaluation scores; automated grounding mismatches.
Safer Alternatives: Deploy a dedicated Citation Verifier pipeline that confirms each cited source exists and that the cited location (page coordinates or link offsets) resolves before rendering documents.

Workflow Double-Work

Symptoms: Users spend more time reviewing, correcting, and re-verifying model-generated drafts than they would have spent writing the document manually.
Why Tempting: Promises high-volume automated document drafts at low upfront cost.
Why It Fails: High-volume, low-quality generations create heavy correction workloads, causing operator fatigue and reducing productivity.
Detection Signals: High user reject rates; low adoption metrics; longer document processing timelines.
Safer Alternatives: Deploy bounded Copilot models designed to suggest local completions inline, rather than generating entire complex reports at once.13

Model Leaderboard Architecture

Symptoms: A core model backend is selected based purely on generic public benchmarks (e.g., MMLU scores) rather than on measured performance against custom task data.
Why Tempting: Avoids the operational cost of building local evaluation test suites and benchmark datasets.
Why It Fails: Public benchmarks rarely reflect specific corporate task parameters, schema requirements, or document formatting realities.
Detection Signals: Upgraded models underperforming on custom corporate tasks despite higher public benchmark scores.
Safer Alternatives: Build localized evaluation datasets and regression gates based on production telemetry to validate model upgrades.
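A regression gate of this kind can be sketched in a few lines. The example below is illustrative only: the eval case format, the exact-match check, the `max_drop` tolerance, and the stand-in model functions are assumptions, not a recommended scoring method.

```python
def pass_rate(model_fn, eval_cases) -> float:
    # Fraction of local eval cases whose output matches the expected value exactly.
    passed = sum(1 for case in eval_cases if model_fn(case["input"]) == case["expected"])
    return passed / len(eval_cases)


def regression_gate(baseline_fn, candidate_fn, eval_cases, max_drop: float = 0.0) -> dict:
    # Approve the candidate only if it does not fall below the baseline by more than max_drop.
    baseline = pass_rate(baseline_fn, eval_cases)
    candidate = pass_rate(candidate_fn, eval_cases)
    return {
        "baseline": baseline,
        "candidate": candidate,
        "approved": candidate >= baseline - max_drop,
    }


# Stand-in task: normalize product codes to upper case.
EVAL_CASES = [
    {"input": "ab-12", "expected": "AB-12"},
    {"input": "CD-34", "expected": "CD-34"},
]


def current_model(text: str) -> str:
    return text.upper()


def upgraded_model(text: str) -> str:
    # Stand-in for a model that scores higher publicly but skips the local formatting rule.
    return text
```

Here `regression_gate(current_model, upgraded_model, EVAL_CASES)` rejects the upgrade because the candidate passes only half of the local cases, regardless of how it ranks on public benchmarks.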

Single-Provider Hardwire

Symptoms: Vendor-specific SDK libraries and proprietary API payload formats are referenced directly throughout system controller files, with no decoupling layer.
Why Tempting: Simplifies initial integration; leverages vendor-specific helper utilities.
Why It Fails: Locks the enterprise into a single model provider; introduces operational risk during outages or API deprecations.6
Detection Signals: Long refactoring timelines during model migration projects; systemic outages when a primary model provider experiences downtime.
Safer Alternatives: Implement the AI Gateway pattern using standard API specifications to decouple application controllers from provider endpoints.9
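The decoupling layer can be illustrated with a small interface that application code depends on instead of a vendor SDK. This is a sketch: `CompletionProvider`, the two adapters, and `summarize` are hypothetical names, and the adapters return canned strings where a real adapter would translate to and from a provider's payload format.

```python
from typing import Protocol


class CompletionProvider(Protocol):
    """The only surface application controllers are allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class VendorAAdapter:
    # Stub: a real adapter would call vendor A's SDK and normalize the response.
    def complete(self, prompt: str) -> str:
        return f"vendor-a:{prompt}"


class VendorBAdapter:
    # Stub: a second provider behind the same interface.
    def complete(self, prompt: str) -> str:
        return f"vendor-b:{prompt}"


def summarize(provider: CompletionProvider, text: str) -> str:
    # Controller code contains no vendor-specific imports or payload shapes.
    return provider.complete(f"Summarize: {text}")
```

Swapping providers then becomes a change at the adapter or gateway layer rather than a refactor of every controller.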

Unbounded Research Goblin

Symptoms: Research agents are deployed without step execution boundaries, model timeout constraints, or spending ceilings, allowing models to query APIs indefinitely in search of answers.1
Why Tempting: Promises deep, thorough research by allowing agents to search iteratively.1
Why It Fails: Stochastic loops can trigger hundreds of search queries and model calls, causing token costs to spike rapidly.1
Detection Signals: Cost alerts; model requests timing out; execution trace logs showing deep, circular reasoning trees.
Safer Alternatives: Implement strict execution budgets, step limits, and timeout boundaries in agent planner configurations.1
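A minimal sketch of an execution budget, assuming a research loop in which each step reports its own cost. `ExecutionBudget`, `run_research`, and the `step_fn` signature are hypothetical; the limits shown are placeholders, not recommended values.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    pass


@dataclass
class ExecutionBudget:
    max_steps: int
    max_cost_usd: float
    max_seconds: float
    steps: int = 0
    cost_usd: float = 0.0
    started_at: float = field(default_factory=time.monotonic)

    def check(self) -> None:
        # Called before every step; any exhausted limit stops the loop.
        if self.steps >= self.max_steps:
            raise BudgetExceeded("step limit reached")
        if self.cost_usd >= self.max_cost_usd:
            raise BudgetExceeded("spending ceiling reached")
        if time.monotonic() - self.started_at >= self.max_seconds:
            raise BudgetExceeded("timeout reached")

    def record(self, cost_usd: float) -> None:
        self.steps += 1
        self.cost_usd += cost_usd


def run_research(step_fn, budget: ExecutionBudget) -> dict:
    """step_fn(step_index) returns (finding, cost_usd, done)."""
    findings = []
    while True:
        try:
            budget.check()
        except BudgetExceeded as exc:
            # Return partial findings with the stop reason instead of looping on.
            return {"status": "stopped", "reason": str(exc), "findings": findings}
        finding, cost_usd, done = step_fn(len(findings))
        budget.record(cost_usd)
        findings.append(finding)
        if done:
            return {"status": "complete", "reason": "answer found", "findings": findings}
```

Because limits are checked before each step, a single expensive step can still overshoot the ceiling by its own cost; a stricter variant would reserve an estimated cost before the call.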

Magic Document Reader

Symptoms: Multi-page, visually complex document folders (e.g., scanned PDFs with tables) are sent directly to models as raw file streams, relying on model context to interpret visual and spatial data.
Why Tempting: Avoids configuring specialized layout extraction tools and OCR parsers.
Why It Fails: Standard models frequently misinterpret multi-column tables, omit visual data, or confuse page-level spatial hierarchies.2
Detection Signals: Low extraction precision scores; misread data fields; hallucinated table summaries.
Safer Alternatives: Implement a structured Document Intelligence Pipeline with layout parsing and spatial coordinate validation.2

One Pattern to Rule Them All

Symptoms: A single architectural pattern (e.g., an open-ended conversational agent) is used to handle all enterprise AI workloads.
Why Tempting: Standardizes development on a single framework, simplifying platform team operations.
Why It Fails: Different tasks have divergent constraints; forcing low-latency tasks through agent loops or highly structured extractions through chat interfaces degrades quality.
Detection Signals: Rising execution latency; user friction; high development costs as teams patch a monolithic system.
Safer Alternatives: Select specialized reference patterns matched to the constraints of each task class.14

Gateway Bypass

Symptoms: Product development teams bypass central AI Gateways and deploy custom model endpoints directly using local cloud credentials.
Why Tempting: Allows developers to prototype and deploy changes without coordination bottlenecks with platform engineering teams.
Why It Fails: Bypasses corporate security, access controls, token spending audits, and threat detection systems, introducing compliance risks.
Detection Signals: Hidden API cost increases on cloud billing; security audits finding unmonitored external model traffic.
Safer Alternatives: Implement the AI Gateway pattern as a mandatory routing policy at the network boundary.

Fake Human-in-the-Loop

Symptoms: Human review interfaces are designed as simple confirmation panels with no access to grounding data or verification tools, encouraging rapid clicking.
Why Tempting: Minimizes workflow delay and reduces human labor costs while claiming safety compliance.
Why It Fails: Humans cannot verify claims without grounding data, leading to automated rubber-stamping and unvetted errors reaching production.13
Detection Signals: Manual review approval rates at or near 100%; downstream error rates that match the rates measured without human review during automated prototype testing.
Safer Alternatives: Implement a dedicated Human Review and Escalation Queue with side-by-side evidence, coordinate highlighting, and rejection tools.
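The rubber-stamping signal can be monitored with a simple check over review decisions. The sketch below is illustrative: the decision record fields and both thresholds (`max_approval_rate`, `min_median_seconds`) are assumed values that would need calibration against a real queue.

```python
import statistics


def review_health(decisions, max_approval_rate: float = 0.98, min_median_seconds: float = 10.0):
    """Return a list of warning flags for a batch of review decisions.

    Each decision is a dict with 'approved' (bool) and 'seconds' (time spent reviewing).
    """
    if not decisions:
        return ["no review decisions recorded"]
    approval_rate = sum(1 for d in decisions if d["approved"]) / len(decisions)
    median_seconds = statistics.median([d["seconds"] for d in decisions])
    flags = []
    if approval_rate > max_approval_rate:
        flags.append("approval rate suggests rubber-stamping")
    if median_seconds < min_median_seconds:
        flags.append("median review time too short to inspect evidence")
    return flags


RUBBER_STAMPED = [{"approved": True, "seconds": 3.0} for _ in range(50)]
CAREFUL = [{"approved": i % 5 != 0, "seconds": 45.0} for i in range(10)]
```

A healthy queue is not one with a low approval rate; the flags only indicate that reviewers may lack the evidence or time to exercise a veto.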

Deflection Theater

Symptoms: Customer support systems are configured to deflect user requests at all costs, blocking access to human agents or escalation queues.
Why Tempting: Minimizes customer support center staffing requirements and operational overhead.
Why It Fails: Traps users in unresolved loops with incorrect answers, leading to customer churn and brand damage.
Detection Signals: High customer exit rates; negative post-chat feedback ratings; rising repeat-contact frequencies.
Safer Alternatives: Use the Support Assistant pattern with structured intent classification and low-friction human escalation paths.

Metric Hallucination

Symptoms: Models generate business intelligence analytics directly against database tables without using standard, governed semantic metric layer definitions.
Why Tempting: Fast to deploy; avoids the engineering effort of building metadata frameworks or semantic layers.
Why It Fails: Models hallucinate database join paths, use non-standard calculation logic, or expose restricted records to unauthorized users.
Detection Signals: Inconsistent business metrics across reports; SQL compilation failures; security policy alerts.
Safer Alternatives: Implement the Analytics Assistant pattern with strict SQL validators and a governed semantic metric layer.

Vector Dump Knowledge System

Symptoms: Thousands of uncurated, multi-format documents are uploaded into a single vector index without metadata, access rules, or lifecycle curation.8
Why Tempting: Extremely low configuration cost; provides generic conversational answers across the corpus.
Why It Fails: Returns outdated, duplicate, or conflicting context chunks, causing model synthesis confusion and access control leaks.
Detection Signals: Stale information in answers; high rate of irrelevant citations; unauthorized access to sensitive files.
Safer Alternatives: Use the Enterprise Knowledge System pattern with ingestion filters, dynamic chunking, and IAM integration.8

Auto-Merge Coding Agent

Symptoms: Automated coding tools are configured to commit code patches directly to production branches without developer review or verification testing.1
Why Tempting: Accelerates feature delivery; minimizes developer review overhead.
Why It Fails: Plausible but broken patches, logic errors, or security vulnerabilities are deployed directly to production systems.1
Detection Signals: Rising production regressions; build breakages; automated security alert spikes.
Safer Alternatives: Enforce sandbox testing and manual peer-developer code reviews for all model-generated patches.

Fallback Contract Downgrade

Symptoms: When a primary, secure model provider fails, the system automatically redirects traffic to a simpler backup model while bypassing security, output validation, or policy contracts.
Why Tempting: Prioritizes system availability during primary provider outages.
Why It Fails: Lowers system security, allows unsafe outputs to bypass filters, and risks deploying unvalidated data.
Detection Signals: Security anomalies during primary provider outages; format validation failures on backup routes.
Safer Alternatives: Fallback configurations must enforce identical security, validation, and policy contracts across all model routes.
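A minimal sketch of a fallback loop in which every route passes through the same output contract. The contract shown (a JSON object with a string `answer` field), the route tuples, and the stub model functions are assumptions chosen for illustration.

```python
import json


def validate_contract(raw_output: str) -> dict:
    # Shared contract for ALL routes; json.loads raises ValueError on malformed output.
    data = json.loads(raw_output)
    if not isinstance(data, dict) or not isinstance(data.get("answer"), str):
        raise ValueError("output does not satisfy the answer contract")
    return data


def call_with_fallback(routes, prompt: str) -> dict:
    """routes is an ordered list of (name, call_model) pairs."""
    failures = []
    for name, call_model in routes:
        try:
            # The backup route gets no exemption from validation.
            return {"route": name, "output": validate_contract(call_model(prompt)), "failures": failures}
        except (ConnectionError, ValueError) as exc:
            failures.append(f"{name}: {exc}")
    # Fail closed: no route satisfied the contract, so no output is released.
    return {"route": None, "output": None, "failures": failures}


def primary_down(prompt: str) -> str:
    raise ConnectionError("provider outage")


def backup_ok(prompt: str) -> str:
    return json.dumps({"answer": "42"})


def backup_unstructured(prompt: str) -> str:
    return "sure, here you go"
```

The design choice is that availability is bought only with routes that can meet the contract; when none can, the caller receives an explicit failure rather than unvalidated output.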

No-Use Condition Matrix

No-use conditions prevent pattern overreach. They do not always mean “never use AI”; they often mean “use another pattern,” “add a human gate,” “use deterministic software,” or “complete readiness work first.”

Pattern	No-Use / Redesign Condition	Why This Pattern Fails	Safer Architecture
Copilot / Embedded Assistant	Task requires autonomous execution, exact calculation, or unattended batch processing.	Inline suggestion surface does not provide execution assurance.	Deterministic workflow, analytics assistant, or workflow automation with gates.
Research Agent	User needs exact source-of-record lookup or answer must be instant and deterministic.	Multi-hop synthesis adds latency and hallucination risk.	SQL/API lookup, enterprise search, or deterministic report.
Support Assistant	Emergency, legal, medical, abuse, or high-emotion dispute without human escalation.	AI may delay proper human intervention.	Human-first support queue with AI evidence assist only.
Document Intelligence Pipeline	Source system already provides clean structured data.	OCR/extraction adds unnecessary error and cost.	API ingestion or deterministic parser.
Workflow Automation Agent	Irreversible or high-value mutation lacks approval, idempotency, or verification.	Side effects can be wrong and unrecoverable.	Human approval queue plus deterministic execution.
Coding Agent	Codebase cannot be built, tested, sandboxed, or reviewed.	No reliable validation path.	Human developer workflow; build test infrastructure first.
Analytics Assistant	Metrics are undefined, database access is broad, or the output is a regulated filing.	Model may invent joins/calculations.	Semantic metric layer, governed BI, deterministic reporting.
Multimodal Review System	Safety-critical diagnosis or measurement lacks regulated validation and expert authority.	False negatives/positives can cause harm.	Expert-led review with AI as evidence-prep only.
Enterprise Knowledge System	Corpus lacks ACLs, freshness, ownership, or lifecycle governance.	Retrieval can leak data or surface stale/conflicting answers.	Corpus engineering and permission indexing first.
Decision-Support Cockpit	Organization wants AI to make final high-impact decisions.	Cockpit pattern supports humans; it does not replace authority.	Manual decision workflow with evidence support.
Background Classifier / Router	False negatives have high consequence and no exception review exists.	Silent misrouting becomes systemic risk.	Manual triage or deterministic rules until eval/review exists.
Productivity Assistant	System manipulates records, sends customer messages, or processes regulated data.	Local assistant lacks governance and action controls.	Support assistant, workflow automation, or governed enterprise system.
Governed Agentic Workflow	Workflow is simple, linear, deterministic, or latency-critical.	Agent graph adds unnecessary complexity and cost.	Static workflow engine or deterministic microservice.
AI Gateway / Control Plane	One-off local toy prototype with no enterprise data or shared use.	Gateway overhead may exceed value.	Direct sandbox SDK with test credentials only.
Human Review Queue	Task is low-risk, high-volume, and deterministically validated.	Review creates bottleneck and fatigue.	Automated validation with sampled audit.
Evaluation / Shadow Mode	Production data cannot be mirrored safely or no evaluation metric exists.	Shadow path creates privacy risk or meaningless scores.	Offline eval with sanitized/representative data.

If the no-use condition is true, do not “prompt harder.” Change the architecture.
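The matrix can be encoded as data so that pattern selection is checked rather than argued. The sketch below covers three rows only and simplifies their conditions; the task attribute names (`needs_exact_lookup`, `irreversible`, and so on) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class NoUseRule:
    pattern: str
    applies: Callable[[dict], bool]
    safer_architecture: str


# Simplified encodings of three matrix rows; attribute names are assumptions.
RULES = [
    NoUseRule(
        "Research Agent",
        lambda t: t.get("needs_exact_lookup", False) or t.get("needs_instant_answer", False),
        "SQL/API lookup, enterprise search, or deterministic report.",
    ),
    NoUseRule(
        "Workflow Automation Agent",
        lambda t: t.get("irreversible", False) and not t.get("has_approval_gate", False),
        "Human approval queue plus deterministic execution.",
    ),
    NoUseRule(
        "Document Intelligence Pipeline",
        lambda t: t.get("source_is_structured", False),
        "API ingestion or deterministic parser.",
    ),
]


def select_architecture(pattern: str, task: dict) -> dict:
    # A triggered rule redirects to another architecture; it does not tune the prompt.
    for rule in RULES:
        if rule.pattern == pattern and rule.applies(task):
            return {"use_pattern": False, "redirect_to": rule.safer_architecture}
    return {"use_pattern": True, "redirect_to": None}
```

For example, `select_architecture("Research Agent", {"needs_exact_lookup": True})` redirects to the deterministic lookup path instead of returning the requested pattern.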

Implementation Maturity Levels

AI architecture maturity describes how safely and repeatably a pattern can be operated. Maturity is not model quality. It is the presence of contracts, evals, telemetry, security, human authority, operations, support, and lifecycle controls.

Level	Name	Allowed Use	Forbidden Use	Required Controls	Exit Gate to Next Level
0	Demo	Local exploration with synthetic or non-sensitive data.	Production data, customer exposure, system-of-record access, unmanaged secrets.	Sandbox only, test credentials, no live integrations, no claims of reliability.	Define use case, data class, owner, and prototype boundary.
1	Controlled Prototype	Internal prototype with curated data and limited users.	Production actions, broad rollout, sensitive data without approval.	Basic prompt/schema, sandbox, initial evals, test keys/vaulted secrets, trace capture.	Pass seed eval, security/data review, and product-fit review.
2	Pilot	Bounded real workflow with limited users and explicit monitoring.	Autonomous high-impact actions, unreviewed external outputs.	Named owner, telemetry, human review path, fallback, incident contact, pilot success criteria.	Demonstrate quality, adoption, cost, safety, and operational readiness.
3	Production	Approved production workflow within defined risk tier.	Unbounded route changes, silent model swaps, missing rollback.	Contract stack, eval gate, deployment manifest, observability, runbook, rollback, support.	Prove repeatability across teams/routes and governance review.
4	Governed Scale	Multi-team or high-volume production under platform controls.	Pattern-specific one-off exceptions without review.	Gateway/control plane, standardized evals, policy automation, cost controls, audit evidence, adoption support.	Package as reusable golden path.
5	Platform Golden Path	Reusable pattern available through developer platform templates.	Bypassing mandatory controls.	Automated scaffolding, contract templates, eval harness, observability, security defaults, documentation, support model.	Continuous lifecycle; no higher maturity required.

Maturity Rules

Rule	Meaning
No production without owner.	Every deployed pattern needs accountable product, technical, and operational ownership.
No write authority before action verification.	Tools that mutate state require permission, idempotency, and postcondition checks.
No broad rollout without telemetry.	Adoption, quality, cost, latency, and breach events must be visible.
No model migration without eval evidence.	Provider/model/prompt/schema changes are deployment events.
No human review theater.	Reviewers need evidence, veto power, and time.
No prototype secrets sprawl.	Even demos use test credentials and safe environments.
No golden path without exit path.	Reusable patterns must include sourcing and migration assumptions.

Maturity is not achieved when the demo works. Maturity begins when failure is boring, bounded, and recoverable.
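Several of the maturity rules lend themselves to an automated pre-deployment check. The sketch below encodes four of them; the deployment descriptor and its field names are hypothetical, and the remaining rules would need evidence that a boolean flag cannot capture.

```python
def maturity_violations(deployment: dict) -> list:
    """Return the maturity rules a deployment descriptor violates."""
    violations = []
    if deployment.get("stage") == "production" and not deployment.get("owner"):
        violations.append("No production without owner.")
    if deployment.get("has_write_tools") and not deployment.get("action_verification"):
        violations.append("No write authority before action verification.")
    if deployment.get("broad_rollout") and not deployment.get("telemetry"):
        violations.append("No broad rollout without telemetry.")
    if deployment.get("model_changed") and not deployment.get("eval_evidence"):
        violations.append("No model migration without eval evidence.")
    return violations


RISKY = {"stage": "production", "has_write_tools": True, "model_changed": True}
READY = {
    "stage": "production",
    "owner": "payments-platform",
    "has_write_tools": True,
    "action_verification": True,
    "broad_rollout": True,
    "telemetry": True,
    "model_changed": True,
    "eval_evidence": True,
}
```

A check like this fits in a deployment pipeline as a blocking step, so that a missing owner or missing eval evidence stops the release rather than surfacing in an incident review.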

Cross-Canon Handoff Map

AI-ENG-AJ converts the AI Engineering Systems Canon into reusable reference architectures. Earlier reports define the doctrine, constraints, controls, and failure modes. AJ packages them into pattern cards that teams can select, instantiate, evaluate, operate, and govern.

Canon Report	Input to AJ	How AJ Uses It
AI-ENG-AF — Product Architecture	Use-case fit, workflow value, no-AI decisions, product surface.	Determines whether a pattern should be selected at all.
AI-ENG-AG — Adoption Systems	Training, support, feedback loops, user resistance, incentives.	Adds adoption and support requirements to pattern cards.
AI-ENG-AH — Sourcing and Vendor Strategy	Build/buy/open/hybrid, vendor exit, control plane.	Adds sourcing and exit fields to every pattern.
AI-ENG-AI — Contract Thinking	Prompt, schema, retrieval, tool, permission, route, eval, observability contracts.	Defines the contract surfaces required by each pattern.
AI-ENG-AD — Governance Architecture	Policy, audit, accountability, procurement, compliance.	Defines governance and evidence requirements.
AI-ENG-AE — Sustainable AI	Cost, energy, routing, lifecycle impact.	Adds cost drivers and degraded/resource-aware modes.
AI-ENG-J — Throughput Mechanics	Latency, batching, streaming, queueing.	Shapes route strategy and user-surface expectations.
AI-ENG-K — Weight Dynamics	Model size, quantization, adaptation.	Informs route class and self-host/open-weight viability.
AI-ENG-L — Serving Architecture	Gateways, serving topologies, fallback.	Implements patterns through route/control-plane designs.
AI-ENG-M — Agentic Orchestration	Planners, executors, graph workflows, multi-agent systems.	Grounds agentic and workflow automation patterns.
AI-ENG-N — Tool Contracts	Tool schemas, idempotency, side effects.	Defines tool/action requirements for action-oriented patterns.
AI-ENG-O — Action Verification	Source-of-record confirmation and false-success prevention.	Defines postcondition verification for tools and workflows.
AI-ENG-P — Multimodal Understanding	Vision/audio/video evidence and uncertainty.	Grounds document and multimodal review patterns.
AI-ENG-Q — Speech and Realtime Systems	Streaming, voice latency, turn-taking.	Informs interactive/embedded and support surfaces.
AI-ENG-R — UI Agents	Browser/UI control and interface state.	Informs UI-agent authority and action boundaries.
AI-ENG-S — Production Pathologies	Common production failures.	Feeds anti-patterns, failure modes, and degraded modes.
AI-ENG-T — Boundary Defense	Prompt injection, tenant isolation, egress policy.	Defines security boundaries for every pattern.
AI-ENG-U — Supply Chain Security	SBOM, AI-BOM, provenance, dependency risk.	Informs sourcing, deployment, and platform patterns.
AI-ENG-V — Resource Abuse	Loop abuse, denial-of-wallet, budget failures.	Defines resource controls and cost drivers.
AI-ENG-W — UX Resilience	Fallback, degraded UX, continuity.	Defines degraded-mode library.
AI-ENG-X — User Trust	Trust calibration, contestability, transparency.	Defines user expectation and human-review surfaces.
AI-ENG-Y — Human Review	Reviewer authority, queues, maker-checker, fatigue.	Grounds human review and escalation patterns.
AI-ENG-Z — Telemetry	Runtime traces, metrics, correction signals.	Defines telemetry-by-pattern.
AI-ENG-AA — Evaluation	Golden sets, eval gates, regression.	Defines eval-by-pattern.
AI-ENG-AB — Verification Artifacts	Evidence packages, replay, audit proof.	Defines evidence boundaries.
AI-ENG-AC — AI Operations	Incidents, runbooks, rollback, containment.	Defines operational maturity and breach response.

MCP and Tooling Note

Tooling standards should be referenced carefully. MCP uses JSON-RPC messages over stdio and Streamable HTTP; SSE may be used within Streamable HTTP where supported. Architecture cards should describe the tool contract and transport requirements generically, then name MCP or other protocols as implementation options rather than hard dependencies.
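One way to keep the contract generic is to define it independently of any protocol and render it into a protocol envelope only at the edge. The sketch below assumes a hypothetical `ToolContract` type and renders a call as a JSON-RPC 2.0 request using the `tools/call` method shape from the MCP specification; the tool name and arguments are invented, and the envelope should be checked against the current specification before use.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolContract:
    """Protocol-neutral description of a tool (hypothetical type)."""
    name: str
    required_args: frozenset
    mutates_state: bool


def missing_args(contract: ToolContract, arguments: dict) -> list:
    # Contract check happens before any transport or protocol is involved.
    return sorted(contract.required_args - set(arguments))


def to_mcp_request(contract: ToolContract, arguments: dict, request_id: int) -> str:
    # MCP is one implementation option: a JSON-RPC 2.0 request envelope.
    missing = missing_args(contract, arguments)
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": contract.name, "arguments": arguments},
    })


REFUND_TOOL = ToolContract(
    name="issue_refund",
    required_args=frozenset({"order_id", "amount"}),
    mutates_state=True,
)
```

Because `ToolContract` carries no protocol detail, a different transport or protocol can be added as another rendering function without touching the contract itself.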

Core Handoff Rule

AJ is where doctrine becomes reusable architecture.

A reference architecture is a reusable failure-aware contract bundle.

It tells engineers:
  when to use the pattern,
  when not to use it,
  what contracts are required,
  how it is evaluated,
  what telemetry proves,
  how humans retain authority,
  how it degrades,
  how it exits,
  and how it fails safely.

Works cited

What Are Agentic Design Patterns? 2026 Pattern Catalog - Augment …, accessed June 15, 2026, https://www.augmentcode.com/guides/agentic-design-patterns
Enterprise AI Agents: Agentic Design Patterns Explained - Tungsten Automation, accessed June 15, 2026, https://www.tungstenautomation.com/learn/blog/build-enterprise-grade-ai-agents-agentic-design-patterns
Agent system design patterns - Databricks on AWS, accessed June 15, 2026, https://docs.databricks.com/aws/en/agents/agent-system-design-patterns
Reference architecture: The blueprint for safe and scalable autonomy in SRE and DevOps, accessed June 15, 2026, https://www.ilert.com/blog/reference-architecture-for-scalable-autonomy-in-sre-and-devops
Design Patterns for Agentic AI and Multi-Agent Systems - AppsTek Corp, accessed June 15, 2026, https://appstekcorp.com/blog/design-patterns-for-agentic-ai-and-multi-agent-systems/
AI System Design Patterns for 2026: Architecture That Scales, accessed June 15, 2026, https://zenvanriel.com/ai-engineer-blog/ai-system-design-patterns-2026/
RAG Evaluation Metrics: Assessing Answer Relevancy, Faithfulness, Contextual Relevancy, And More - Confident AI, accessed June 15, 2026, https://www.confident-ai.com/blog/rag-evaluation-metrics-answer-relevancy-faithfulness-and-more
Evaluating the Performance of rag Systems: Metrics Guide …, accessed June 15, 2026, https://unstructured.io/insights/rag-evaluation-a-data-pipeline-performance-framework
AI Gateway Architecture: A Guide for Technical Teams - MLflow, accessed June 15, 2026, https://mlflow.org/articles/ai-gateway-architecture-a-guide-for-technical-teams/
AI control plane: the architecture for AI governance and security - Speakeasy, accessed June 15, 2026, https://www.speakeasy.com/resources/ai-control-plane
What Is An AI Gateway? - IBM, accessed June 15, 2026, https://www.ibm.com/think/topics/ai-gateway
System Architecture Overview - Envoy AI Gateway, accessed June 15, 2026, https://aigateway.envoyproxy.io/docs/concepts/architecture/system-architecture
Agentic Design Patterns - Terezinha Tech Operations (ttoss), accessed June 15, 2026, https://ttoss.dev/docs/ai/agentic-design-patterns
Choosing the Right Agentic Design Pattern: A Decision-Tree Approach, accessed June 15, 2026, https://machinelearningmastery.com/choosing-the-right-agentic-design-pattern-a-decision-tree-approach/
Model Context Protocol architecture patterns for multi-agent AI systems - IBM Developer, accessed June 15, 2026, https://developer.ibm.com/articles/mcp-architecture-patterns-ai-systems/
Model Context Protocol (MCP) explained: A practical technical overview for developers and architects - CodiLime, accessed June 15, 2026, https://codilime.com/blog/model-context-protocol-explained/
Architecture overview - Model Context Protocol, accessed June 15, 2026, https://modelcontextprotocol.io/docs/learn/architecture
What is the Model Context Protocol (MCP)? - Databricks, accessed June 15, 2026, https://www.databricks.com/blog/what-is-model-context-protocol
What is an AI Gateway? The Complete Guide (2026) - Truefoundry, accessed June 15, 2026, https://www.truefoundry.com/blog/ai-gateway
AI Gateway & LLM Gateway: How They Work and What They Miss - Atlan, accessed June 15, 2026, https://atlan.com/know/what-is-ai-gateway-llm-gateway/
How to Evaluate RAG Systems: Metrics, Methods, and What to Measure First - Comet, accessed June 15, 2026, https://www.comet.com/site/blog/rag-evaluation/
RAG Deep Dive Series: Evaluation & Production - Kalvad Blog, accessed June 15, 2026, https://blog.kalvad.com/rag-deep-dive-series-evaluation-production/
AI Gateway Patterns: Cost Control and Reliability at Scale [2026] - Virtido, accessed June 15, 2026, https://virtido.com/blog/ai-gateway-patterns-production-guide
Multi Agent Architecture: Patterns, Use Cases & Production Reality - Truefoundry, accessed June 15, 2026, https://www.truefoundry.com/blog/multi-agent-architecture

Attribution

Part of Stunspot’s Guide to AI Systems — The AI Engineering Systems Canon.

Created by Sam “stunspot” Walker / Collaborative Dynamics.

Repository: https://github.com/Stunspot/stunspots-guide-to-ai-systems
Stunspot: https://stunspot.com
Collaborative Dynamics: https://www.collaborative-dynamics.com
Discord: https://discord.gg/stunspot

Licensed under CC BY 4.0 unless otherwise stated.
Commercial use, resale, paid redistribution, inclusion in commercial training products, and incorporation into paid knowledge-base products are permitted under CC BY 4.0 with appropriate attribution; no separate permission is required.
