Volume 2 — Knowledge, Data, and Corpus Engineering

Where trustworthy external knowledge comes from, how it is shaped, and how it enters the system.

Reports

AI-ENG-D — Corpus Engineering: Data Provenance, Knowledge Hygiene & Source Authority

Establishes the knowledge substrate beneath RAG and memory. Covers ingestion, normalization, deduplication, lineage, metadata, source authority, permission-aware indexing, canonical IDs, alias tables, versioning, retention, redaction, archival policy, and conflict resolution when sources disagree. Treats the corpus as an engineered asset, not a document landfill.
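As a rough illustration of how normalization, deduplication, canonical IDs, and alias tables can fit together at ingestion time, here is a minimal sketch. The record fields (`title`, `body`, `source`), the `ALIASES` table, and the `doc:` ID scheme are all hypothetical, chosen only for this example; the report itself may prescribe a different design.

```python
import hashlib
import re

# Hypothetical alias table: maps known alternate titles to one canonical ID.
ALIASES = {
    "hr handbook": "doc:hr-handbook",
    "employee handbook": "doc:hr-handbook",
}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def content_hash(text: str) -> str:
    """Hash the normalized body; identical hashes are treated as exact duplicates."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def canonical_id(title: str) -> str:
    """Resolve a title through the alias table, falling back to a slug (assumed scheme)."""
    key = normalize(title)
    return ALIASES.get(key, "doc:" + key.replace(" ", "-"))


def ingest(records):
    """Keep the first record per content hash and note where duplicates came from."""
    seen = {}
    for rec in records:
        h = content_hash(rec["body"])
        if h in seen:
            # Lineage: remember the duplicate's source instead of indexing it again.
            seen[h]["duplicates"].append(rec["source"])
            continue
        seen[h] = {
            "id": canonical_id(rec["title"]),
            "source": rec["source"],
            "hash": h,
            "duplicates": [],
        }
    return list(seen.values())
```

This only catches duplicates that are identical after normalization; near-duplicate detection and conflict resolution between disagreeing sources would need additional logic.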

AI-ENG-E — The Retrieval Pipeline: RAG Architecture, Hybrid Search & Semantic Injection

Treats retrieval as a precision delivery system for external context. Includes chunking strategies, embeddings, lexical search, hybrid retrieval, reranking, query rewriting, metadata filtering, freshness checks, citation construction, attribution quality, and context assembly. Distinguishes finding relevant text from injecting useful, current, permission-safe knowledge.
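One common way to combine lexical and embedding-based results is reciprocal rank fusion (RRF), followed by a permission filter before anything reaches the context window. The sketch below assumes the two ranked ID lists already exist; the function names, the `allowed_ids` set, and the constant `k=60` are illustrative assumptions, not something the report specifies.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ordering.

    Each document scores 1 / (k + rank) per list it appears in (rank starts at 1).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_retrieve(lexical, dense, allowed_ids, top_n=3):
    """Fuse lexical and dense rankings, then drop anything the caller may not see."""
    fused = reciprocal_rank_fusion([lexical, dense])
    return [d for d in fused if d in allowed_ids][:top_n]
```

For example, `hybrid_retrieve(["a", "b", "c"], ["b", "d", "a"], {"a", "b", "c"})` ranks `b` ahead of `a` because `b` sits near the top of both lists, and drops `d` because it is not in the permitted set.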

AI-ENG-F — Knowledge Freshness, Conflict Detection & Context Rot Prevention

Focuses on the decay of information over time. Covers stale retrieval, superseded documents, contradictory policies, orphaned facts, version drift, temporal scoping, expiration rules, recency weighting, source precedence, and automated detection of corpus rot before it poisons model behavior.
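Expiration rules, superseded documents, and recency weighting can be sketched as a small filter-and-score step over retrieval candidates. The document fields (`id`, `supersedes`, `expires_at`, `updated_at`, `relevance`) and the 180-day half-life below are assumptions made for illustration only.

```python
from datetime import datetime, timedelta, timezone


def recency_weight(updated_at, now, half_life_days=180.0):
    """Exponential decay: the weight halves every `half_life_days` (assumed value)."""
    age_days = (now - updated_at).total_seconds() / 86400.0
    return 0.5 ** (max(age_days, 0.0) / half_life_days)


def fresh_candidates(docs, now):
    """Drop expired and superseded documents, then rank by relevance * recency.

    Supersession is only detected when the newer document is itself among
    the candidates; a real system would consult the corpus lineage instead.
    """
    superseded = {d["supersedes"] for d in docs if d.get("supersedes")}
    kept = []
    for d in docs:
        if d["id"] in superseded:
            continue  # a newer version exists in the candidate set
        expires_at = d.get("expires_at")
        if expires_at is not None and expires_at <= now:
            continue  # past its expiration rule
        score = d["relevance"] * recency_weight(d["updated_at"], now)
        kept.append((score, d["id"]))
    return [doc_id for _, doc_id in sorted(kept, reverse=True)]
```

Multiplying relevance by a decay factor is only one option; hard cutoffs or per-source precedence rules may suit policy documents better than a smooth decay.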
