Daniel ea35e563f0 fix: remove Nate Herk source page + all references

Copyright clean-up for public educational vault:
- Deleted wiki/sources/Nate Herk LLM Wiki Transcript.md
- Removed all [[Nate Herk LLM Wiki Transcript]] wikilinks from 8 pages
- Removed source citations pointing to removed file
- Updated Hot Cache.md: removed attribution to specific person
- Updated Andrej Karpathy.md: first_mentioned now points to concept page
- Updated Wiki Map.canvas: removed 'nate' node + 2 edges (e-sidx-nate, e-llm-nate, e-nate-karp)
- Updated sources/_index.md: empty transcripts section
- Updated index.md, hot.md, log.md, overview.md: no Nate Herk entries

Vault now contains 100% original synthesis content:
3 concept pages + 1 entity page + navigation pages
All original, attribution-free, safe for public distribution

2026-04-07 12:51:41 +03:00


LLM Wiki Pattern

A pattern for building persistent, compounding knowledge bases using LLMs. Originated by Andrej Karpathy. The key insight: instead of re-deriving knowledge from raw documents on every query (RAG), the LLM incrementally builds and maintains a structured wiki that gets richer with every source added.

The Core Idea

Most AI knowledge tools work like RAG: index raw documents, retrieve chunks at query time, generate an answer. Nothing accumulates. Ask a question that needs five documents and the LLM reassembles fragments every time.

The wiki pattern is different. When a new source arrives, the LLM reads it, extracts what matters, and integrates it into the wiki: updating entity pages, noting contradictions, strengthening the synthesis. The cross-references are already there. The knowledge is compiled once and kept current.

The wiki is a persistent, compounding artifact. The human curates sources and asks questions. The LLM writes and maintains everything.

Three Layers

.raw/       Layer 1 — immutable source documents
wiki/       Layer 2 — LLM-generated knowledge base
CLAUDE.md   Layer 3 — schema that tells the LLM how to maintain it

The LLM owns Layer 2 entirely. It creates pages, updates them when new sources arrive, maintains cross-references, and keeps everything consistent. The human reads; the LLM writes.
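The layout above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the vault's actual tooling: the function names `scaffold_vault` and `llm_may_write` are hypothetical, the placeholder schema text is invented, and the write guard is just one way to express "Layer 1 is immutable, the LLM owns Layer 2". It needs Python 3.9 or later for `Path.is_relative_to`.

```python
from pathlib import Path


def scaffold_vault(root: Path) -> None:
    """Create the three layers under root if they are missing."""
    (root / ".raw").mkdir(parents=True, exist_ok=True)   # Layer 1: source documents
    (root / "wiki").mkdir(parents=True, exist_ok=True)   # Layer 2: LLM-generated pages
    schema = root / "CLAUDE.md"                          # Layer 3: maintenance schema
    if not schema.exists():
        # Placeholder text only; the real schema contents are not specified here.
        schema.write_text("Schema placeholder\n", encoding="utf-8")


def llm_may_write(root: Path, target: Path) -> bool:
    """Hypothetical guard: allow writes only inside wiki/, never inside .raw/."""
    wiki = (root / "wiki").resolve()
    return target.resolve().is_relative_to(wiki)
```

A wrapper around the LLM's file-writing tool could call `llm_may_write` before every write, so that source documents in .raw/ are never modified.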

Operations

Ingest — drop a source into .raw/, tell the LLM to process it. The LLM reads the source, discusses key takeaways, writes a summary page, updates entity and concept pages, and logs the operation. One source typically touches 8-15 wiki pages.

Query — ask a question. The LLM reads the index to find relevant pages, then synthesizes an answer with citations. Good answers get filed back into the wiki.

Lint — periodic health check. Find orphan pages, dead links, stale claims, missing cross-references.
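Two of the lint checks, orphan pages and dead links, are mechanical enough to sketch without an LLM. The example below assumes pages are .md files linked with [[Page Name]] wikilinks and that a link target matches a file name without its extension; the function name `lint_wiki` is hypothetical. Stale claims and missing cross-references need judgment and are not covered.

```python
import re
from pathlib import Path

# Capture the link target, stopping before an optional "|alias" or "#heading".
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def lint_wiki(wiki_dir: Path) -> dict[str, list]:
    """Report pages with no inbound links and links that point nowhere."""
    pages = {p.stem: p for p in wiki_dir.rglob("*.md")}
    inbound = {name: set() for name in pages}
    dead_links = []
    for name, path in pages.items():
        text = path.read_text(encoding="utf-8")
        for target in WIKILINK.findall(text):
            target = target.strip()
            if target in pages:
                if target != name:  # a self-link does not rescue an orphan
                    inbound[target].add(name)
            else:
                dead_links.append((name, target))
    orphans = sorted(name for name, sources in inbound.items() if not sources)
    return {"orphans": orphans, "dead_links": sorted(dead_links)}
```

Navigation pages that nothing links to, such as a top-level index, will show up as orphans under this definition, so a real lint pass would probably exempt them.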

Index and Log

index.md — content-oriented. A catalog of all pages with one-line summaries, organized by category. The LLM reads this first on every query to find relevant pages.

log.md — chronological. Append-only record of every ingest, query, and lint pass. Parseable: each entry starts with a "## [" heading, so grep "^## \[" log.md | head -10 lists the first ten entry headings.
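The grep pattern implies that every log entry opens with a heading beginning "## [". A rough Python equivalent is sketched below. What follows the bracket is not specified here, so the "## [date] operation | summary" layout and the names `append_log` and `entry_headings` are assumptions made for illustration.

```python
from datetime import date
from pathlib import Path


def append_log(log_path: Path, operation: str, summary: str, day: date) -> None:
    """Append one entry; opening in "a" mode never rewrites earlier entries."""
    with log_path.open("a", encoding="utf-8") as f:
        # Assumed entry layout: "## [YYYY-MM-DD] operation | summary"
        f.write(f"## [{day.isoformat()}] {operation} | {summary}\n\n")


def entry_headings(log_path: Path, limit: int = 10) -> list[str]:
    """Same effect as: grep "^## \\[" log.md | head -<limit>"""
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return [line for line in lines if line.startswith("## [")][:limit]
```

Because the log is append-only, the first matches are the oldest entries; slicing from the end of the list instead would give the most recent ones.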

Why It Works

The tedious part of maintaining a knowledge base is bookkeeping: updating cross-references, noting when new data contradicts old claims, keeping summaries current. Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don't get bored. The wiki stays maintained because the cost of maintenance is near zero.

At small scale (roughly 100 sources and a few hundred pages), the index file is sufficient. No vector database, no embeddings, no infrastructure. Just markdown files.
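To make "just markdown files" concrete, here is a naive keyword lookup over the index. It is only a stand-in for the judgment the LLM applies when it reads index.md, and it assumes a line format of "- [[Page Name]]: one-line summary", which is not confirmed here; `candidate_pages` is a hypothetical name.

```python
import re

# Assumed index line format: "- [[Page Name]]: one-line summary"
INDEX_LINE = re.compile(r"^- \[\[(?P<page>[^\]]+)\]\]\s*:\s*(?P<summary>.+)$")


def candidate_pages(index_text: str, question: str, top: int = 3) -> list[str]:
    """Rank index entries by how many question words they share."""
    words = {w for w in re.findall(r"[a-z0-9]+", question.lower()) if len(w) > 2}
    scored = []
    for line in index_text.splitlines():
        match = INDEX_LINE.match(line.strip())
        if not match:
            continue  # category headings and blank lines are skipped
        entry = f"{match['page']} {match['summary']}".lower()
        overlap = len(words & set(re.findall(r"[a-z0-9]+", entry)))
        if overlap:
            scored.append((overlap, match["page"]))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [page for _, page in scored[:top]]
```

Nothing here needs an embedding model or a vector store: the whole lookup is a scan over one text file, which is why the approach stops being practical once the index no longer fits comfortably in a single read.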

Comparison to RAG

Dimension	LLM Wiki	Semantic RAG
Finding	Reads index, follows links	Similarity search over embeddings
Infrastructure	Just markdown files	Embedding model + vector DB
Cost	Tokens only	Ongoing compute + storage
Maintenance	Run a lint	Re-embed when content changes
Scale limit	Hundreds of pages	Millions of documents

Connections

See Compounding Knowledge for why the pattern produces more value over time. See Hot Cache for the session context optimization. See Andrej Karpathy for the pattern's origin.
