Andrej Karpathy recently published a framework called ALLMwiki that went viral across the AI and developer communities. At its core, it’s not some complex new technology — it’s a set of concepts and behavioral guidelines for building a personal AI-powered knowledge base that gets smarter over time.
The beautiful part? There’s no steep technical barrier. Anyone can borrow this workflow for their own projects.
This tutorial covers:
- The ALLMwiki Framework — the 3 file types, 3 daily operations, and 3 query tools
- Building it in Obsidian — a practical walkthrough with real examples
- Graphify — an open-source project that supercharges the framework with a proper knowledge graph
- Three blind spots — things most people overlook when implementing this
Part 1: The ALLMwiki Framework
The framework boils down to a deceptively simple structure: 3 types of files, 3 daily operations, and 3 query tools.
The 3 Types of Files
1. Raw Resources — your raw materials warehouse. PDFs, articles, video transcripts, papers, anything you collect. These are untouched, original content.
2. Wiki Files — where the magic happens. AI reads your raw resources and extracts entities and concepts:
- Entities: people, companies, projects, courses
- Concepts: methodologies, technical terms, frameworks
For each entity or concept, AI creates a dedicated Wiki page and establishes cross-reference relationships between them. These wikis are primarily generated and maintained by AI — you don’t write them yourself.
3. Rule Documents (Schema) — the agreement between you and your AI. It tells the AI: the file structure (how RAW maps to Wiki), the processing workflow, and naming conventions. The schema evolves over time — you refine it with AI as your needs change.
The 3 Daily Operations
Ingest — Feed new materials to AI. Once a raw document arrives, AI reads it according to your schema, extracts entities and concepts, creates or updates Wiki pages, and updates the index and log.
Query — Ask questions based on your generated Wiki. Because wikis are atomized and relational, searching is far more efficient than hunting through original documents. If AI generates a particularly good answer, you can save it as a new wiki page.
Review (Link Maintenance) — Periodically have AI audit your knowledge base: find contradictions, identify outdated statements, surface isolated pages (no bidirectional links), and generate merge suggestions for redundant pages.
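Part of the review pass can be mechanized before you even involve the AI. Here is a minimal sketch that finds isolated pages in an Obsidian-style vault by scanning for [[wikilinks]] (the function name and the assumption that links use double-bracket syntax are mine, not part of the framework):

```python
import re
from pathlib import Path

# Matches the target of an Obsidian-style [[wikilink]], ignoring aliases and headings
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def find_isolated_pages(vault: Path) -> list[str]:
    """Return wiki pages that have no outgoing and no incoming [[wikilinks]]."""
    pages = {p.stem: p.read_text(encoding="utf-8") for p in vault.rglob("*.md")}
    outgoing = {name: {m.strip() for m in WIKILINK.findall(text)}
                for name, text in pages.items()}
    incoming = {name: set() for name in pages}
    for src, targets in outgoing.items():
        for target in targets:
            if target in incoming:
                incoming[target].add(src)
    return sorted(n for n in pages if not outgoing[n] and not incoming[n])
```

You could feed this list to the AI as the starting point for a review session instead of asking it to rediscover orphans from scratch.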
The 3 Query Efficiency Tools
Index — AI automatically maintains a single index page listing all Wiki pages with one-sentence summaries. When you ask AI a question, it first browses the index, finds relevant wikis, then dives deeper.
Log — AI records every operation it performs — what raw file it read, what entities/concepts it created, what links it established. This lets you (and the AI) know exactly what happened and when.
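An append-only log like this is trivial to keep structured. A sketch of one possible entry format (the exact fields and layout are my assumption; the framework only says to record what was read, what was created, and what was linked):

```python
from datetime import date

def append_log(log_path: str, action: str, raw_file: str,
               created: list[str], links: int) -> None:
    """Append one structured entry to log.md (entry format is illustrative)."""
    entry = (f"- {date.today().isoformat()} | {action} | raw: {raw_file} | "
             f"created: {', '.join(created)} | links: {links}\n")
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(entry)
```

Because the format is one line per operation, both you and the AI can grep it to answer “when did X enter the knowledge base?”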
RAG (for large knowledge bases) — Once you have 1,000+ wiki pages, index-based browsing becomes slow. Karpathy recommends QMD, a local tool that combines BM25 keyword search with vector search in a hybrid ranker and exposes a CLI, so AI can call it directly.
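To make the hybrid idea concrete, here is a generic sketch of blending BM25 with vector similarity — this is a textbook illustration, not QMD’s actual algorithm, and the function names, weights, and normalization choice are all mine:

```python
import math
from collections import Counter

def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Plain BM25 over pre-tokenized documents."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def hybrid_rank(query_tokens, docs_tokens, query_vec, doc_vecs, alpha=0.5):
    """Blend min-max-normalized BM25 with cosine similarity; return doc indices, best first."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0
    bm = bm25_scores(query_tokens, docs_tokens)
    lo, hi = min(bm), max(bm)
    bm = [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in bm]
    sem = [cos(query_vec, v) for v in doc_vecs]
    combined = [alpha * k + (1 - alpha) * s for k, s in zip(bm, sem)]
    return sorted(range(len(docs_tokens)), key=lambda i: -combined[i])
```

The keyword channel catches exact terms the embeddings miss; the vector channel catches paraphrases the keywords miss. That complementarity is the whole point of hybrid search.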
Part 2: Building It in Obsidian
Folder Structure
```
vault/
├── RAW/              # Your raw materials (PDFs, transcripts, etc.)
├── Wiki/
│   ├── entities/     # People, companies, projects
│   ├── concepts/     # Technical terms, methodologies
│   ├── overviews/    # High-level summaries
│   ├── comparisons/  # A vs B analyses
│   └── summaries/    # Distilled raw material summaries
├── index.md          # Auto-maintained by AI
├── log.md            # Operation history
└── schema.md         # Your rules document
```
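If you want to scaffold this layout in one step, a few lines of Python will do it (the folder and file names come straight from the structure above; the seed content of the three top-level files is my assumption):

```python
from pathlib import Path

SUBFOLDERS = ["entities", "concepts", "overviews", "comparisons", "summaries"]
TOP_FILES = ["index.md", "log.md", "schema.md"]

def scaffold_vault(root: str) -> None:
    """Create the RAW/Wiki folder layout plus the three top-level files."""
    base = Path(root)
    (base / "RAW").mkdir(parents=True, exist_ok=True)
    for sub in SUBFOLDERS:
        (base / "Wiki" / sub).mkdir(parents=True, exist_ok=True)
    for name in TOP_FILES:
        f = base / name
        if not f.exists():  # never clobber an existing vault file
            f.write_text(f"# {name.removesuffix('.md')}\n", encoding="utf-8")
```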
Designing Your Schema
Your schema.md is the most important file. A minimal version:
```markdown
## File Structure
- RAW folder: stores original materials, never modified
- Wiki subfolders: entities/, concepts/, overviews/, comparisons/, summaries/
- Each wiki requires frontmatter: type, source, created, related

## Frontmatter Convention
---
type: concept | entity | overview | comparison | summary
source: [raw file name]
ai_generated: true | false
created: YYYY-MM-DD
---

## Workflows

### Ingest
1. Read RAW file
2. Extract entities and concepts
3. Create wiki pages with bidirectional links
4. Update index.md and append to log.md

### Review
1. Scan all wikis for contradictions
2. Find isolated pages (no bidirectional links)
3. Flag outdated content
4. Suggest merges
```
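A schema is only useful if it is enforced. Here is a small checker for the frontmatter convention above — a sketch under the stated convention, with the function name and error messages being my own choices:

```python
import re

REQUIRED = {"type", "source", "ai_generated", "created"}
VALID_TYPES = {"concept", "entity", "overview", "comparison", "summary"}

def check_frontmatter(text: str) -> list[str]:
    """Return a list of schema violations for one wiki page's text."""
    problems = []
    m = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
    if not m:
        return ["missing frontmatter block"]
    fields = {}
    for line in m.group(1).splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    for key in REQUIRED - fields.keys():
        problems.append(f"missing field: {key}")
    if fields.get("type") and fields["type"] not in VALID_TYPES:
        problems.append(f"invalid type: {fields['type']}")
    if fields.get("created") and not re.fullmatch(r"\d{4}-\d{2}-\d{2}", fields["created"]):
        problems.append(f"created not YYYY-MM-DD: {fields['created']}")
    return problems
```

Running a check like this after each ingest catches schema drift early, before a Review session has to untangle it.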
What AI Actually Generates
Given two raw documents — an OpenCloud whitepaper (100+ pages) and a YouTube transcript — here’s what AI produced after running ingest:
Concepts created: AI agent, Skill system, Vector memory search — each with linked source documents and extracted code snippets.
Entities created: Founders, authors, projects (Lobster, OpenCloud), organizations — with quotes and project history.
Comparison wiki auto-generated:
Memory Mechanism Comparison: PAI vs. OpenCloud
- OpenCloud: 4-layer memory mechanism
- PAI: file system priority memory
- Horizontal comparison with trade-offs
The relationship graph between pages becomes rich quickly.
Part 3: Graphify — The Knowledge Graph Evolution
While Obsidian + AI gives you a wiki-based knowledge base, graphify takes Karpathy’s framework and converts it into a proper knowledge graph.
What Graphify Does Differently
| | Karpathy’s ALLMwiki | Graphify |
|---|---|---|
| Storage | Flat markdown wikis | NetworkX knowledge graph |
| Indexing | AI-maintained index file | AST + semantic dual channels |
| Querying | AI browses index → wikis | Graph traversal |
| Maintenance | Manual schema + AI prompts | Algorithmic community discovery |
The three leaps graphify provides:
- Flat text → relational graph
- LLM indexing → AST + semantic dual channels (code parsing is deterministic, no tokens burned)
- Manual maintenance → algorithmic discovery
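To see why the AST channel burns no tokens, consider how deterministic code parsing works. The sketch below extracts a call graph from Python source using the standard library’s ast module — this illustrates the general technique, not graphify’s actual implementation, and the function name is mine:

```python
import ast

def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the names it calls. Pure parsing, no LLM."""
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            calls = {n.func.id for n in ast.walk(node)
                     if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)}
            graph[node.name] = calls
    return graph
```

The resulting adjacency structure is exactly the kind of edge set a knowledge graph ingests, and because it comes from the parser it is reproducible and free, unlike an LLM summary.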
Core Commands
```
# Build knowledge graph from current directory
/graphify .

# Query the graph
/graphify query "which file contains the BM25 implementation?"

# Explain a component
/graphify explain "how is intelligent extraction implemented?"

# Add external paper to graph
/graphify add https://arxiv.org/abs/XXXX.XXXXX

# Export to Obsidian vault
/graphify export obsidian
```
Real-World Test Results
Testing graphify on a memory plugin codebase:
Query: “Which file is the BM25 code in?” → Found exact files with specific line numbers and function names.
Query: “Explain how intelligent extraction works” → Drew complete flowchart, listed 6 memory types, showed scoring formula.
After adding an academic paper → Linked 5 paper factors to exact code implementations, traced graph path from paper node → code node.
After merging a PR → Updated the graph automatically and explained the PR’s impact (a 27-line change that fixed a data leak and a permanent-failure bug).
Part 4: Three Blind Spots to Avoid
1. AI-generated wikis ≠ the best way to learn
If you’re starting from scratch on a topic, read the raw materials first. A structured PDF tutorial is designed for learning — it builds concepts progressively. The AI wikis are better for review and synthesis after you’ve done the foundational work.
2. You must inspect what AI generates
Don’t hoard mindlessly. AI-generated wikis look polished but often have gaps:
- Cross-reference links without context
- Missing relationship descriptions in “related pages” sections
- Vague or incomplete entity summaries
Fix the schema iteratively. If links lack context, update your schema: “Every cross-reference link must include a reason: what concept above makes this link relevant?”
Only wikis you’ve reviewed and improved will actually guide your decisions. Otherwise you’re hiring a robot to work out for you — the robot gets fit, not you.
3. You’re writing for AI, not just yourself
In ALLMwiki, much of the content is written for AI to read. The index and log are primarily for AI orientation. Frontmatter fields like type: concept and ai_generated: true help AI distinguish its own work from yours. Design your schema with this in mind — the cleaner your machine-readable structure, the better AI can maintain and grow the knowledge base over time.
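That machine-readable frontmatter pays off immediately: you can split the vault into AI-written and human-written pages with a few lines. A sketch assuming the frontmatter convention from Part 2 (function name is mine):

```python
import re
from pathlib import Path

def split_by_author(vault: str) -> tuple[list[str], list[str]]:
    """Split wiki pages into AI-generated vs human-written via the ai_generated field."""
    ai, human = [], []
    for p in Path(vault).rglob("*.md"):
        flag = re.search(r"^ai_generated:\s*(true|false)",
                         p.read_text(encoding="utf-8"), re.M)
        (ai if flag and flag.group(1) == "true" else human).append(p.name)
    return sorted(ai), sorted(human)
```

A review session might start by sampling a few names from the AI-generated list, since those are the pages most likely to need the inspection described above.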
10-Minute Quickstart
- Create your vault structure (RAW, Wiki subfolders, index, log)
- Write a basic schema — 1-2 pages covering file structure + 3 workflows
- Drop 2-3 documents into RAW
- Run ingest: “Follow the schema, analyze all files in RAW, create corresponding Wikis with bidirectional links”
- Review 3-5 generated pages — fix anything that doesn’t look right
- Update schema based on what you find
- (Optional) Install graphify if you want graph-powered code understanding
Start small. One topic area. Five raw documents. Let the system prove itself before scaling.
Resources
- Andrej Karpathy’s original ALLMwiki post — search “ALLMwiki Karpathy” on Twitter/X
- graphify on GitHub — search “graphify knowledge graph”
- QMD — local hybrid search for large knowledge bases (1,000+ pages)
- Obsidian — free knowledge management app with bidirectional links
Tutorial based on community walkthroughs of Karpathy’s ALLMwiki framework and the graphify open-source project. Build the AI brain you wish you had.