
Build Your Own AI Knowledge Brain: Karpathy's ALLMwiki + Graphify Tutorial

Cui · Apr 16, 2026 · 8 min read

Andrej Karpathy recently published a framework called ALLMwiki that went viral across the AI and developer communities. At its core, it’s not some complex new technology — it’s a set of concepts and behavioral guidelines for building a personal AI-powered knowledge base that gets smarter over time.

The beautiful part? There’s no deep technical threshold. Anyone can borrow this workflow for their own projects.

This tutorial covers:

  1. The ALLMwiki Framework — the 3 file types, 3 daily operations, and 3 query tools
  2. Building it in Obsidian — a practical walkthrough with real examples
  3. Graphify — an open-source project that supercharges the framework with a proper knowledge graph
  4. Three blind spots — things most people overlook when implementing this

Part 1: The ALLMwiki Framework

The framework boils down to a deceptively simple structure: 3 types of files, 3 daily operations, and 3 query tools.

The 3 Types of Files

1. Raw Resources — your raw materials warehouse. PDFs, articles, video transcripts, papers, anything you collect. These are untouched, original content.

2. Wiki Files — where the magic happens. AI reads your raw resources and extracts entities and concepts:

  • Entities: people, companies, projects, courses
  • Concepts: methodologies, technical terms, frameworks

For each entity or concept, AI creates a dedicated Wiki page and establishes cross-reference relationships between them. These wikis are primarily generated and maintained by AI — you don’t write them yourself.

3. Rule Documents (Schema) — the agreement between you and your AI. It tells the AI: the file structure (how RAW maps to Wiki), the processing workflow, and naming conventions. The schema evolves over time — you refine it with AI as your needs change.

The 3 Daily Operations

Ingest — Feed new materials to AI. Once a raw document arrives, AI reads it according to your schema, extracts entities and concepts, creates or updates Wiki pages, and updates the index and log.

Query — Ask questions based on your generated Wiki. Because wikis are atomized and relational, searching is far more efficient than hunting through original documents. If AI generates a particularly good answer, you can save it as a new wiki page.

Review (Link Maintenance) — Periodically have AI audit your knowledge base: find contradictions, identify outdated statements, surface isolated pages (no bidirectional links), and generate merge suggestions for redundant pages.
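Concretely, the Ingest operation can be sketched as a tiny driver function. Everything here is illustrative: `ask_ai` stands in for whatever LLM call you use, and the log format is an assumption, not part of the framework.

```python
from pathlib import Path

def ingest(vault: Path, raw_file: Path, ask_ai) -> None:
    """Ingest one raw document: have the AI extract wikis per the schema,
    then append a record of what happened to the log.

    `ask_ai` is any callable that sends a prompt to your LLM of choice and
    returns its text response -- an assumption, not a fixed API.
    """
    schema = (vault / "schema.md").read_text()
    prompt = (
        f"Follow this schema:\n{schema}\n\n"
        "Analyze the raw document below. Extract entities and concepts, "
        "emit wiki pages with bidirectional links, and propose an index "
        "entry and a one-line log record.\n\n"
        f"{raw_file.read_text()}"
    )
    response = ask_ai(prompt)
    # Appending the response keeps every operation auditable; a fuller
    # setup would also split the response into separate wiki files.
    with open(vault / "log.md", "a", encoding="utf-8") as log:
        log.write(f"\n## Ingested {raw_file.name}\n{response}\n")
```

The same shape works for Query and Review: load schema, build prompt, call the model, record the result in the log.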

The 3 Query Efficiency Tools

Index — AI automatically maintains a single index page listing all Wiki pages with one-sentence summaries. When you ask AI a question, it first browses the index, finds relevant wikis, then dives deeper.

Log — AI records every operation it performs — what raw file it read, what entities/concepts it created, what links it established. This lets you (and the AI) know exactly what happened and when.

RAG (for large knowledge bases) — Once you have 1,000+ wiki pages, index-based browsing becomes slow. Karpathy recommends QMD — a local tool supporting BM25 + vector hybrid search with CLI support, meaning AI can call it directly.
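To make "BM25 + vector hybrid search" concrete, here is a generic sketch of hybrid ranking. This is not QMD's implementation, just the standard recipe: score documents lexically with BM25, score them semantically with embedding similarity, and blend the two.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Plain BM25 over pre-tokenized docs (each doc is a list of tokens)."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def hybrid_rank(bm25, vector_sims, alpha=0.5):
    """Blend lexical and semantic scores; alpha weights the BM25 side."""
    fused = [alpha * a + (1 - alpha) * v for a, v in zip(bm25, vector_sims)]
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
```

The vector similarities would come from an embedding model; the point is that the fusion itself is a one-liner, which is why a local CLI tool can serve it fast enough for an AI to call directly.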


Part 2: Building It in Obsidian

Folder Structure

vault/
├── RAW/                    # Your raw materials (PDFs, transcripts, etc.)
├── Wiki/
│   ├── entities/          # People, companies, projects
│   ├── concepts/          # Technical terms, methodologies
│   ├── overviews/         # High-level summaries
│   ├── comparisons/       # A vs B analyses
│   └── summaries/         # Distilled raw material summaries
├── index.md               # Auto-maintained by AI
├── log.md                 # Operation history
└── schema.md              # Your rules document
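If you want to bootstrap this layout in one step, a small script can create it. The folder names mirror the tree above; the default `vault` path and the empty starter files are assumptions.

```python
from pathlib import Path

def scaffold_vault(root="vault"):
    """Create the ALLMwiki folder layout with empty index/log/schema files."""
    root = Path(root)
    for sub in ["RAW", "Wiki/entities", "Wiki/concepts",
                "Wiki/overviews", "Wiki/comparisons", "Wiki/summaries"]:
        (root / sub).mkdir(parents=True, exist_ok=True)
    for name in ["index.md", "log.md", "schema.md"]:
        (root / name).touch()  # empty placeholders; fill schema.md yourself
    return root
```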

Designing Your Schema

Your schema.md is the most important file. A minimal version:

## File Structure
- RAW folder: stores original materials, never modified
- Wiki subfolders: entities/, concepts/, overviews/, comparisons/, summaries/
- Each wiki requires frontmatter: type, source, created, related

## Frontmatter Convention
---
type: concept | entity | overview | comparison | summary
source: [raw file name]
ai_generated: true | false
created: YYYY-MM-DD
---

## Workflows
### Ingest
1. Read RAW file
2. Extract entities and concepts
3. Create wiki pages with bidirectional links
4. Update index.md and append to log.md

### Review
1. Scan all wikis for contradictions
2. Find isolated pages (no bidirectional links)
3. Flag outdated content
4. Suggest merges
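Note that the Review step's "find isolated pages" check doesn't even need an LLM. A sketch that scans Obsidian-style `[[wikilink]]` syntax (the regex and the flat naming convention are assumptions):

```python
import re
from pathlib import Path

LINK = re.compile(r"\[\[([^\]|#]+)")  # matches the target of [[Page]] links

def isolated_pages(wiki_dir):
    """Return wiki pages that neither link out nor receive any backlink."""
    pages = {p.stem: p.read_text() for p in Path(wiki_dir).rglob("*.md")}
    links_out = {name: set(LINK.findall(text)) for name, text in pages.items()}
    linked_to = {t.strip() for targets in links_out.values() for t in targets}
    return sorted(
        name for name in pages
        if not links_out[name] and name not in linked_to
    )
```

Running this periodically and handing the result to the AI ("integrate these orphans or propose merges") keeps the review loop cheap.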

What AI Actually Generates

Given two raw documents — an OpenCloud whitepaper (100+ pages) and a YouTube transcript — here’s what AI produced after running ingest:

Concepts created: AI agent, Skill system, Vector memory search — each with linked source documents and extracted code snippets.

Entities created: Founders, authors, projects (Lobster, OpenCloud), organizations — with quotes and project history.

Comparison wiki auto-generated:

Memory Mechanism Comparison: PAI vs. OpenCloud

  • OpenCloud: 4-layer memory mechanism
  • PAI: file system priority memory
  • Horizontal comparison with trade-offs

The relationship graph between pages becomes rich quickly.


Part 3: Graphify — The Knowledge Graph Evolution

While Obsidian + AI gives you a wiki-based knowledge base, graphify takes Karpathy’s framework and converts it into a proper knowledge graph.

What Graphify Does Differently

|             | Karpathy's ALLMwiki       | Graphify                          |
|-------------|---------------------------|-----------------------------------|
| Storage     | Flat markdown wikis       | NetworkX knowledge graph          |
| Indexing    | AI-maintained index file  | AST + semantic dual channels      |
| Querying    | AI browses index → wikis  | Graph traversal                   |
| Maintenance | Manual schema + AI prompts | Algorithmic community discovery  |

The three leaps graphify provides:

  1. Flat text → relational graph
  2. LLM indexing → AST + semantic dual channels (code parsing is deterministic, no tokens burned)
  3. Manual maintenance → algorithmic discovery
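The second leap is easy to demonstrate with Python's built-in `ast` module: structural facts about code come from a deterministic parser, not from prompting an LLM. A minimal sketch (graphify's actual extraction is richer than this):

```python
import ast

def extract_symbols(source: str):
    """Deterministically list functions, classes, and calls -- no LLM, no tokens."""
    tree = ast.parse(source)
    symbols = {"functions": [], "classes": [], "calls": []}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            symbols["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            symbols["classes"].append(node.name)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            symbols["calls"].append(node.func.id)
    return symbols
```

Each extracted symbol becomes a graph node; the semantic channel (embeddings) then handles the prose and docstrings the parser can't interpret.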

Core Commands

# Build knowledge graph from current directory
/graphify .

# Query the graph
/graphify query "which file contains the BM25 implementation?"

# Explain a component
/graphify explain "how is intelligent extraction implemented?"

# Add external paper to graph
/graphify add https://arxiv.org/abs/XXXX.XXXXX

# Export to Obsidian vault
/graphify export obsidian
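Under the hood, a query like "trace the path from paper node to code node" reduces to graph traversal. graphify stores its graph in NetworkX; a dependency-free sketch with made-up node names shows the idea:

```python
from collections import deque

# Hypothetical miniature knowledge graph: paper -> concept -> code.
# graphify keeps this in NetworkX; a plain adjacency dict shows the shape.
GRAPH = {
    "paper:recency-weighting": ["concept:memory-scoring"],
    "concept:memory-scoring": ["code:scoring.py"],
    "code:scoring.py": [],
}

def trace(graph, start, goal):
    """Breadth-first search; the returned path answers 'how does A reach B?'."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no connection in the graph
```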

Real-World Test Results

Testing graphify on a memory plugin codebase:

Query: “Which file is the BM25 code in?” → Found the exact files, with specific line numbers and function names.

Query: “Explain how intelligent extraction works” → Drew a complete flowchart, listed the 6 memory types, and showed the scoring formula.

After adding an academic paper → Linked 5 factors from the paper to their exact code implementations, tracing the graph path from paper node → code node.

After merging a PR → Updated the graph automatically and explained the PR’s impact (a 27-line change that fixed a data leak and a permanent-failure bug).


Part 4: Three Blind Spots to Avoid

1. AI-generated wikis ≠ the best way to learn

If you’re starting from scratch on a topic, read the raw materials first. A structured PDF tutorial is designed for learning — it builds concepts progressively. The AI wikis are better for review and synthesis after you’ve done the foundational work.

2. You must inspect what AI generates

Don’t hoard mindlessly. AI-generated wikis look polished but often have gaps:

  • Cross-reference links without context
  • Missing relationship descriptions in “related pages” sections
  • Vague or incomplete entity summaries

Fix the schema iteratively. If links lack context, update your schema: “Every cross-reference link must include a reason: what concept above makes this link relevant?”

Only wikis you’ve reviewed and improved will actually guide your decisions. Otherwise you’re hiring a robot to work out for you — the robot gets fit, not you.

3. You’re writing for AI, not just yourself

In ALLMwiki, much of the content is written for AI to read. The index and log are primarily for AI orientation. Frontmatter fields like type: concept | ai_generated: true help AI distinguish its own work from yours. Design your schema with this in mind — the cleaner your machine-readable structure, the better AI can maintain and grow the knowledge base over time.
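As a sketch of what "machine-readable" buys you, here is a minimal parser for the frontmatter convention above. The simple `key: value` format is assumed; a real setup might use a proper YAML library instead.

```python
import re

# Matches the frontmatter block delimited by --- at the top of a page.
FRONTMATTER = re.compile(r"^---\n(.*?)\n---", re.S)

def parse_frontmatter(markdown: str) -> dict:
    """Read the simple key: value frontmatter the schema prescribes."""
    match = FRONTMATTER.match(markdown)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields

def is_ai_generated(markdown: str) -> bool:
    """True if the page declares itself AI-written -- vs. your own notes."""
    return parse_frontmatter(markdown).get("ai_generated") == "true"
```

With fields this easy to read back, the AI (or a plain script) can filter, audit, and regenerate its own pages without touching yours.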


10-Minute Quickstart

  1. Create your vault structure (RAW, Wiki subfolders, index, log)
  2. Write a basic schema — 1-2 pages covering file structure + 3 workflows
  3. Drop 2-3 documents into RAW
  4. Run ingest: “Follow the schema, analyze all files in RAW, create corresponding Wikis with bidirectional links”
  5. Review 3-5 generated pages — fix anything that doesn’t look right
  6. Update schema based on what you find
  7. (Optional) Install graphify if you want graph-powered code understanding

Start small. One topic area. Five raw documents. Let the system prove itself before scaling.


Resources

  • Andrej Karpathy’s original ALLMwiki post — search “ALLMwiki Karpathy” on Twitter/X
  • graphify on GitHub — search “graphify knowledge graph”
  • QMD — local hybrid search for large knowledge bases (1,000+ pages)
  • Obsidian — free knowledge management app with bidirectional links

Tutorial based on community walkthroughs of Karpathy’s ALLMwiki framework and the graphify open-source project. Build the AI brain you wish you had.
