Your Second Brain Won't Scale Without QMD

Most tutorials on building a second brain in Obsidian never warn you about what happens once the vault gets too full: the brain stops finding things.

Claude Code plus an Obsidian LLM wiki looks perfect on day one. But as the vault fills with blog posts, articles, papers, journals, docs, video scripts, and audio transcripts, Claude Code’s index-based search starts to struggle. Give it six months and you’ll watch search accuracy steadily slide.

QMD, the tool this post is about, is the cleanest way to plug that gap. Once it’s installed, no matter how big your vault gets, Claude Code can still pinpoint the right content fast, return accurate results, and spend far fewer tokens on search.

Three things in this post:

What it is: what a second brain is, plus a five-minute walkthrough for beginners to set up the Obsidian + Claude Code combo.
Why: why the second brain weakens once documents pile up, and why QMD is currently the best tool to fix it.
How to do it: how to tell when it’s time to add QMD, and a step-by-step install.

1. The Second Brain: Obsidian + Claude Code

If you don’t have a second brain yet, here’s a five-minute build.

Step 1: download and install Claude Code. Codex, OpenClaw, Hermes, OpenCode, and Cursor all work too; I’ll call all of them Claude Code below.

Step 2: download and install Obsidian.

Step 3: open Obsidian and create a new vault.

Step 4: inside the vault folder, create a file called CLAUDE.md and paste in Andrej Karpathy’s LLM wiki rules from this Gist:

https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f

Step 5: open Claude Code in that folder and tell it:

Set up this knowledge base for me according to the rules.

It will build out the whole directory structure on its own:

raw/: raw material. Drop every article, conversation, and video transcript straight in here.
wiki/: compiled notes, organized by topic or concept.
index.md: the entry page that holds your knowledge map; the agent keeps it up to date.
log.md: a log of every change.

How does the second brain actually work?

It first writes a summary of each article you put into raw, then pulls out the concepts inside it and creates a separate page for each one.
Every person mentioned in the article gets a page. Every tool gets a page. Every method gets a page.
Then it updates index.md and registers all the new pages there. From that point on, you just paste the path to index.md into any AI, and it’s plugged into your knowledge base. Questions get answered from your vault, not from a random web search.
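The bookkeeping half of ingest (write a page, register it in index.md, record the change in log.md) is plain file I/O. Below is a minimal sketch, assuming index.md and log.md sit at the vault root and index entries are Obsidian [[wikilinks]] followed by a one-line summary. The helper name register_page and both entry formats are hypothetical, not part of Karpathy’s rules, and the summarizing and concept extraction the LLM does are not shown.

```python
from datetime import date
from pathlib import Path


def register_page(vault: Path, title: str, summary: str, body: str) -> Path:
    """Write a wiki page, list it in index.md, and record the change in log.md.

    Hypothetical helper: the file layout and entry formats are assumptions.
    """
    wiki_dir = vault / "wiki"
    wiki_dir.mkdir(parents=True, exist_ok=True)
    page = wiki_dir / f"{title}.md"
    page.write_text(f"# {title}\n\n{body}\n", encoding="utf-8")

    # One index line per page: an Obsidian wikilink plus a one-line summary.
    with (vault / "index.md").open("a", encoding="utf-8") as index:
        index.write(f"- [[{title}]]: {summary}\n")

    # Append-only change log, one dated line per edit.
    with (vault / "log.md").open("a", encoding="utf-8") as log:
        log.write(f"{date.today().isoformat()} added wiki/{title}.md\n")
    return page
```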

The killer feature: once you start dumping your own thoughts and daily reflections in, the AI starts to actually get you. Ask it to write something and it knows your voice. Ask it to weigh in on a decision and it knows your past. The longer you use it, the better it gets.

And because lookups are tiered — the AI reads the index first to find relevant pages, then reads the actual content — it doesn’t have to swallow every note at once. Big token savings.
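The two-hop lookup can be sketched in a few lines. This is an illustration under stated assumptions: index.md lines look like “- [[Page]]: summary”, pages live in wiki/, and relevance is plain word overlap between the question and each summary, standing in for the judgment the agent actually makes. Only pages that survive the first hop are read in full, which is where the token savings come from.

```python
import re
from pathlib import Path

# Matches index lines of the assumed form "- [[Page]]: one-line summary".
INDEX_LINE = re.compile(r"^- \[\[(?P<title>[^\]]+)\]\]: (?P<summary>.*)$")


def words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def tiered_lookup(vault: Path, question: str, max_pages: int = 3) -> dict[str, str]:
    """Hop 1: scan index.md summaries. Hop 2: read only the best-matching pages."""
    q = words(question)
    scored = []
    for line in (vault / "index.md").read_text(encoding="utf-8").splitlines():
        m = INDEX_LINE.match(line)
        if m:
            overlap = len(q & words(m["summary"]))
            if overlap:
                scored.append((overlap, m["title"]))
    scored.sort(reverse=True)  # most overlapping summaries first
    return {
        title: (vault / "wiki" / f"{title}.md").read_text(encoding="utf-8")
        for _, title in scored[:max_pages]
    }
```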

After that, anything you come across — good articles, your own journals, half-baked ideas, voice memos, whatever — just drop it in raw. Say “ingest this for me,” and the wiki grows by another 5 to 10 pages.

2. The wiki + index Setup Has Its Own Ceiling

But this setup has a ceiling. Three specific failure modes.

Failure 1: as the index gets longer, pinpointing accuracy drops.

Once the wiki system is running smoothly, wiki pages grow steadily. Under Karpathy’s schema, each raw file you ingest usually produces or updates 5 to 10 wiki pages. Run a vault for six months and hitting the hundreds is normal.

At that point index.md itself runs to hundreds of lines. It’s the first thing the agent reads on every query, so every query loads those hundreds of lines into context. The token savings disappear, and pinpointing accuracy drops: the more granular your categories, the more often the agent gets stuck between seven or eight near-identical concept pages. What used to be a clean two-hop “read index → read wiki” turns into “read index → flip through 3-5 wiki pages, none of them right → go back to the index.”

Failure 2: the index can’t help with cross-category semantic search.

Karpathy’s schema splits the wiki into Concept, Entity, Synthesis, Self-analysis, Comparison, and other types, each filed by topic.

But your real queries usually cross categories. Say you want to dig up “how my mindset on indie hacking shifted” — that question might touch a self-analysis page (your reflection from two years ago), a synthesis page (your overall take on indie hacking), and some fragment buried in raw from a conversation with a friend. The index sorts by “what type of page this is,” not by “what concept shows up where.” Cross-category search is exactly where it falls down.

Claude Code’s built-in grep doesn’t help either, because the literal phrase “mindset shift” might never appear once.
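A toy illustration of why literal matching misses this: the raw note below is about exactly that shift in mindset, yet a substring search for the phrase returns nothing. The note texts and paths are made up for the example, and Python’s `in` operator stands in for grep.

```python
notes = {
    "raw/podcast-notes.md": "Indie hacking ends in boredom once the novelty wears off.",
    "wiki/synthesis-indie-hacking.md": "My overall take on building small profitable products.",
}


def literal_search(notes: dict[str, str], phrase: str) -> list[str]:
    """Return paths whose text contains the phrase verbatim (case-insensitive), like grep -i."""
    return [path for path, text in notes.items() if phrase.lower() in text.lower()]


print(literal_search(notes, "mindset shift"))  # → []
print(literal_search(notes, "boredom"))        # → ['raw/podcast-notes.md']
```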

Failure 3: the details inside raw vanish.

This is the sneakiest one, and the most damaging.

Under Karpathy’s SOP, ingesting raw produces synthesis pages in the wiki — concepts, entities, comparisons, self-analyses. That’s a deliberate trade-off: wiki pages only keep the topical skeleton, not the specific details, direct quotes, examples, and numbers from the raw source.

A concrete example. You stuffed a 30-part AI course note series into raw, and ingest squeezed it into 2 wiki pages (a course map and a methods list). Those two pages tell the agent “here’s the overall framework and the core methods of this course,” but the debugging trick from lesson 5, the specific example from lesson 8, the verbatim line from lesson 14 — all of that lives in raw and never made it into the wiki.

Raw might hold hundreds of files, and any one of them could hide some specific sentence you’ll want back later. But ingest flattens them all into wiki summaries, so the original text never enters the agent’s retrieval path. Unless you remember the filename and open raw directly, those details might as well not exist.

The root cause of all three problems isn’t that the wiki model is wrong. The wiki model is great at compiled knowledge, but the vault as a whole still needs a full-text semantic search layer to cover three gaps: details that never got compiled into the wiki, queries that cut across categories, and an index that has bloated past the point of usefulness.

And the fix can’t be “let the agent read all of raw” — that would blow tokens through the roof. What you need is a tool that pinpoints precisely, returns only the relevant snippets, and saves tokens on the side.
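Returning only the relevant snippets starts with splitting raw files into small, addressable chunks, so a search layer can hand back the lesson-14 paragraph instead of the whole course note. A minimal sketch, assuming blank lines are good split points and a character budget is an acceptable stand-in for a token budget; QMD’s own chunking may work differently, and the Chunk type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # path of the raw file the text came from
    index: int   # position of the chunk within that file
    text: str


def chunk_markdown(source: str, text: str, max_chars: int = 800) -> list[Chunk]:
    """Greedily pack blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars becomes its own oversized chunk.
    """
    chunks: list[Chunk] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(Chunk(source, len(chunks), current))
            current = para
    if current:
        chunks.append(Chunk(source, len(chunks), current))
    return chunks
```

Each chunk keeps its source path, so whatever retrieves it can still point you back to the original raw file.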

In 2026, the best tool for this is QMD.

3. RAG Fills the Gap the Wiki Model Leaves

What RAG Is

RAG stands for Retrieval-Augmented Generation.

The whole idea, in one sentence: before you ask the AI a question, first pull the few most relevant passages from your knowledge base, then let the AI answer using those passages.

The LLM isn’t answering out of thin air. It’s answering based on “those passages we just pulled.” The facts, the quotes, the details in its answer all come from your vault.
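In code, the retrieve-then-answer loop is short. A minimal sketch: `retrieve` ranks passages by word overlap with the question (a placeholder for real retrieval such as BM25 or vector search), and `build_prompt` places the winners above the question so the model answers from them. The function names and prompt wording are illustrative, and the actual LLM call is left out.

```python
import re


def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def retrieve(passages: list[str], question: str, k: int = 3) -> list[str]:
    """Return the k passages sharing the most words with the question (placeholder scorer)."""
    q = tokens(question)
    ranked = sorted(passages, key=lambda p: len(q & tokens(p)), reverse=True)
    return ranked[:k]


def build_prompt(passages: list[str], question: str, k: int = 3) -> str:
    """Put the retrieved passages in front of the question as numbered context."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(retrieve(passages, question, k), 1))
    return (
        "Answer using only the passages below, citing them by number.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```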

One caveat: RAG is not a replacement for wiki + index. They solve different problems. Wiki + index handles “compile the things you already know are important into structured knowledge.” RAG handles “find the most relevant scattered snippets across the whole vault for the current query.” One compiles, one searches. They complement each other.

Going Further: Hybrid Retrieval

Vector search finds content that “means the same thing but doesn’t share the words.” Search for “indie hacking mindset shift” and it can pull up a podcast note from your raw two years ago that says “indie hacking ends in boredom” — not one word in common, but the meanings converged.
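Under the hood, vector search compares embeddings, usually by cosine similarity. The sketch below uses hand-made three-dimensional vectors as stand-ins for what an embedding model would produce (real models output hundreds or thousands of dimensions); the numbers are invented purely to show how ranking by meaning works even with zero shared words.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Hypothetical embeddings; the axes loosely mean (indie hacking, mood, cooking).
query = [0.9, 0.6, 0.0]  # "indie hacking mindset shift"
docs = {
    "indie hacking ends in boredom": [0.8, 0.7, 0.1],
    "my sourdough starter schedule": [0.0, 0.1, 0.9],
}

ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])  # → indie hacking ends in boredom
```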

So is vector search enough on its own?

No. Vector search has its own blind spot: it can’t do exact lookups.

Search for “qmd v2.1.0” and vector search might hand back content for v2.0 or v1.9, because they’re semantically too close. But what you wanted was that exact version number.

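This is where BM25 crushes it. BM25 is a classic keyword-ranking algorithm: it scores documents by how often the query’s exact terms appear, weighted by how rare each term is across the vault.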
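For reference, here is a compact BM25 scorer using the textbook Okapi formula with Lucene’s default parameters, k1 = 1.2 and b = 0.75 (other implementations, QMD’s included, may use different parameters, tokenizers, or IDF variants). The three documents are invented. Because whitespace tokenization keeps “v2.1.0” as a single token, only the document containing that exact string earns credit for it.

```python
import math
from collections import Counter


def bm25_scores(docs: list[str], query: str, k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25 (lowercased whitespace tokens)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for t in tokenized for term in set(t))

    scores = []
    for terms in tokenized:
        tf = Counter(terms)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a high IDF; the +1 keeps IDF positive for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(terms) / avg_len))
            score += idf * norm
        scores.append(score)
    return scores


docs = [
    "qmd v2.0.0 release notes",
    "qmd v2.1.0 release notes",
    "qmd v1.9.0 changelog",
]
scores = bm25_scores(docs, "qmd v2.1.0")
print(docs[scores.index(max(scores))])  # → qmd v2.1.0 release notes
```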

That’s why the only retrieval setup worth trusting is hybrid retrieval: BM25 for exact matches, vector search for semantics, then a small reranker model that re-sorts both streams by relevance and picks the best few.
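A standard way to merge the BM25 list and the vector list before reranking is Reciprocal Rank Fusion (RRF), the fusion method named in QMD’s pipeline: each document earns 1 / (k + rank) from every list it appears in, with k = 60 as the conventional constant from the original RRF paper by Cormack et al. The document IDs below are made up, and the reranker pass that follows fusion is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs (best first) into one fused ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order because sorted() is stable.
    return sorted(scores, key=scores.get, reverse=True)


bm25_hits = ["release-v2.1.0", "changelog", "podcast-notes"]
vector_hits = ["podcast-notes", "release-v2.1.0", "journal-entry"]
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
# → ['release-v2.1.0', 'podcast-notes', 'changelog', 'journal-entry']
```

Documents that show up in both lists rise to the top, which is exactly what you want when one stream catches the exact version string and the other catches the meaning.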

QMD does all of this.

4. Why QMD

Right Architecture, Real Results

QMD’s retrieval pipeline follows the same playbook as Google Search and the hybrid recipe Anthropic published in its Contextual Retrieval write-up: keyword search plus embeddings, followed by reranking.

A query comes in. A fine-tuned small model does query expansion. Every variant runs BM25 and vector search in parallel. All the results merge via RRF, then go through a qwen3-reranker for final reranking, then get position-weighted to surface the most relevant few.
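The final position-weighted step can be sketched as follows: candidates near the top of the fused list lean more on their retrieval rank, while lower ones lean more on the reranker’s score. The rank bands and weights here are illustrative assumptions, not QMD’s actual values, and the reranker scores are assumed to be normalized to the 0 to 1 range.

```python
def retrieval_weight(position: int) -> float:
    """Share of the final score taken from retrieval, by 1-based fused position (assumed bands)."""
    if position <= 3:
        return 0.75
    if position <= 10:
        return 0.60
    return 0.40


def blend(fused: list[str], rerank_scores: dict[str, float]) -> list[str]:
    """Combine fused rank and reranker score (0..1) into a final ordering."""
    n = len(fused)
    final = {}
    for position, doc_id in enumerate(fused, start=1):
        # Map rank onto 0..1 so it is comparable with the reranker score: 1.0 for first place.
        rank_score = 1.0 - (position - 1) / n
        w = retrieval_weight(position)
        final[doc_id] = w * rank_score + (1 - w) * rerank_scores.get(doc_id, 0.0)
    return sorted(final, key=final.get, reverse=True)


fused = ["a", "b", "c", "d"]
print(blend(fused, {"a": 0.1, "b": 0.2, "c": 0.3, "d": 1.0}))  # → ['a', 'b', 'd', 'c']
```

The design choice: trust the top retrieval hits unless the reranker strongly disagrees, and let the reranker do more of the work further down the list, where “d” overtakes “c” here.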

This pipeline has been validated repeatedly in academic IR papers since 2020. QMD just packages it: what used to take tens of thousands of lines of code is now an install.

Fully Local, Free, Simple to Use

This is the fundamental dividing line between personal RAG and enterprise RAG.

Enterprise RAG burns money on two things: hosted vector databases (Pinecone, Weaviate) at hundreds of dollars a month, and embedding API calls (OpenAI text-embedding-3) billed by the token.

QMD takes both of those local.

Much of the credit goes to Alibaba. Its open-weight Qwen3 models are the foundation that lets Chinese-language users run commercial-grade retrieval locally. Without Qwen3, personal RAG for Chinese content would likely still be a year out.

QMD is a daily-use utility at heart. It should be free, like grep or ripgrep, and as much a piece of infrastructure as your OS or your notes app.

And It Saves You a Lot of Tokens

A hidden upside of QMD: it slashes the token cost of Claude Code’s calls into your second brain.

The mechanism is simple. The old way, the agent uses grep to find things, then reads every hit in full — one Q&A easily burns 20,000 tokens. With qmd installed, the agent queries directly and gets back 3 to 5 of the most relevant snippets, a few hundred tokens each, under 1,000 total.

A real number: Andrew Levine published the data from his 600+ note Obsidian vault — a single query used to burn 15,000 tokens; with qmd, the same query uses 500. 96% savings. He says it took him 5 minutes from download to working setup.
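The savings figures are easy to check. A tiny helper, plugging in the numbers quoted above: Levine’s 15,000 → 500 works out to just under 97%, in line with the 96% he reports, and the earlier 20,000-token grep pass against a qmd answer capped at 1,000 tokens gives at least 95%.

```python
def savings(before_tokens: int, after_tokens: int) -> float:
    """Fraction of tokens saved when a query drops from before_tokens to after_tokens."""
    return 1 - after_tokens / before_tokens


print(f"{savings(15_000, 500):.1%}")    # → 96.7%
print(f"{savings(20_000, 1_000):.1%}")  # → 95.0%
```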

If Claude’s token bill is hurting (especially if you pay per API call rather than through a subscription), this matters even more than the fact that QMD is free.

Endorsed by Both Tobi and Karpathy

QMD’s author is Tobias Lütke, founder and CEO of Shopify.

He went on a posting spree in March 2026 about tuning his query-expansion model before bed. The qmd repo on GitHub is now past 25,000 stars.

A CEO running a company worth tens of billions, staying up late to write a personal markdown search tool, tells you one simple thing: he got fed up with the same problem. The notes he keeps hit the same retrieval ceiling yours and mine do, and he decided to fix it himself.

In the “Optional: CLI tools” section of his llm-wiki Gist, Karpathy writes:

“A search engine over the wiki pages is the most obvious one — at small scale the index file is enough, but as the wiki grows you want proper search. qmd is a good option.”

Plain English: when the wiki’s small, the index file is enough; once the wiki grows, you need real search — and qmd is a good pick.

5. Installing QMD: One Prompt Does It

cd into your second-brain folder, fire up Claude Code, paste this in:

Install qmd for me, hook up the current directory for retrieval, my notes are in Chinese.

Claude Code then works through the setup on its own:

Install the qmd CLI itself: npm install -g @tobilu/qmd

Install the Claude Code plugin, which auto-registers the MCP server and skill: claude plugin marketplace add tobi/qmd && claude plugin install qmd@qmd

Register the current folder as a knowledge-base collection named brain: qmd collection add . --name brain

Switch the embedding model to the multilingual Qwen3-Embedding-0.6B (essential for a Chinese vault).

Build the vector index: qmd embed -f

The first run downloads about 2.4GB of local models. Wait a few minutes. After that, every embedding runs locally on your machine — no internet, no API spend.

Why switch to Qwen3: QMD’s default embedding model is optimized for English. For a Chinese vault, switching to Qwen3-Embedding-0.6B is a massive lift in retrieval quality.

Most English-language tutorials skip this step.

From there, Claude Code routes through qmd automatically when it queries your second brain. You do nothing. When you add notes, just run qmd update once in a while for an incremental reindex.

Closing

Should you install it right now? Karpathy gives the bar in the llm-wiki Gist:

“The index pattern works great at roughly 100 files and a few hundred wiki pages. But as the wiki grows, you want proper search — qmd is a good option.”

Hold your own second brain up against that:

Around a hundred files and a few hundred wiki pages: stick with index for now, revisit qmd later.
Search results are noticeably weaker, or wiki count is approaching a thousand: add qmd as soon as you can. Better search and a smaller bill at the same time.
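If you’d rather not eyeball it, a short script can count your notes and apply the rule of thumb above. A sketch, assuming the raw/ and wiki/ layout from section 1 and that every note is a .md file; treating 800 wiki pages as “approaching a thousand” is my assumption, not a hard limit. Search quality, the other signal, is something a file count can’t measure, so trust your own experience first.

```python
from pathlib import Path

# Assumed cutoff for "approaching a thousand" wiki pages; tune to taste.
QMD_WIKI_THRESHOLD = 800


def count_notes(folder: Path) -> int:
    """Count markdown files anywhere under folder (0 if the folder doesn't exist)."""
    return sum(1 for _ in folder.rglob("*.md")) if folder.is_dir() else 0


def recommend(vault: Path) -> str:
    raw_files = count_notes(vault / "raw")
    wiki_pages = count_notes(vault / "wiki")
    if wiki_pages >= QMD_WIKI_THRESHOLD:
        verdict = "add qmd as soon as you can"
    else:
        verdict = "stick with index.md for now; revisit qmd if search quality slips"
    return f"{raw_files} raw files, {wiki_pages} wiki pages: {verdict}"
```

Run it from the vault root with print(recommend(Path("."))).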

If you want your second brain to keep working for the long haul, QMD is the single tool most worth your time.
