Building bettermemory.
Why I rebuilt persistent memory for Claude Code from the ground up.
Most AI assistants now ship a memory feature. The pitch is appealing. You tell it something once, and it remembers next time. In practice, every implementation I have used makes the same mistake: it takes every stored fact and injects it into every prompt, regardless of whether the fact is relevant to what you actually asked.
The result is subtle but corrosive. Ask for a Python tutorial and the answer comes coloured by your home lab notes. Ask a basic shell question and stored preferences from months ago shape the response. Stale facts get dispensed with full confidence. The longer you use the tool, the less reliable any individual answer becomes.
I built bettermemory to fix this. It is a local MCP server (and a Claude Code plugin) that flips the contract. Instead of pushing memory at the model, the model pulls memory only when it actually needs it.
The failure mode
Auto-injection looks fine in demos. You set a preference, ask a related question, and get a contextual answer. After a few weeks of real use, two patterns emerge.
The first is bleed-through. Memory pulled in for an unrelated topic still influences tone, framing, and assumptions. This is not a bug in the model. The model is doing exactly what you would expect when you put irrelevant context into the prompt. The bug is the design choice that puts it there.
The second is silent staleness. A fact you stored in March is still in the prompt in November, even after the project moved on, the file paths changed, or your preferences shifted. The model has no way to know whether what it is being told is current. It treats every memory as fresh, and you only notice when an answer is wrong.
False negatives are recoverable. If the model forgets and asks once, you tell it again. False positives cascade. The model anchors on a stale fact, builds the rest of its reasoning on top, and you never see the moment the conversation went off course.
The opt-in contract
bettermemory exposes memory as a set of MCP tools that the model calls when it needs them. There is no automatic injection. At the start of a session, the model can call memory_scope_overview, which returns counts per scope without any bodies. If the total is zero, the model knows there is nothing to fetch. Otherwise, the model calls memory_search only when a request is ambiguous in a way stored context could resolve.
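To make the shape of that exchange concrete, here is a minimal sketch. The signatures and response fields are illustrative stand-ins, not bettermemory's exact wire format:

# Sketch of the opt-in contract. Signatures and fields are illustrative.
def memory_scope_overview() -> dict:
    # Stand-in for the MCP tool: counts per scope, never bodies.
    return {"total": 12, "scopes": {"tools": 4, "learning-style": 2, "infra": 6}}

def memory_search(query: str) -> list[dict]:
    # Stand-in for the MCP tool: bodies come back only on demand.
    return [{"id": "01HXYZ123ABC", "scopes": ["learning-style"], "body": "..."}]

overview = memory_scope_overview()
request_is_ambiguous = True  # e.g. "walk me through pandas"

if overview["total"] > 0 and request_is_ambiguous:
    hits = memory_search("tutorial preferences")
# Otherwise: answer from the model's own knowledge, no memory in the prompt.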
The separation matters in practice. A request like “walk me through pandas” hits the search tool, because that phrasing is the kind of thing a learning-style preference might shape. A request like “how do I write a Python list comprehension” does not. The reply comes back as clean generic prose, untainted by months of accumulated personal context.
When the model does retrieve memory and use it, the policy requires it to say so out loud. “Using your stored preference for code-driven tutorials…” is the audit trail. You see what shaped the answer.
Three staleness signals
Memory is a snapshot. It does not refresh itself. To stop the model from confidently dispensing stale facts, every retrieval carries three structured signals.
The first is a calendar block. Each memory has a last_verified_at timestamp and a status of never, stale, or fresh against a configurable threshold. When the status is not fresh, the model is expected to spot-check at least one verifiable claim before relying on the body.
The second is path drift. The retrieval reports how many file paths cited in the body still exist on disk. If a memory says “see src/foo.py for the routing logic” and that file is gone, the count of missing paths shows up immediately, without needing a follow-up call.
The third is commit drift. When the caller is in a git repository whose memories live in this store, the retrieval reports how many commits have landed since the last verification. A memory verified three months and two hundred commits ago is technically still fresh on the calendar but should be spot-checked anyway. The commit count surfaces this.
After a successful spot-check, the model calls memory_verify to refresh the timestamp and log what it confirmed. If a claim has drifted, the model fixes it via memory_update first, then verifies the corrected version.
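Concretely, the three signals reduce to a timestamp comparison, a filesystem check, and one git call. A minimal sketch, where the field names, the 30-day threshold, and the path regex are assumptions for illustration, not bettermemory's internals:

# Sketch only: field names, threshold, and git plumbing are assumptions.
import re
import subprocess
from datetime import datetime, timedelta, timezone
from pathlib import Path

FRESH_THRESHOLD = timedelta(days=30)  # configurable in the real tool

def calendar_status(last_verified_at: str | None) -> str:
    # Signal 1: never / stale / fresh against the threshold.
    if last_verified_at is None:
        return "never"
    age = datetime.now(timezone.utc) - datetime.fromisoformat(last_verified_at)
    return "fresh" if age < FRESH_THRESHOLD else "stale"

def missing_paths(body: str, repo_root: Path) -> int:
    # Signal 2: cited file paths that no longer exist on disk.
    cited = re.findall(r"\b[\w/.-]+\.\w+\b", body)  # crude path matcher
    return sum(1 for p in cited if not (repo_root / p).exists())

def commits_since(anchor: str, repo_root: Path) -> int:
    # Signal 3: commits landed since the last verification anchor.
    out = subprocess.run(
        ["git", "rev-list", "--count", f"{anchor}..HEAD"],
        cwd=repo_root, capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())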
Plain files on disk
There is no database. Every memory is a markdown file with YAML frontmatter, stored in ~/.claude-memory/ (or ./.claude-memory/ for project-scoped, or wherever BETTERMEMORY_DIR points).
---
schema_version: 1
id: 01HXYZ123ABC
created: 2025-03-14T10:23:00+00:00
updated: 2025-03-14T10:23:00+00:00
scopes: [tools, learning-style]
confidence: high
source: explicit-statement
---
When I ask for a "zero to hero" tutorial, I want a hands-on
walkthrough with code I can run, not a tour of the IDE
or interface chrome.
You can grep it. You can git log it. You can hand-edit it. To back up your memories, copy a directory. To migrate to something else, read the format. Your memory is yours, in a shape you can read.
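Because each file is just frontmatter plus markdown, a loader fits in a few lines. A minimal sketch, assuming PyYAML and an illustrative filename:

from pathlib import Path
import yaml  # PyYAML; an assumption, not necessarily what bettermemory uses

def read_memory(path: Path) -> tuple[dict, str]:
    # Split the YAML frontmatter from the markdown body.
    _, frontmatter, body = path.read_text().split("---", 2)
    return yaml.safe_load(frontmatter), body.strip()

meta, body = read_memory(Path.home() / ".claude-memory" / "01HXYZ123ABC.md")
print(meta["scopes"], meta["confidence"])  # ['tools', 'learning-style'] high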
This was a deliberate trade-off. A database would make memory_search faster at scale, but it would also make the data opaque and tie your memory to whatever runtime ships the database. Plain files keep the contract simple.
Tombstones, not deletes
When you remove a memory, the file moves to a .tombstones/ subdirectory and gains removed and removed_reason fields in its frontmatter. The body stays. This costs almost nothing on disk and pays off in three ways.
First, memory_restore can undo a hasty removal. Second, memory_write runs a tombstone-aware dedup pass. If a paraphrase of a removed memory shows up six months later, the store flags it before re-introducing the wrong fact. Third, you keep an audit trail of what you decided to forget and why, which is occasionally useful when you find yourself half-remembering something the model used to know.
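The mechanics are small enough to sketch. The field names come from the description above; their placement in the frontmatter is an assumption:

from datetime import datetime, timezone
from pathlib import Path

def tombstone(path: Path, reason: str) -> Path:
    # Append removal metadata to the frontmatter, then relocate the file.
    # Sketch only: the real tool's layout may differ.
    head, sep, body = path.read_text().partition("\n---\n")
    head += f"\nremoved: {datetime.now(timezone.utc).isoformat()}"
    head += f"\nremoved_reason: {reason}"
    dest = path.parent / ".tombstones" / path.name
    dest.parent.mkdir(exist_ok=True)
    dest.write_text(head + sep + body)
    path.unlink()
    return dest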
A surface for curation
Memory grows. Without a way of noticing what has decayed, every memory store eventually turns into a haunted closet of half-true notes.
memory_health is the curation surface. It reports:

- dead weight: memories retrieved often but never marked applied
- heavily-used memories
- unresolved contradictions
- scope distribution
- scope typos: singletons within Levenshtein distance two of an existing scope
- verification debt: counts of never-verified, stale, and fresh memories
- commit-drift debt: rows whose verification anchor sits behind HEAD in a tracked repo
The same view is available as a CLI: bettermemory health. The system tells you what to prune, instead of you sifting through the corpus by hand.
This works because every retrieval is a tracked event. After a response uses a retrieved memory, the model logs memory_record_use(ids, outcome) with one of applied, ignored, contradicted, or corrected. Over time, that log feeds the health view. A memory that is retrieved every week but never marked applied is dead weight, and the system says so.
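Dead-weight detection then reduces to counting outcomes per memory. A sketch over an assumed event layout:

from collections import Counter

# Assumed log layout: one event per retrieval, recorded by memory_record_use.
events = [
    {"id": "01HXYZ123ABC", "outcome": "applied"},
    {"id": "01HABC999DEF", "outcome": "ignored"},
    {"id": "01HABC999DEF", "outcome": "ignored"},
]

retrieved = Counter(e["id"] for e in events)
applied = Counter(e["id"] for e in events if e["outcome"] == "applied")

# Retrieved repeatedly, never applied: flagged by memory_health as dead weight.
dead_weight = [m for m, n in retrieved.items() if n > 1 and applied[m] == 0]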
The user-inference confirmation tier
Most memories are project facts, infrastructure notes, or tooling references. Those commit immediately when the model calls memory_write. The user gets a one-line announcement, and the MCP permission gate is the primary veto point.
Inferences about the user are different. A claim like “the user prefers terse code review feedback” is a claim about you. If the model gets it wrong and saves it silently, the misattribution sticks. So memory_write(category="user-inference") always returns status: "pending" and waits for explicit confirmation. The model is expected to ask in plain language and only then call memory_write_confirm.
This is structurally enforced. Even if the global config says skip confirmation prompts, the user-inference tier still goes pending. The user always has the veto on claims about themselves.
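The flow, roughly, with the pending return shape assumed for illustration (memory_write and memory_write_confirm are the real tool names):

# Sketch of the two-step confirmation. The return shape is an assumption.
def memory_write(body: str, category: str, scopes: list[str]) -> dict:
    # Stand-in: user-inference writes never commit directly.
    if category == "user-inference":
        return {"status": "pending", "pending_id": "pending-001"}
    return {"status": "committed"}

result = memory_write(
    "Prefers terse code review feedback.",
    category="user-inference",
    scopes=["communication-style"],  # illustrative scope name
)
assert result["status"] == "pending"
# The model now asks the user in plain language, and only on a yes does it
# call memory_write_confirm(result["pending_id"]).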
Project scoping
Every memory written from inside a git checkout records the repo URL in its origin block. By default, memory_search filters hits to memories from the caller’s current repo, plus memories with no recorded origin (legacy entries or writes from outside any repo, which behave as global).
This is the right default for a working developer. When I am in the bettermemory repo, I see memories about bettermemory. When I switch to the portfolio, those bettermemory memories disappear. Cross-project queries are still possible, just explicit. Pass auto_scope=false to search across the entire store.
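The default filter is easy to picture. A sketch, with the origin field layout assumed:

def in_scope(memory: dict, current_repo: str | None, auto_scope: bool = True) -> bool:
    # auto_scope=False is the explicit cross-project escape hatch.
    if not auto_scope:
        return True
    origin = memory.get("origin", {}).get("repo")
    # No recorded origin behaves as global; otherwise require a repo match.
    return origin is None or origin == current_repo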
What it looks like in practice
Day one. You tell Claude something durable about how you like to work.
When I ask for a tutorial, I want runnable code, not screenshots of an IDE.
Claude calls memory_write with category="user-inference" and scopes=["learning-style"]. The write goes pending. Claude asks: “Want me to remember that you prefer hands-on tutorials with runnable code?” You confirm. The fact lands as a file you can read and edit.
Week two, in a fresh session, you ask:
Walk me through pandas from zero to hero.
The phrase “zero to hero tutorial” is the kind of ambiguity stored preferences could resolve. Claude calls memory_search, gets the stored preference back, and tells you up front: “Using your stored preference for code-driven tutorials…” before answering. Compare with auto-injection memory, which would have done the same thing silently, even on “what is the capital of France?”
Month three. You ask about an unrelated tool:
What is the difference between find and fd?
Generic. Claude does not call memory_search. The reply is pristine, generic shell advice, with no stored context bleeding in. That is the whole design.
Install
bettermemory ships as a Claude Code plugin. Two commands in any Claude Code session:
/plugin marketplace add 0Mattias/bettermemory
/plugin install bettermemory@bettermemory
For other clients (Claude Desktop, Cursor, Continue, Cline) or manual setup, install the package and run bettermemory init:
uv tool install bettermemory
bettermemory init --client claude-desktop
That idempotently merges the MCP server entry into the right config file. Restart the client and ask: “What memory tools do you have?” If the answer lists 17 tools starting with memory_, you are set.
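For reference, the merged entry follows the standard mcpServers shape used by MCP clients; the command and args below are placeholders, since init writes the actual invocation for your install:

{
  "mcpServers": {
    "bettermemory": {
      "command": "<command>",
      "args": ["<args>"]
    }
  }
}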
Source on GitHub. MIT, Python 3.11 to 3.14.