Memory by Markdown: how MCP turns your notes into a brain for your AI

For most of 2025 every notes-app demo on Twitter ended the same way: someone in a hoodie types "remember this" into ChatGPT, the model nods, the demo ends. Two weeks later they are surprised that nothing was remembered. The model doesn't have memory; it has a context window, and a context window is just RAM with delusions of grandeur.

The interesting question, the one that actually changes what you ship, is not "how do we give LLMs memory?" It is "where do you put the bytes that an LLM should treat as memory?" And the answer we kept coming back to, after building two and a half half-broken versions of it ourselves, is the most boring possible one. You put them in a notes app.

This post is about why that's the right call, why we built Vist with a built-in MCP server that mostly exposes plain CRUD verbs, and the three patterns we settled on once real users got their hands on it. It is also about the time we got the tool descriptions completely wrong and how the model itself, on different days, told us so.

What "memory" is shorthand for

"Agent memory" is one of those terms that means four different things to four different people in the same meeting. So before talking about how to build it, here's the taxonomy we now use internally — which I'll claim, by the end of the post, is the only one that survives contact with a real product.

Context — what the model can see in its current prompt window. Ephemeral by definition. Disappears the moment the tab closes.
Memory — what the model can recall, across sessions, given the right cue. Persistent. Often misnamed.
State — what's actually true right now, regardless of whether the model has been told. Your task list. Last Tuesday's decisions. The world.

Three things, three responsibilities. The pattern that almost every "memory for AI" startup ships conflates them, usually by putting everything into a vector database and hoping cosine similarity sorts it out. That's fine for one of the three (memory). It's wrong, often confidently wrong, for the other two.
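To make the split concrete, here is a minimal sketch of the three concerns as three different kinds of read. Everything in it (the Workspace class, its fields, the substring match standing in for real search) is hypothetical and only illustrates the taxonomy; it is not how Vist is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    # State: authoritative records, answered by exact lookup.
    tasks: dict = field(default_factory=dict)      # task_id -> {"title", "done"}
    # Memory: recallable entries, answered by search given a cue.
    memories: list = field(default_factory=list)   # [{"content": ...}]

    def open_tasks(self) -> list:
        """State query: exact, never a similarity guess."""
        return [t for t in self.tasks.values() if not t["done"]]

    def recall(self, cue: str) -> list:
        """Memory query: whatever matches the cue (substring match as a stand-in)."""
        needle = cue.lower()
        return [m for m in self.memories if needle in m["content"].lower()]


def build_context(ws: Workspace, cue: str) -> str:
    """Context: prompt text rebuilt every session and thrown away afterwards."""
    lines = [f"- memory: {m['content']}" for m in ws.recall(cue)]
    lines += [f"- open task: {t['title']}" for t in ws.open_tasks()]
    return "\n".join(lines)
```

The detail that matters is that open_tasks never goes through search: whether yesterday's standup happened is a lookup, not a nearest neighbour.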

This is the part where someone will email me about RAG. RAG is great. RAG is retrieval, which is one job of three. Treating retrieval as the whole story is how you end up with a chatbot that knows your CEO's birthday and not whether yesterday's standup happened.

Why a notes app is the right substrate

Once you separate the three concerns, a useful question falls out: what data store has the right shape for state and memory together? Not retrieval — retrieval is a feature you bolt on top. The substrate question is structural.

Our answer turned out to be: a notes app the user already opens every day. Two reasons, linked.

Addresses are the schema

A folder named forma means something to you, to me, and — given a half-decent prompt — to the model. A note titled 2026-05-12 · meeting with Ankush is self-describing in a way no documents table ever was. The address (folder + title) is the schema. The metadata you'd otherwise have to invent is already there, embedded in how the human organised their workspace.

Vist stores notes as Tiptap JSON in a PostgreSQL JSONB column, not as Markdown files on a filesystem you can grep. That's a deliberate trade-off: we keep the round-trippable Markdown export, but the canonical store is structured. Tasks are stored relationally, extracted from the note content; backlinks are resolved through a dedicated reference graph; full-text search runs natively in Postgres. The point isn't that the bytes are flat. It's that the addresses (work/decision_log, forma/meeting-ankush) are stable and human-meaningful, and the model can ask for them by name.
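As a rough illustration of "the address is the schema", the sketch below resolves a folder/title path to a note ID. The Address type, the parse_address and resolve helpers, and the IDs are all assumptions made up for the example; the actual Vist tools take a folder and a note_id, as listed further down.

```python
from typing import Dict, NamedTuple, Optional


class Address(NamedTuple):
    folder: str
    title: str


def parse_address(path: str) -> Address:
    """Split 'folder/title' on the first slash; both halves must be non-empty."""
    folder, sep, title = path.partition("/")
    if not sep or not folder or not title:
        raise ValueError(f"expected 'folder/title', got {path!r}")
    return Address(folder, title)


def resolve(index: Dict[Address, int], path: str) -> Optional[int]:
    """Return the note ID stored at an address, or None if nothing lives there."""
    return index.get(parse_address(path))


# Hypothetical index; the IDs are invented for the example.
index = {
    Address("work", "decision_log"): 101,
    Address("forma", "meeting-ankush"): 102,
}
```

No metadata table is involved: the key a human would type is the key the lookup uses.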

That's the part most "agent memory" products miss. They treat the substrate as an opaque vector store and then try to bolt the addressing back on with metadata fields. We did the opposite. The addressing came first. Embeddings (pgvector, generated via Mistral) sit on top, used when keyword search misses — not as the primary index.
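The "keyword first, embeddings second" ordering can be sketched in a few lines. This is an assumption-laden outline, not Vist's code: both search functions are passed in as plain callables, and "misses" is taken to mean "returned nothing", which is the simplest reading of the paragraph above.

```python
from typing import Callable, List

# Any function that maps a query string to a list of note titles.
SearchFn = Callable[[str], List[str]]


def search_with_fallback(query: str, keyword: SearchFn, semantic: SearchFn) -> List[str]:
    """Try keyword search first; only consult the semantic index on a miss."""
    hits = keyword(query)
    if hits:
        return hits
    return semantic(query)
```

A real implementation might fall back on a low hit count or a weak rank score rather than an empty result; that threshold is a product decision the post does not specify.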

MCP closes the loop

The piece that was missing in 2024 was a standard way for the model to write back. You could index a folder for retrieval; what you couldn't do, without one-off shims, was let Claude create a note, edit a task, or append to a decision log. MCP changed that. Specifically: MCP made it cheap enough to expose write capability that the question stopped being "should we let the model write?" and started being "what should we call the tools so it does the right thing?"

Vist ships with an MCP server built in. The available tools, in their shortest form:

# what Claude, Cursor, or any MCP client can call

# Notes & tasks (CRUD)
tool create_note(folder: string, title: string, content: md|json)
tool update_note(note_id: int, content?: md|json, title?: string)
tool list_notes(folder?: string, limit?: int)
tool get_note(note_id: int)
tool create_task(title: string, due?: date, priority?: string)
tool list_tasks(state: "today" | "upcoming" | "overdue" | "all")
tool complete_task(task_id: int)

# Search
tool search_knowledge_base(query: string, mode?: "fts" | "semantic")

# Agent memory layer
tool record_memory(memory_type: string, title: string, content: md, expires_at?: datetime)
tool query_memory(query: string, memory_types?: array)
tool load_context()
tool update_project_state(project_name: string, sections: hash)
tool sync_agent_memory(agent_id: string, since?: datetime)

# UI surfaces (interactive MCP Apps)
tool show_task_list(filter?: hash)

The shape is deliberately boring. We are not exposing a "memory" tool that takes an opaque blob. We are exposing the operations you'd run yourself if you were keeping notes by hand. The model treats them as such.
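On the wire, a call to one of these tools is an ordinary JSON-RPC 2.0 request using MCP's tools/call method. The sketch below builds one for create_note; the tool_call_request helper is ours for illustration, and the argument values are placeholders rather than real data.

```python
import json
from typing import Any, Dict


def tool_call_request(request_id: int, name: str, arguments: Dict[str, Any]) -> str:
    """Serialise an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Placeholder arguments matching the create_note signature listed above.
request = tool_call_request(
    1,
    "create_note",
    {"folder": "forma", "title": "meeting with Ankush", "content": "Notes go here."},
)
```

Nothing in the envelope is specific to memory. It is the same request shape whether the model is creating a note or completing a task, which is the point of keeping the verbs plain.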

The shape of your MCP toolset is the shape of the model's mental model of your product. Make it boring on purpose.

The three patterns we settled on

Out of about a dozen patterns we tried in the first two months of having an MCP server live, three did the heavy lifting once real users got their hands on it. None of them were our idea originally. All three of them are what users reached for the moment they had the tools.

1 · The decision log. Stored via record_memory with memory_type: "decision_log". One per project, appended to by both the user and the model. Every entry is dated. Every entry says what was decided, who decided, and the alternatives considered. The model can call load_context at session start and stop asking you about choices you've already made. This is the pattern that turned "AI assistant" into "colleague who actually remembers last week."

2 · The agent README. Stored as a regular note in a _agent_memory folder (Vist auto-creates one per account), plus a system prompt the user pastes into their MCP client of choice from Settings → MCP. The note contains the prose the model would otherwise be guessing at: your voice, your stylistic preferences, the fact that you hate bullet-pointed summaries unless asked. The system prompt tells the model to call load_context() before doing anything else. Together they collapse the "give me your context" preamble that everyone was typing every morning.

3 · Project state. Stored via update_project_state(project_name, sections) — typed sections like current_task, recent_changes, next_steps, blockers. The shape is enforced by the tool, not the model, so it doesn't drift across sessions or agents. You can read it like a note (it is a note); you can edit it like a note. The model treats it as the single source of truth for "what are we doing on Forma this week."
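"Enforced by the tool, not the model" could look something like the check below. It is a sketch under two assumptions: that the four section names mentioned above are the complete allowed set (the post only says "like"), and that each section holds plain text. The validate_sections name is hypothetical.

```python
# Assumed to be the full set; the post lists these four as examples.
ALLOWED_SECTIONS = ("current_task", "recent_changes", "next_steps", "blockers")


def validate_sections(sections: dict) -> dict:
    """Reject unknown or non-text sections and return them in canonical order."""
    unknown = sorted(set(sections) - set(ALLOWED_SECTIONS))
    if unknown:
        raise ValueError(f"unknown sections: {', '.join(unknown)}")
    for name, value in sections.items():
        if not isinstance(value, str):
            raise TypeError(f"section {name!r} must be text")
    # Canonical ordering keeps the note's layout stable across agents.
    return {name: sections[name] for name in ALLOWED_SECTIONS if name in sections}
```

Because the server rejects anything outside the known sections, two different agents writing to the same project cannot gradually invent two different layouts.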

Each of these is a notes-app native object. None of them requires a vector database to exist — the embeddings just help the search step. All three of them, in aggregate, do more for "memory" than every embedding pipeline we tried before.
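For the decision log in particular, the entry format from the first pattern (dated, what was decided, who decided, alternatives considered) maps onto record_memory's arguments roughly as follows. The Decision class, the Markdown layout, and the title convention are assumptions for illustration; only the memory_type, title, and content parameter names come from the tool list above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class Decision:
    decided_on: date
    decision: str
    decided_by: str
    alternatives: Tuple[str, ...] = ()


def to_markdown(d: Decision) -> str:
    """Render one dated entry; the layout is illustrative."""
    alts = ", ".join(d.alternatives) or "none recorded"
    return (
        f"### {d.decided_on.isoformat()}\n"
        f"- Decided: {d.decision}\n"
        f"- By: {d.decided_by}\n"
        f"- Alternatives considered: {alts}\n"
    )


def record_memory_args(project: str, d: Decision) -> dict:
    """Build the arguments for a record_memory call for one decision."""
    return {
        "memory_type": "decision_log",
        "title": f"{project} decision log",
        "content": to_markdown(d),
    }
```

Whether the server appends to an existing log or creates a new record per call is not something this sketch decides; it only shows that the entry is plain, readable Markdown either way.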

What we got wrong

For most of February the MCP server shipped with verbose, human-readable tool descriptions. We'd written them like API documentation — "This tool creates a new note in the user's workspace. It accepts a folder parameter (string) indicating the destination folder, and a title parameter (string) which becomes the note's title. The body can optionally be provided as Markdown or as a structured Tiptap document. The new note is automatically tagged with the current timestamp and …"

You can probably guess what happened. The model read those descriptions the way it reads any other prose — as scene-setting, not as a contract — and when two tools had overlapping language ("create" appeared in create_note, create_task, and create_folder), it tied on prose length and picked the wrong one. We watched it create a task when the user clearly asked for a note. We watched it create a folder when the user clearly asked for a task. The verbs were too generic; the descriptions buried the differences.

The fix went through a few steps, and the order matters.

First, we prefixed every tool with vist_ — the intuition being that giving the model a clear namespace (vist_* is the notes tool, everything else is the model's) would resolve the ambiguity. That helped, briefly. It papered over the description problem rather than fixing it.

The actual fix was rewriting every description to lead with the action and the side effect, in that order. create_note's description became "Create a new note in the specified folder. Returns the note's ID and URL. The note appears immediately in the user's sidebar and is searchable." No more API prose. Just: what happens, what comes back, where the user sees it.
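Here is roughly what that looks like as an MCP tool definition, plus a crude lint for the "lead with the action" rule. The name, description, and inputSchema keys are the standard fields of an MCP tool; the description text is the one quoted above. The schema details (all three parameters as required strings) and the leads_with_action helper are our assumptions for the example.

```python
CREATE_NOTE = {
    "name": "create_note",
    "description": (
        "Create a new note in the specified folder. "
        "Returns the note's ID and URL. "
        "The note appears immediately in the user's sidebar and is searchable."
    ),
    # Assumed schema: the real one may type content differently.
    "inputSchema": {
        "type": "object",
        "properties": {
            "folder": {"type": "string"},
            "title": {"type": "string"},
            "content": {"type": "string"},
        },
        "required": ["folder", "title", "content"],
    },
}


def leads_with_action(tool: dict) -> bool:
    """True if the description's first word is the verb in the tool name."""
    verb = tool["name"].split("_", 1)[0]
    words = tool["description"].split()
    return bool(words) and words[0].lower() == verb
```

The old API-documentation style fails this check immediately, because it opens with "This tool" rather than with the verb.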

Once the descriptions were sharp, we took the vist_ prefix back off. It had become redundant — and worse, it was creating false pattern-matches where the model would call a vist_list_* tool for things that weren't list operations, because the prefix looked authoritative. The descriptions did the work on their own.

The last step was applying the same treatment to every parameter: what the value means, what happens if you omit it, what the default is. Tool calls stopped silently failing on parameter shapes the model had inferred from training data rather than from our spec.
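The same rule can be checked mechanically for parameters. The sketch below flags optional parameters whose description never says what happens when they are omitted. The schema is a hypothetical one for list_notes: the parameter names come from the tool list, but the description wording and the default of 20 are invented for illustration and are not Vist's actual values.

```python
# Hypothetical schema; descriptions and the default value are illustrative only.
LIST_NOTES_SCHEMA = {
    "type": "object",
    "properties": {
        "folder": {
            "type": "string",
            "description": "Folder to list. If omitted, notes from all folders are returned.",
        },
        "limit": {
            "type": "integer",
            "description": "Maximum number of notes to return. Defaults to 20.",
        },
    },
    "required": [],
}


def undocumented_optionals(schema: dict) -> list:
    """Names of optional parameters that never mention omission or a default."""
    required = set(schema.get("required", []))
    flagged = []
    for name, spec in schema.get("properties", {}).items():
        text = spec.get("description", "").lower()
        if name not in required and "omitted" not in text and "default" not in text:
            flagged.append(name)
    return sorted(flagged)
```

A keyword check like this is blunt, but it catches the failure we actually hit: a parameter whose behaviour the model had to guess.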

The lesson, distilled: descriptions are a contract, not documentation. The model reads the whole tool list as a flat decision space, weighing each entry against every other entry. The wordy version of your tool wins lottery ties against the precise version of someone else's tool. Lead with the action. Name the side effects. Cut everything else.

The shorter version

If you want to build agent memory, build a notes app. If you have a notes app, give it an MCP server. Keep the tools boring and the verbs concrete. Trust that the model, given a workspace whose addresses mean something, will figure out the rest of the schema for itself.

The bytes have to live somewhere. The somewhere should be a place you can also open in your editor, search in the UI, back up to a .zip of plain Markdown, and read on a plane. Anything else, in 2026, is a regression dressed up as innovation.
