
Persistent Memory for Claude Code: A Two-Tier Approach That Actually Works

How to give Claude Code real persistent memory using a global rule file and a vector database MCP server, so context survives across sessions without any manual effort.

Chris Johnson · 14 min read

Every Claude Code session starts fresh. No memory of what you built last week, no recall of that workaround you discovered for the SQLite path issue, no context about the naming conventions specific to your project. You re-explain the same things every session. Eventually you start writing everything into CLAUDE.md by hand, but that becomes its own maintenance burden.

There's a better way. This post covers the four approaches I evaluated, the two-tier solution I settled on, and the specific gotchas that would have burned me if I hadn't thought through the design carefully first.

Want the Reference Version?

This post is the narrative walkthrough. If you want a condensed, printable reference covering the full architecture, configuration, and troubleshooting, download the Claude Code Persistent Memory PDF.

The Problem With "Just Use Hooks"#

It's worth understanding why the obvious answer does not work.

Hooks in Claude Code are shell scripts that run when events occur: PostToolUse, PreToolUse, SessionEnd, Stop. The natural instinct is to write a SessionEnd hook that dumps conversation context into some kind of memory store at the end of each session.

The problem: hooks only run if the session ends cleanly.

Hard Kills Lose Everything

If you close the terminal, kill the process, or lose connection mid-session, your SessionEnd hook never runs. Any context you intended to save from that session is gone. Designing your memory system around exit hooks creates a false sense of security.

This is why the architecture matters. The solution cannot be "save at the end." It has to be "save continuously throughout."

Infographic showing the two-tier persistent memory architecture for Claude Code: Tier 1 global memory rule driving automatic saves, Tier 2 MCP server with hybrid search, 30-day temporal decay, and local embedding

Evaluating Four Approaches#

I looked at four approaches before choosing:

| Approach | Persistence | Effort to Query | Reliability | Cost |
|---|---|---|---|---|
| Hooks-based (SessionEnd) | Session boundary | High (raw files) | Fragile (hard kills) | Free |
| Knowledge Graph MCP | Permanent | Medium (entity graph) | High | Free |
| Vector Database MCP | Permanent | Low (semantic search) | High | Free (SQLite) |
| Enterprise Graph (Neo4j) | Permanent | Low (rich queries) | High | Paid |

The hooks approach fails on reliability. Enterprise graph is overkill for personal use. That left knowledge graph MCP and vector database MCP as the practical options.

Knowledge graph MCP (the memory MCP server) stores information as entities and relations. It is good at modeling explicit connections: "ServiceA depends on DatabaseB," "this bug was caused by that config." You interact with it deliberately by calling create_entities and create_relations.

Vector database MCP stores text chunks with embeddings and lets you search by semantic similarity. You give it a description, it finds related memories. The retrieval is more natural for development context: "what do I know about the SQLite migration issue?" returns relevant results even if you did not use those exact words when saving.

The key realization: vector search is what you actually want when you are trying to recall "what happened with X" at the start of a new session. Keyword matching on entity names requires you to remember exactly what you called something. Semantic search just works.

Both Tools Have a Role

I kept both MCP servers. The knowledge graph is good for explicit relationship modeling when you have a clear entity structure. Vector memory is better for general recall and context retrieval. The rule of thumb: use vector-memory for most things, use the knowledge graph only when you explicitly need to model connections.

The Two-Tier Solution#

The design I landed on has two layers that work together:

Tier 1: A global rule that tells Claude to save automatically.

Tier 2: An MCP server that provides the actual storage and retrieval.

This combination solves the discipline problem. If memory-saving requires a human to remember to do it, it will not happen consistently. The global rule makes it Claude's job. Claude writes the memories. You just benefit from them later.

Tier 1: The Memory Rule#

I created ~/.claude/rules/core/memory-management.md as a global instruction file. Because it lives in ~/.claude/rules/, Claude reads it at the start of every session in every project, not just one.

The rule defines exactly when to save (the triggers) and what fields to include:

```markdown
## When to Save (Triggers)

Save to vector-memory after ANY of these events:

1. Completing a significant task (feature, bug fix, refactor, config change)
2. Making an architectural decision (choosing a library, pattern, or approach)
3. Discovering a gotcha or workaround (something that took effort to figure out)
4. Resolving a bug (root cause, fix, and how it was found)
5. Learning a project convention (naming patterns, file structure, build commands)
6. Encountering an error and fixing it (error message, cause, solution)

## How to Save

Include these fields in every memory:

- What: concise description of what happened
- Why: the reasoning or root cause
- Tags: relevant keywords (project name, technology, pattern type)
```

And at the start of sessions:

```markdown
## Session Start

At the beginning of each session, if the user describes a task related to previous work:

- Query vector-memory with relevant keywords to retrieve prior context
- Use retrieved memories to avoid re-learning or re-investigating
```

Session Start Retrieval Is as Important as Saving

Saving memories is only half the system. The retrieval half requires Claude to proactively query relevant memories when a session starts and the user describes a task. Without the session-start instruction, memories accumulate but never surface. Both halves need to be in the rule.

Tier 2: The MCP Server#

I installed mcp-memory-service, which provides a vector-based memory store backed by SQLite with the sqlite-vec extension. No external database server required. The database file lives at ~/Library/Application Support/mcp-memory/sqlite_vec.db on macOS, and the MCP server exposes tools like memory_store, memory_search, memory_list, and memory_delete.

Why SQLite? It runs locally with no cloud dependency, costs nothing, and is fast enough for personal use. The embedding model (all-MiniLM-L6-v2, 384 dimensions) runs in-process via PyTorch and sentence-transformers, so there are no external API calls for embedding or search.

The search is more capable than simple keyword matching. It supports three modes: semantic (cosine similarity on embeddings, finds "deployment failures" when you search "release problems"), exact (traditional BM25 keyword matching), and hybrid (70% semantic + 30% keyword, recommended).

Results are further refined by MMR deduplication, which suppresses near-duplicate results, and temporal decay with a 30-day half-life that ranks recent memories higher.
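The ranking behavior can be sketched in a few lines. This is an illustrative model of the math described above (70/30 hybrid weighting, 30-day half-life), not the service's actual implementation; the function name and signature are mine.

```python
def hybrid_score(semantic: float, keyword: float, age_days: float,
                 half_life_days: float = 30.0) -> float:
    """Illustrative ranking: 70% semantic + 30% keyword similarity,
    scaled by exponential decay with a 30-day half-life."""
    base = 0.7 * semantic + 0.3 * keyword
    decay = 0.5 ** (age_days / half_life_days)
    return base * decay

# A perfect match from 30 days ago scores half of a fresh one.
fresh = hybrid_score(semantic=1.0, keyword=1.0, age_days=0)
month_old = hybrid_score(semantic=1.0, keyword=1.0, age_days=30)
```

The decay is a ranking nudge, not an expiry: old memories still surface, they just need a stronger similarity match to beat recent ones.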

Installation

The service requires Python 3.11+. On macOS, install it via Homebrew:

```bash
brew install python@3.11
```

Then install the package:

```bash
pip3.11 install mcp-memory-service
```

Find where it installed:

```bash
which memory-service
# or
python3.11 -m mcp_memory_service --help
```

Global MCP Configuration

The MCP server goes in ~/.claude.json so it is available in every project, not just one:

```json
{
  "mcpServers": {
    "vector-memory": {
      "type": "stdio",
      "command": "python3.11",
      "args": ["-m", "mcp_memory_service.server"],
      "env": {
        "MCP_MEMORY_STORAGE_BACKEND": "sqlite_vec",
        "MCP_OAUTH_ENABLED": "false",
        "MCP_OAUTH_PRIVATE_KEY": "disabled",
        "MCP_OAUTH_PUBLIC_KEY": "disabled"
      }
    }
  }
}
```

The MCP_OAUTH_* env vars prevent the server from auto-generating RSA keys at import time, even when OAuth is disabled. Without them, RSA key material silently appears in config dumps.

Three things worth noting about this configuration:

First, the module is mcp_memory_service.server, not mcp_memory_service. The top-level package does not have a __main__ module. Running python -m mcp_memory_service produces a "cannot be directly executed" error. The .server submodule is the stdio transport that Claude Code expects. The package also ships an HTTP server (mcp-memory-server executable), but that is for web clients, not Claude Code.

Second, the python3.11 command must be the exact binary that has mcp-memory-service installed. If you installed it with pip3.11, use python3.11. If you used a virtual environment, point to that environment's Python binary. A mismatch here produces a cryptic "module not found" error. On Windows, use the full path to the Python executable to avoid ambiguity.
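One way to confirm the binary match is to ask the interpreter itself whether it can see the package. This is a generic sketch using the stdlib; run it with the exact binary named in your config (the `mcp_memory_service` usage line is the intended check, shown here only as a comment).

```python
import importlib.util

def module_installed(name: str) -> bool:
    """Return True if `name` is importable by the current interpreter.
    Run this with the same Python binary you put in ~/.claude.json to
    confirm the mcp-memory-service install is visible to it."""
    return importlib.util.find_spec(name) is not None

# Intended check: module_installed("mcp_memory_service")
print(module_installed("json"))         # stdlib module, prints True
print(module_installed("no_such_pkg"))  # missing module, prints False
```

If this returns False for the configured binary, you installed the package into a different Python than the one the MCP config launches.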

Third, the database is shared across all projects because the config is global. Project separation happens through tags. When saving a memory, include the project name as a tag: ["cryptoflexllc", "nextjs", "routing"]. When searching, you can filter by tag to get project-specific results.
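The tag-based separation amounts to a simple filter over a shared store. The real `memory_search` tool does this server-side; the record shape below is illustrative, not the service's actual schema.

```python
# Minimal sketch of tag-based project separation in one shared store.
memories = [
    {"what": "Fixed SQLite UNIQUE constraint on migration",
     "tags": ["cryptoflexllc", "sqlite", "migration"]},
    {"what": "Chose React Query over SWR for the dashboard",
     "tags": ["cryptoflexllc", "react-query", "architecture"]},
    {"what": "Pinned toolchain version for the CI image",
     "tags": ["otherproject", "ci"]},
]

def by_tag(store: list[dict], tag: str) -> list[dict]:
    """Filter a shared store down to one project's memories."""
    return [m for m in store if tag in m["tags"]]

project_memories = by_tag(memories, "cryptoflexllc")  # two results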

Global Database, Tag-Based Separation

One shared database is simpler than per-project databases. You only configure the MCP server once, you do not have to think about which database to use in each repo, and you can search across projects when the context genuinely spans them. Tags handle the separation.

Auto-Approve Permissions

Because the memory tools will be called frequently and automatically (not just when you explicitly ask), add auto-approve permissions in ~/.claude/settings.json:

```json
{
  "permissions": {
    "allow": [
      "mcp__vector-memory__memory_store",
      "mcp__vector-memory__memory_search",
      "mcp__vector-memory__memory_list",
      "mcp__vector-memory__memory_delete",
      "mcp__vector-memory__memory_update"
    ]
  }
}
```

Without these, Claude will ask for permission every time it tries to save a memory. That confirmation prompt is friction that defeats the purpose of automatic saving.
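The entries follow the `mcp__<server>__<tool>` naming pattern visible in the allow list above, which makes them easy to generate in bulk. A small sketch:

```python
def permission_name(server: str, tool: str) -> str:
    """Build an allow-list entry in the mcp__<server>__<tool> form."""
    return f"mcp__{server}__{tool}"

tools = ["memory_store", "memory_search", "memory_list",
         "memory_delete", "memory_update"]
allow = [permission_name("vector-memory", t) for t in tools]
# allow[0] == "mcp__vector-memory__memory_store"
```

The server segment must match the key you used under `mcpServers` in ~/.claude.json, or the permission will not apply.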

Windows Configuration

On Windows with Git Bash, the configuration is slightly different. Use the full path to Python and wrap the command for cmd.exe:

```json
{
  "mcpServers": {
    "vector-memory": {
      "type": "stdio",
      "command": "C:\\Users\\yourname\\AppData\\Local\\Programs\\Python\\Python312\\python.exe",
      "args": ["-m", "mcp_memory_service.server"],
      "env": {
        "MCP_MEMORY_STORAGE_BACKEND": "sqlite_vec",
        "MCP_OAUTH_ENABLED": "false",
        "MCP_OAUTH_PRIVATE_KEY": "disabled",
        "MCP_OAUTH_PUBLIC_KEY": "disabled"
      }
    }
  }
}
```

Use the full Python path because python or python3 may resolve to a different version through the Windows App Installer shim. You can also optionally install onnxruntime (pip install onnxruntime) for better memory quality ranking, though it is not required.

Making It Fully Automatic With Hooks#

The global rule tells Claude when to save memories, but the rule works through Claude's instruction-following, not through a deterministic mechanism. Claude reads the rule and acts on it, which means it saves memories most of the time, but it is not guaranteed on every single event.

For most users, the rule alone is enough. Claude is reliable about following global rules, and the continuous-saving approach means a missed save here and there does not matter much. But if you want belt-and-suspenders automation, hooks add a complementary layer.

Why Hooks Cannot Replace the Rule#

Hooks are shell scripts. They run outside of Claude's context, which means they cannot call MCP tools. A PostToolUse hook can capture what Claude did (tool name, input, output) and write it to a log file, but it cannot call memory_store to save a structured memory to the vector database.

This is the key architectural distinction:

  • The global rule drives memory saving. It instructs Claude to call memory_store as it works, using Claude's own judgment about what is worth saving.
  • Hooks capture raw activity data. They log tool usage, archive transcripts, and provide the raw material that a background analysis agent can later process into insights.

Both are valuable, but they serve different roles. The rule is the primary memory system. Hooks are the safety net and the observation layer.

Complementary Hook Setup#

Here are the hooks that complement the memory system. Each one runs as a shell script (or PowerShell on Windows) triggered by Claude Code events.

File Guard (PreToolUse): Blocks edits to sensitive files like .env, .pem, and credentials.json. Not directly related to memory, but essential for any automated setup where Claude is acting with fewer permission prompts.

Activity Logger (PostToolUse): Logs every file edit, bash command, and write operation to activity_log.txt in the project root. This gives you a human-readable audit trail of what happened in each session.

Observation Capture (PostToolUse): Captures tool usage as JSONL to ~/.claude/homunculus/observations.jsonl. This feeds a separate learning system that can analyze patterns across sessions and extract reusable insights. The observations are raw data; the analysis happens later.

Session Archive (SessionEnd): Copies the conversation transcript into .claude/session_archive/ when a session ends cleanly. Yes, this depends on clean exit, which is why it is a supplement, not the primary system.

Notification Sound (Stop): Plays a system sound when Claude finishes its turn and needs your input. A small quality-of-life improvement that has nothing to do with memory but is worth including.
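To make the activity logger concrete, here is a sketch of one. Claude Code passes the event to the hook as JSON on stdin; the payload field names used here (`tool_name`, `tool_input`, `file_path`) are assumptions to check against your Claude Code version before relying on them.

```python
"""Sketch of a PostToolUse activity-logger hook."""
import json
from datetime import datetime

def format_entry(event: dict) -> str:
    """One human-readable line per tool call."""
    tool = event.get("tool_name", "unknown")
    target = event.get("tool_input", {}).get("file_path", "")
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    return f"[{stamp}] {tool} {target}".rstrip()

def log_event(stream, log_path: str = "activity_log.txt") -> None:
    """Read one event from `stream` and append it to the log."""
    entry = format_entry(json.load(stream))
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(entry + "\n")

# As a hook script, this would run: log_event(sys.stdin)
```

Keep hooks like this fast and append-only; anything that analyzes the log belongs in a separate process, not in the hook itself.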

Registering Hooks in settings.json#

Having hook scripts in ~/.claude/hooks/ is not enough. They must be registered in ~/.claude/settings.json to actually fire. Without registration, the scripts sit idle and never execute.

On macOS/Linux, hook registration looks like this:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "bash ~/.claude/hooks/file-guard.sh"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit|Write|Bash|Read|Grep|Glob",
        "hooks": [
          {
            "type": "command",
            "command": "bash ~/.claude/hooks/observe-homunculus.sh",
            "timeout": 5000
          }
        ]
      }
    ],
    "SessionEnd": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "bash ~/.claude/hooks/save-session.sh",
            "timeout": 10000
          }
        ]
      }
    ]
  }
}
```

On Windows, replace bash with powershell -Command ". 'full/path/to/script.ps1'" and use the full path to the hook script. Git Bash's MSYS2 layer can mangle paths, so explicit paths are safer than relying on ~ expansion inside PowerShell.

Hook Scripts Exist but Do Not Fire Without Registration

This is the most common setup mistake. You can have perfectly written hook scripts in ~/.claude/hooks/ and they will never run unless they are registered in the hooks section of ~/.claude/settings.json. If your hooks are not working, check registration first.

The matcher field controls which tools trigger the hook. "Edit|Write" fires only on file edits and writes. An empty string "" matches all events for that hook type. The timeout field prevents slow hooks from blocking Claude's workflow.
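The matcher semantics can be modeled as regex alternation against the tool name. This sketch assumes full-match regex behavior, which is worth verifying against your Claude Code version's hook documentation.

```python
import re

def matches(matcher: str, tool: str) -> bool:
    """Empty matcher matches every event; otherwise treat the
    matcher as a regex against the tool name (assumed full-match)."""
    return matcher == "" or re.fullmatch(matcher, tool) is not None

print(matches("Edit|Write", "Edit"))  # True
print(matches("Edit|Write", "Bash"))  # False
print(matches("", "Bash"))            # True
```

Note that matchers are case-sensitive: `edit|write` would not fire on the `Edit` tool.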

The Complete Automation Stack#

When everything is wired together, the automation has three layers working in parallel:

  1. Claude saves memories via the global rule (primary system). Structured memories with tags go into the vector database as Claude works. These survive hard kills because they are saved continuously.

  2. Hooks capture raw activity (observation layer). Every tool call is logged to JSONL for later analysis. Session transcripts are archived on clean exit. This data feeds pattern detection and learning systems.

  3. Auto-approve permissions remove friction (enablement layer). Without auto-approve, Claude would prompt for confirmation on every memory_store call. The permissions in settings.json let it save silently.

No single layer handles everything. The rule handles intent (what is worth remembering). Hooks handle completeness (capture everything, analyze later). Permissions handle ergonomics (no interruptions).

Vector Memory MCP overview: 5 layers, 12 tools, 30 instincts, hybrid search, behavioral learning, and privacy-first local processing

Why This Works When Other Approaches Do Not#

The design is shaped by two constraints that eliminated the obvious alternatives.

Constraint 1: Discipline does not scale.

If saving memories requires remembering to run a command, it will not happen reliably. The session gets busy, you close the terminal, you forget. The solution is to make it automatic by instructing Claude to do it. Claude is always running during the session; it does not forget.

Constraint 2: Hard kills are real.

Any approach that saves at session end is incomplete. The memory rule instructs Claude to save continuously: after completing a task, after making a decision, after discovering a workaround. If the session ends abruptly after step 7 of 10, steps 1 through 6 are already saved. You lose step 7, not everything.

Save During, Not After

Design your memory system around continuous saving, not exit-time saving. A SessionEnd hook is still useful for archiving the full transcript, but it should not be the primary mechanism for capturing development context.

What Gets Saved in Practice#

Here's an example of what the system saves after a typical bug fix:

What: Fixed SQLite UNIQUE constraint error on user table migration
Why: Migration ran twice because the CI pipeline did not check for existing schema
Tags: ["cryptoflexllc", "sqlite", "migration", "ci", "bug"]

Or after an architectural decision:

What: Chose React Query over SWR for data fetching in the dashboard
Why: React Query has better cache invalidation controls and devtools.
     SWR's mutation model does not fit the optimistic update patterns we need.
Tags: ["cryptoflexllc", "react-query", "architecture", "data-fetching"]

When the next session starts and the user mentions the dashboard, Claude queries vector-memory and surfaces the React Query decision. No re-explaining required.

Tags Are How You Find Things Later

The quality of retrieval depends on the quality of tags. Include the project name, the technology involved, and the type of memory (bug, decision, workaround, convention). Three to five tags per memory is enough. More than eight and you are over-engineering it.

The Knowledge Graph: Still Useful, Different Job#

The vector-memory MCP handles general recall well. But there is still a role for the knowledge graph MCP (the memory server).

The knowledge graph is the right tool when you need to model explicit relationships between named entities. For example, if you are building a multi-service system and want to track "AuthService depends on UserDatabase, which is also used by ProfileService," that relational structure is exactly what the knowledge graph is designed for.

The distinction in practice:

  • Vector memory: "What do I know about the SQLite issues I ran into?" (free-text recall)
  • Knowledge graph: "Show me all services that depend on UserDatabase" (explicit relationship query)

For most development work, vector memory handles everything. The knowledge graph supplements it when your domain has clear entity relationships you want to navigate structurally.

Verifying the Setup#

After configuration, verify the MCP server is running:

```bash
claude mcp list
```

You should see vector-memory in the list. If it is missing, check that:

  1. The Python path in ~/.claude.json is the binary with mcp-memory-service installed
  2. The env paths exist (create the directories if needed)
  3. There are no JSON syntax errors in ~/.claude.json
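Item 3 is the easiest to automate. This is a quick syntax-and-presence check I'd sketch with the stdlib; the function name is mine, and it only validates the JSON shape, not whether the server actually starts.

```python
import json
from pathlib import Path

def check_config(path: str) -> str:
    """Validate that the config parses and names the server.
    A trailing comma or stray quote makes Claude Code skip the
    MCP server without an obvious error."""
    try:
        data = json.loads(Path(path).expanduser().read_text())
    except json.JSONDecodeError as e:
        return f"syntax error: {e}"
    if "vector-memory" not in data.get("mcpServers", {}):
        return "vector-memory entry missing"
    return "ok"

# check_config("~/.claude.json")
```

Run it against ~/.claude.json before restarting Claude Code; a parse error here explains a missing server faster than re-reading the file by eye.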

Start a new Claude Code session and ask it to save a test memory:

Save a memory: this is a test memory with tags ["test", "setup"]

Then retrieve it:

Search my memories for "test memory"

If both work, the system is operational.

Restart Required After Config Changes

Changes to ~/.claude.json require restarting Claude Code to take effect. If the MCP server is not appearing, close and reopen the session completely.

Lessons Learned#

Several times during this build, the obvious approach turned out to be wrong.

The discipline trap is real. Every memory system I found online assumed you would manually call create_entities. Nobody does this consistently after the novelty wears off. The global rule that instructs Claude to save automatically is the only approach that survives contact with actual workflow patterns.

Hooks are not the answer for memory. Hooks are excellent for logging (PostToolUse for activity logs is great), but they are the wrong tool for memory persistence because hard kills skip them.

Global configuration is the right scope for memory. A per-project memory system means setting it up repeatedly and missing cross-project context. Global configuration with tag-based separation gives you both.

Python version precision matters. When you install a Python package and then reference it in an MCP config, the Python binary in the config must be the same one you used for installation. python3 might resolve to 3.9 on your system even if 3.11 is installed. Be explicit. On Windows, use the full path to the executable.

The module name has a gotcha. The MCP config must reference mcp_memory_service.server, not mcp_memory_service. The top-level package is not directly executable. Running the wrong module gives a confusing "cannot be directly executed" error that does not mention the correct submodule.

Hook scripts do not fire without registration. Having scripts in ~/.claude/hooks/ is not enough. They must be registered in the hooks section of ~/.claude/settings.json with the right event type and matcher pattern. I had five fully working scripts sitting dormant because I forgot this step.

Hooks and rules serve different roles. Hooks cannot call MCP tools because they are external shell scripts. The global rule drives memory saving through Claude's instruction-following. Hooks capture raw data for separate analysis. Trying to build memory saving entirely on hooks is a dead end.

OAuth key auto-generation is sneaky. The mcp-memory-service package auto-generates RSA keys at Python import time, even when OAuth is completely disabled. These keys silently appear in environment dumps and config exports. Setting MCP_OAUTH_PRIVATE_KEY=disabled and MCP_OAUTH_PUBLIC_KEY=disabled in the MCP server env prevents this.

Summary: What Was Built#

By the end of the session, the memory system consisted of five pieces:

  1. ~/.claude/rules/core/memory-management.md - global rule instructing Claude when and how to save memories, and to query them at session start
  2. mcp-memory-service installed via Python 3.11+, configured in ~/.claude.json as a global MCP server (using the mcp_memory_service.server module for stdio transport)
  3. Auto-approve permissions in ~/.claude/settings.json for all vector-memory tools
  4. Hook scripts in ~/.claude/hooks/ for activity logging, observation capture, file protection, session archiving, and notification
  5. Hook registration in ~/.claude/settings.json mapping each script to the right event type and tool matcher

The first three pieces are the core memory system. The last two are the automation and observation layer that makes it fully hands-off.

The result: development context now persists across sessions automatically, without any manual effort beyond doing the actual work. Claude saves what matters as it happens. Hooks capture raw activity for deeper analysis. Everything surfaces when it is relevant. No manual saves, no exit-time scripts to forget, no discipline required.

Resources#

If you want to see the exact configuration files, the full documentation, and the MCP server setup:
