OpenClaw Deployment Part 3: How Agents Remember
If you read Part 1 and Part 2, you know that by the end of this deployment journey, I had seven specialized agents running, six Telegram bots responding, and a security configuration that actually works with the features it is protecting.
But there was a question I kept dancing around in Part 2: what happens when you restart the gateway? What does JClaw27 actually remember about yesterday? Does BobTheBuilder still know that you prefer specs before code? Does the security agent remember what it audited last week?
This is the post where I answer that. It is Part 3 of the OpenClaw Deployment series, and it covers the memory architecture: how agents store what they know, how they retrieve it, and why the system is designed the way it is.
## The Problem: AI Agents Are Amnesic by Default
Here is the uncomfortable truth about most AI agent setups: every conversation starts fresh. The model has no memory of prior sessions. Every time you restart, you are introducing yourself all over again.
For a single chatbot that answers questions, this is annoying but manageable. For a multi-agent system with seven specialists who are supposed to accumulate domain expertise over time, it is a fundamental limitation.
### Why Context Windows Are Not Enough
You might think: why not just load all previous conversations into the context window? The context window on even the largest models has a finite size, usually 128k to 200k tokens. A single day of active agent usage can easily exceed that. And context window tokens cost money per request. You cannot practically stuff every prior session into every new session.
OpenClaw's answer to this is a three-tier memory system with searchable storage. Agents do not get all their history loaded by default. They get the right pieces, retrieved on demand, ranked by relevance and recency.
Let me walk through how it works.
## Tier 1: Session Memory (Ephemeral)
Session memory is the simplest tier. It is the conversation history for the current session: every message exchanged, every tool call made, every file read. It lives in memory only and disappears when the session ends or when you issue /new.
```
Session starts
  -> messages accumulate in memory
  -> tool calls and results accumulate
  -> session ends or /new is issued
  -> session-memory hook fires
  -> key context extracted and written to daily log
  -> ephemeral memory cleared
```
This tier does not need much explanation. It is what you see in the conversation window. The important thing is what happens at the boundary: the session-memory hook.
When you run /new or /reset, the hook fires before the context is cleared. It extracts the last ~15 messages, summarizes the session context, and writes it to a daily log file. That log becomes Tier 2.
**Run /new Between Major Topics**
The session-memory hook only fires when you explicitly end a session with /new or /reset. If you have a very long session that hits context limits, safeguard compaction will handle it, but you lose control over what gets captured. Running /new at natural breakpoints ensures important context is written to disk on your terms, not the compaction algorithm's terms.
## Tier 2: Daily Logs (Short-Term)
Daily logs are append-only Markdown files stored in each agent's workspace under a memory/ subdirectory. Each file is named by date:
```
~/.openclaw/workspace-jclaw/
  memory/
    2026-03-04.md
    2026-03-05.md
    2026-03-06.md
```
These files capture session context automatically. You do not write them manually. The session-memory hook creates them whenever a session ends.
Each daily log is per-agent and per-day. JClaw27's Tuesday notes are in workspace-jclaw/memory/2026-03-04.md. BobTheBuilder's Tuesday notes are in workspace-builder/memory/2026-03-04.md. The agents do not share daily logs, which is intentional: Security's audit notes should not pollute the Writer's creative memory.
### Temporal Decay
Daily logs do not get deleted, but they do fade. The hybrid search system applies a temporal decay weighting to all search results:
| Age | Score Multiplier |
|---|---|
| 0 days | 1.0 (full weight) |
| 30 days | 0.5 (half weight) |
| 60 days | 0.25 |
| 90 days | 0.125 |
| 180 days | ~0.016 |
The half-life is 30 days. A memory from last week scores about 85% of its original relevance. A memory from three months ago scores about 12.5%. Old logs do not disappear; they just become much less likely to surface in search results unless they are highly relevant to a specific query.
This is a nice design: you do not need a cleanup job to prune old memory. The decay handles it naturally.
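The decay multiplier in the table above is a plain exponential half-life curve. Here is a minimal Python sketch (my own illustration, not OpenClaw's code) that reproduces the numbers:

```python
def temporal_decay(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential half-life decay: the weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)

# Reproduce the table above:
for age in (0, 30, 60, 90, 180):
    print(age, round(temporal_decay(age), 4))
```

A memory from 7 days ago gets `0.5 ** (7 / 30) ≈ 0.85` of its original score, which is where the "about 85%" figure comes from.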
**Why 30 Days for the Half-Life?**
30 days is a reasonable default for developer work. A decision made last week is still highly relevant. A decision made two months ago might have been superseded. A decision from six months ago is probably historical context, not active guidance. The decay curve reflects that intuition without requiring you to categorize every memory as "current" or "outdated."
## Tier 3: Long-Term Memory (Curated)
Each agent has a single MEMORY.md file in its workspace root. This is the agent's durable identity across sessions: the stable facts, decisions, preferences, and patterns that the agent should carry forever.
```
~/.openclaw/workspace-jclaw/MEMORY.md
~/.openclaw/workspace-builder/MEMORY.md
~/.openclaw/workspace-security/MEMORY.md
```
Unlike daily logs, which are written automatically by the hook, MEMORY.md is maintained by the agent itself. During a session, if the agent learns something it should remember permanently, it writes it to MEMORY.md. It is curated, not logged.
The distinction matters. Daily logs capture everything: every file read, every question asked, every tool invocation. MEMORY.md captures what should persist: "The user prefers specs before code begins." "The main repo lives at ~/GitProjects/cryptoflexllc." "Security review is required before any code is deployed."
**Keep MEMORY.md Concise**
The temptation with any persistent memory system is to write everything to it. Resist this. MEMORY.md is not a diary. It is a curated set of stable facts and preferences. If it grows too large, it stops being useful because the signal is buried in noise. Keep it to things that are genuinely stable and important. Let the daily logs handle the rest.
Here is an example of what BobTheBuilder's MEMORY.md might contain after a few weeks of active work:
```markdown
# MEMORY.md - JClaw_BobTheBuilder

## User Preferences
- Specs before code: always ask for requirements before starting implementation
- TDD: write tests first, always
- Code review: submit to Security before deployment

## Project Context
- Primary repo: ~/GitProjects/cryptoflexllc
- Blog engine: Next.js 15 with MDX via next-mdx-remote
- Build command: npm run build
- Dev server: npm run dev

## Patterns Learned
- MDXRemote components are passed as props, not imported in .mdx files
- Callout components: Tip (green), Info (cyan), Warning (amber), Stop (red)
- Backlog posts go to src/content/backlog/
```
That is a few hundred tokens of high-signal context that the agent will carry into every session. No decay. No expiry. Just the facts that matter.
## The Storage Layer: sqlite-vec and FTS5
Daily logs and MEMORY.md are just Markdown files on disk. They are human-readable, git-friendly, and easy to edit. But searching across dozens of files with growing content needs an index. That index lives in a SQLite database using two extensions.
```
~/.openclaw/memory/
  jclaw.sqlite
  builder.sqlite
  security.sqlite
  writer.sqlite
  researcher.sqlite
  sysadmin.sqlite
  secretary.sqlite
```
Each agent gets its own SQLite database. Inside each database, two systems work in parallel:
sqlite-vec is a SQLite extension that adds vector storage and similarity search. Every chunk of memory content (a paragraph or logical block from MEMORY.md or a daily log) is embedded using nomic-embed-text, producing a 768-dimensional floating-point vector. sqlite-vec stores these vectors alongside the source text and agent metadata. When you query, the query is also embedded, and sqlite-vec finds the closest vectors by cosine similarity.
SQLite FTS5 is SQLite's built-in full-text search extension. The same memory chunks are indexed for keyword search. BM25 ranking handles relevance scoring. This catches cases where semantic similarity might miss exact terminology: version numbers, function names, specific error messages.
Both indexes are built from the same source files, maintained by the openclaw memory index command, and queried together on every memory search.
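To make the FTS5 half concrete, here is a minimal keyword-index sketch using Python's stdlib sqlite3. The table name, columns, and sample rows are hypothetical; OpenClaw's actual schema is not documented in this post:

```python
import sqlite3

# Hypothetical minimal keyword index over memory chunks.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE chunks USING fts5(source, content)")
db.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [
        ("memory/2026-03-04.md", "Configured sqlite-vec and rebuilt the index."),
        ("MEMORY.md", "Security review is required before any code is deployed."),
    ],
)
# FTS5 exposes BM25 ranking via the built-in `rank` column (lower is better).
rows = db.execute(
    "SELECT source FROM chunks WHERE chunks MATCH ? ORDER BY rank", ("security",)
).fetchall()
print(rows)  # -> [('MEMORY.md',)]
```

The vector half is analogous: sqlite-vec stores the 768-dimensional embeddings in a virtual table and answers nearest-neighbor queries, and both tables are rebuilt from the same Markdown sources by `openclaw memory index`.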
## The Embedding Model: nomic-embed-text
The vector search system needs a model to convert text into vectors. OpenClaw uses nomic-embed-text, a model designed specifically for text embedding that runs locally via Ollama.
```bash
ollama pull nomic-embed-text
```
Some specs worth knowing:
| Attribute | Value |
|---|---|
| Model size | 274 MB |
| Vector dimensions | 768 |
| Query latency (warm) | ~78ms |
| Batch chunk latency | ~24ms per chunk |
| Cost | Free (fully local) |
| Architecture | Apple Silicon optimized |
274 MB is small. It lives in memory while Ollama is running and produces embeddings in under 100ms for a single query. For a batch index rebuild, you are looking at ~24ms per chunk, which means a workspace with 500 chunks (several months of active use) indexes in about 12 seconds.
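The embedding call itself is a single HTTP request to Ollama's local embeddings endpoint (the same one mentioned later for health checks). A sketch using only the standard library; `build_embed_request` and `embed` are my own hypothetical helpers, and `embed` assumes Ollama is serving locally:

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/embeddings"

def build_embed_request(text: str, model: str = "nomic-embed-text") -> bytes:
    """Build the JSON body for Ollama's /api/embeddings endpoint."""
    return json.dumps({"model": model, "prompt": text}).encode()

def embed(text: str) -> list[float]:
    """POST the text to Ollama and return the embedding vector."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_embed_request(text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # nomic-embed-text returns 768 floats.
        return json.load(resp)["embedding"]

# vec = embed("How did we configure security?")  # requires Ollama running
```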
**Pull nomic-embed-text First**
If you set up OpenClaw's memory system without pulling the embedding model, memory search will fail silently. The error message is something like "embedding provider unavailable." Run ollama pull nomic-embed-text before you configure memorySearch in the OpenClaw config. Check that it is present with ollama list.
## Hybrid Search: Combining Vector and Keyword
Here is where it gets interesting. Vector search alone is good at finding semantically similar content. Keyword search alone is good at finding exact matches. Neither is strictly better than the other, and they fail in complementary ways.
Vector search can miss exact terminology. If you ask "what was the sqlite-vec version we installed?", a semantic search might return chunks about database configuration generally, because the meaning is broadly similar. But the specific version number you need might only be in a chunk with the exact phrase "sqlite-vec".
Keyword search misses synonyms and paraphrase. If you wrote "we had trouble with the Docker socket permissions" in a daily log, searching for "container access denied" probably would not find it. A semantic search would, because the concepts are similar.
Hybrid search solves this by running both and merging the results.
The configuration that controls the merge:
```json
{
  "memorySearch": {
    "provider": "ollama",
    "model": "nomic-embed-text",
    "query": {
      "hybrid": {
        "vectorWeight": 0.7,
        "textWeight": 0.3,
        "candidateMultiplier": 4,
        "mmr": { "lambda": 0.7 },
        "temporalDecay": { "halfLifeDays": 30 }
      }
    }
  }
}
```
Let me walk through each parameter:
vectorWeight: 0.7 and textWeight: 0.3: Vector similarity gets 70% of the final score, keyword matching gets 30%. This weights semantic understanding higher, but keyword matching still has meaningful influence. You can tune these based on how your agents write their memory files.
candidateMultiplier: 4: Before final ranking, OpenClaw retrieves 4x more candidates than it will return. If you want the top 5 results, it retrieves 20 candidates (10 from vector, 10 from keyword), then ranks and filters down to 5. More candidates means better final quality, at the cost of a slightly slower query.
mmr.lambda: 0.7: This is Maximal Marginal Relevance, a technique for reducing redundant results. With MMR, each successive result is penalized for being too similar to already-selected results. The lambda value balances relevance (higher lambda) against diversity (lower lambda). At 0.7, it leans toward relevance while still filtering duplicates.
temporalDecay.halfLifeDays: 30: Already covered in the Tier 2 section. Applied during the final ranking pass, after vector and keyword scores are combined.
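Putting the weights and decay together, the final score for a single candidate looks roughly like the sketch below. This is my own illustration of the arithmetic, not OpenClaw's implementation: it assumes both input scores are already normalized to [0, 1] and collapses the candidate merge into a plain weighted sum.

```python
def hybrid_score(vec_sim: float, bm25_norm: float, age_days: float,
                 vector_weight: float = 0.7, text_weight: float = 0.3,
                 half_life_days: float = 30.0) -> float:
    """Weighted fusion of vector and keyword scores, discounted by age.
    Assumes vec_sim and bm25_norm are pre-normalized to [0, 1]."""
    fused = vector_weight * vec_sim + text_weight * bm25_norm
    return fused * 0.5 ** (age_days / half_life_days)

# A fresh exact-keyword hit vs. an older, purely semantic hit:
print(round(hybrid_score(vec_sim=0.4, bm25_norm=1.0, age_days=0), 3))   # 0.58
print(round(hybrid_score(vec_sim=0.9, bm25_norm=0.0, age_days=30), 3))  # 0.315
```

Note how a month-old semantic match can still lose to a fresh keyword match even with the 0.7 vector weight; the decay term is what makes recency matter.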
The full search pipeline looks like this:
```
Query: "How did we configure security?"
  |
  +---> nomic-embed-text (Ollama, local)
  |       |
  |       v
  |     [768-dim query vector]
  |       |
  |       v
  |     sqlite-vec: cosine similarity search
  |       |
  |       +---> Top 4N semantic candidates (weight: 0.7)
  |
  +---> SQLite FTS5: BM25 keyword search
  |       |
  |       +---> Top 4N keyword candidates (weight: 0.3)
  |
  v
[Reciprocal Rank Fusion]
  (merge and normalize scores from both lists)
  |
  v
[MMR Diversity Filter]
  (penalize redundant results)
  |
  v
[Temporal Decay Reranking]
  (discount older memories)
  |
  v
Final ranked results (top N returned)
```
The result: an agent asking about security configuration will surface the most recent relevant notes first, filter out chunks that are all saying the same thing, and blend exact-match results (like specific command names) with semantic matches (like related discussions framed differently).
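MMR itself is a simple greedy loop. Here is a sketch of the standard algorithm with hypothetical similarity interfaces (not OpenClaw's internal API): each pick maximizes relevance minus a penalty for similarity to what has already been selected.

```python
def mmr_rerank(candidates, sim_to_query, sim_between, lam=0.7, k=5):
    """Greedy Maximal Marginal Relevance selection.

    candidates:   list of chunk ids
    sim_to_query: dict mapping id -> relevance score
    sim_between:  callable (id, id) -> pairwise similarity
    lam:          balance between relevance (high) and diversity (low)
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr(c):
            redundancy = max((sim_between(c, s) for s in selected), default=0.0)
            return lam * sim_to_query[c] - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected

# Two near-duplicate chunks ("a", "b") and one distinct chunk ("c"):
rel = {"a": 0.9, "b": 0.88, "c": 0.6}
dup = lambda x, y: 0.95 if {x, y} == {"a", "b"} else 0.1
print(mmr_rerank(["a", "b", "c"], rel, dup, lam=0.7, k=2))  # ['a', 'c']
```

Even though "b" scores higher than "c" on raw relevance, it is nearly a duplicate of "a", so the diversity penalty pushes "c" into the second slot.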
## The SysAdmin Health Monitoring Connection
Part 2 introduced JClaw_SysAdmin, the infrastructure agent responsible for server operations. One of SysAdmin's planned responsibilities is monitoring the health of the memory subsystem itself.
The memory system has a few things worth monitoring in production:
The SQLite index can drift out of sync if files are modified outside of OpenClaw (which happens if you manually edit a daily log or MEMORY.md in your editor). The openclaw memory index --force command rebuilds from scratch, but running it on a schedule catches drift before it causes stale search results.
Ollama's nomic-embed-text model needs to be running for search to work. If Ollama is not serving, embedding fails, and memory search degrades to keyword-only. SysAdmin can health-check the embedding endpoint at http://127.0.0.1:11434/api/embeddings and alert if it goes down.
The openclaw memory status --json command returns structured data about the state of each agent's index: total chunks, last indexed timestamp, vector index health. That output is machine-parseable and suitable for a health check script.
```bash
openclaw memory status --json
```

```json
{
  "agents": {
    "main": {
      "totalChunks": 342,
      "lastIndexed": "2026-03-06T09:15:32Z",
      "vectorIndexHealth": "ok",
      "ftsIndexHealth": "ok"
    },
    "builder": {
      "totalChunks": 187,
      "lastIndexed": "2026-03-06T08:44:11Z",
      "vectorIndexHealth": "ok",
      "ftsIndexHealth": "ok"
    }
  }
}
```
I have not wired up the SysAdmin heartbeat tasks yet, but this is the shape of what I am building toward: a cron job that runs openclaw memory status --json, parses the output, and alerts via Telegram if any agent's index health is not "ok" or if lastIndexed is more than 24 hours old.
**Memory Index Health as a Leading Indicator**
A stale or degraded memory index does not cause immediate visible failures. Agents just start giving less contextually aware responses over time. By the time you notice, the index might have been degraded for days. Monitoring it proactively is the same reason you monitor disk usage before you hit 100%.
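Using the status output shape shown above, the alerting logic is small enough to sketch. The 24-hour threshold comes from the plan described earlier; the function names and structure are my own assumptions about what the health check script might look like:

```python
import json
from datetime import datetime

STALE_AFTER_HOURS = 24  # threshold from the alerting plan above

def _parse_ts(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp with a trailing Z."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def memory_alerts(status_json: str, now_iso: str) -> list[str]:
    """Turn `openclaw memory status --json` output into alert strings."""
    now = _parse_ts(now_iso)
    alerts = []
    for name, info in json.loads(status_json)["agents"].items():
        for key in ("vectorIndexHealth", "ftsIndexHealth"):
            if info.get(key) != "ok":
                alerts.append(f"{name}: {key}={info.get(key)!r}")
        age = now - _parse_ts(info["lastIndexed"])
        if age.total_seconds() > STALE_AFTER_HOURS * 3600:
            alerts.append(f"{name}: stale (last indexed {info['lastIndexed']})")
    return alerts
```

A cron job would run the CLI command, feed its stdout to `memory_alerts`, and forward any non-empty result to Telegram.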
## The Full Configuration
Here is the complete memory-related configuration block that goes into openclaw.json. This is pulled directly from the deployment:
```json
{
  "agents": {
    "defaults": {
      "compaction": {
        "mode": "safeguard"
      },
      "memorySearch": {
        "provider": "ollama",
        "model": "nomic-embed-text",
        "query": {
          "hybrid": {
            "vectorWeight": 0.7,
            "textWeight": 0.3,
            "candidateMultiplier": 4,
            "mmr": { "lambda": 0.7 },
            "temporalDecay": { "halfLifeDays": 30 }
          }
        }
      }
    }
  },
  "hooks": {
    "internal": {
      "enabled": true,
      "entries": {
        "session-memory": { "enabled": true }
      }
    }
  }
}
```
Four things in this config do most of the work:
compaction.mode: "safeguard" tells OpenClaw to summarize early messages when sessions get long, preserving tool results and file operations. This works alongside the session-memory hook: safeguard compaction handles mid-session context pressure, and the session-memory hook handles end-of-session persistence.
memorySearch.provider: "ollama" and memorySearch.model: "nomic-embed-text" route embeddings to the local Ollama instance. No external API. No per-query cost.
hooks.internal.session-memory.enabled: true activates the automatic daily log writing on /new and /reset.
The query.hybrid block is tunable. The defaults (0.7/0.3 split, 30-day half-life) work well for general developer workflows. If your agents write very precise technical notes with lots of exact terminology, you might shift the weights toward keyword (e.g., 0.5/0.5). If your notes are more conversational and varied in phrasing, lean further toward vector (e.g., 0.8/0.2).
## Setting It Up from Scratch
If you are building a fresh OpenClaw deployment and want to replicate this memory architecture, here is the setup sequence:
Step 1: Pull the embedding model.
```bash
ollama pull nomic-embed-text
```
Verify it is present:
```bash
ollama list
# NAME                  ID      SIZE    MODIFIED
# nomic-embed-text:...  abc123  274 MB  2 hours ago
```
Step 2: Add the memory configuration to openclaw.json.
Add the memorySearch block under agents.defaults and enable the session-memory internal hook (shown in full above).
Step 3: Run the initial index.
After adding the config and restarting the gateway, run:
```bash
openclaw memory index --force
```
This does a full build of the SQLite indexes from whatever memory files exist in each workspace. For a fresh install with no history, this takes about one second. For a deployment with months of daily logs, it might take a few minutes.
Step 4: Verify.
```bash
openclaw memory status
```
You should see each agent listed with a chunk count, last indexed time, and health status. If you see "embedding provider unavailable" for any agent, verify that Ollama is running and that nomic-embed-text is loaded.
Step 5: Create workspace MEMORY.md stubs.
Each agent workspace should start with a MEMORY.md that contains at minimum the agent's name and role. The agent will fill it out over time, but if the file starts empty, the first searches have nothing from that tier to draw on.
```bash
printf '# MEMORY.md - JClaw27\n\n## Identity\n- Orchestrator for the JClaw multi-agent system\n' \
  > ~/.openclaw/workspace-jclaw/MEMORY.md
```
## Why This Architecture Makes Sense
I want to step back from the implementation details and explain why the three-tier design is the right approach for an agent system, rather than, say, just loading all prior conversations into context.
The cost problem. Every token in the context window costs money per request (for cloud models). Loading 30 days of conversation history into every session is not just impractical, it is expensive. Targeted retrieval of the relevant 2-5% of that history is both cheaper and more useful, because the noise is filtered out.
The quality problem. Humans do not remember everything equally. We remember the important parts, the patterns, the decisions. A raw conversation log has the signal buried in noise: filler words, failed attempts, questions that got answered and then superseded. The three-tier system reflects how memory actually works: recent context is readily available (session + daily logs), important stable facts are always accessible (MEMORY.md), and everything else fades with time (temporal decay).
The isolation problem. In a multi-agent system, you do not want agents sharing memory indiscriminately. Security's audit notes are not useful to Writer, and Writer's blog post drafts are not useful to SysAdmin. Per-agent isolated memory means each agent builds domain-relevant memory without cross-contamination.
The restart problem. Local AI systems restart. Servers get rebooted. Sessions time out. A system where all memory is in the context window loses everything on restart. A system where critical context is written to disk survives restarts as a first-class capability, not an afterthought.
**Semantic Search Is Worth the Setup Cost**
The nomic-embed-text setup (pull model, add config, run initial index) takes about 10 minutes. After that, you have a memory system where agents can find semantically relevant context even when the exact words do not match. That capability compounds over time: the more memory accumulates, the more useful the retrieval becomes. Front-load the setup cost, get compounding returns.
## What I Have Not Done Yet
Honest accounting, same as Parts 1 and 2.
Cross-agent memory search: Agents search only their own memory indexes. There is no way for JClaw27 to query BobTheBuilder's memory directly. This is by design (isolation), but it means the orchestrator cannot automatically surface relevant context from specialist agents without explicitly delegating a query. I have thought about a "shared memory" tier for things like project-level decisions that all agents should know, but I have not built it.
SysAdmin heartbeat tasks: The health monitoring scenario described earlier is on the roadmap but not yet wired up. SysAdmin needs a cron job definition and the health check script. I will cover that when I implement it.
Memory search in Telegram sessions: Memory search works in TUI sessions. I have not verified it is active for messages coming in through the Telegram channel. The architecture should handle it (the channel routes to the agent, the agent has memory search configured), but I have not explicitly tested it.
**Test Memory Search in Your Actual Channels**
If you are using OpenClaw with Telegram or other channel integrations, verify that memory search is working in those sessions specifically, not just in the TUI. The memory search configuration is gateway-level, so it should apply across all channels, but the interaction between the channel layer and the memory retrieval path is worth a dedicated test.
## Lessons Learned
### nomic-embed-text Is the Right Model for This Job
At 274 MB and ~78ms per query, nomic-embed-text fits the operational profile of a local memory system perfectly. It is fast enough to not noticeably delay agent responses, small enough to not compete with the primary model for RAM, and produces 768-dimensional embeddings that capture semantic relationships well. Do not overthink the embedding model selection for this use case.
### Temporal Decay Solves the Stale Memory Problem Elegantly
Most memory systems require explicit expiry dates or manual cleanup. The half-life decay approach requires neither. You write a note, it starts at full relevance, and naturally fades over 30-day intervals. Stale information does not disappear (it is still on disk and searchable if you need it), it just becomes less likely to surface unless it is highly relevant to a specific query. This is closer to how human memory works than an expiry date model.
### Per-Agent Memory Isolation Is a Security Feature, Not Just Architecture
Each agent has its own isolated memory index. If an agent's session is somehow used in a prompt injection attack and writes malicious content to its memory files, the contamination is contained to that agent. It cannot propagate to other agents' memory indexes. Isolation is doing security work here, not just organizational work.
### Safeguard Compaction Has a Limit at High Token Counts
The documentation notes that safeguard compaction can have issues around 180k tokens. If you have long-running sessions with heavy tool use, run /new proactively before you hit that limit. The session-memory hook will capture the context, and you start a fresh session with a lean context window and all the important content written to disk.
## What Is Next
The memory system is configured and running. Here is what is coming in the next phase of this deployment:
SysAdmin health monitoring: Wiring up the heartbeat cron job and Telegram alert for memory index health. This is the "it works in production because we know when it breaks" step.
Cross-agent memory sharing experiment: Testing a shared workspace approach where project-level decisions are written to a shared MEMORY.md that multiple agents can read. The goal is giving the orchestrator access to specialist context without breaking the isolation model.
qwen3:14b evaluation: The current local model for writer, researcher, and secretary is qwen2.5:14b. Qwen3 at 14B is a potential upgrade with better tool calling. I want to test whether memory retrieval quality improves with a better model driving the session.
Verification checklist update: Adding memory system checks to the post-config-change verification checklist from Part 2.
If there is a Part 4, it will probably cover one of these. The system keeps teaching me things. I will keep writing them down.
Written by Chris Johnson. The configuration shown is the actual memory architecture running on the JClaw multi-agent system on an M4 Mac Mini. Part 1 is here and Part 2 is here.