Agent Memory is a data structure problem

February 2, 2026 · 7 min read

ai · agents · architecture

I was on vacation for a week, away from code. When I came back, I wanted to ask Cursor to continue working on a feature we'd been building together. It had no idea what I was talking about. The conversation history was gone, the context was gone, everything we'd discussed about the architecture and trade-offs had vanished. I had to explain the whole thing again from scratch.

This is the dirty secret of AI agents. They wake up with amnesia every single time. The model itself has no persistent memory. Everything it "knows" about you, your project, your preferences, exists only within the context window of the current session. Close the tab, and it's gone.

The obvious question is: can't we just save the conversation and load it back? Sure, but that's where it gets interesting. A day's coding session might generate hundreds of thousands of tokens. The context window, even on the largest models, fills up fast. You can't just dump everything back in and expect it to work.

This is when I realized that agent memory isn't really an AI problem. It's a data structure problem. And it's one we've actually solved before in other domains.

The two types of memory

Cognitive scientists talk about two kinds of memory in humans. Episodic memory is the stuff that happened to you: conversations, events, specific moments in time. Semantic memory is the stuff you know: facts, skills, general knowledge extracted from experience.

The same split makes sense for agents.

Episodic memory for an agent might be the raw conversation logs. Every message, every file it read, every command it ran. This is easy to store but expensive to retrieve. You can't load three weeks of conversation history into a context window.

Semantic memory is different. It's the distilled knowledge: "The user prefers TypeScript over JavaScript" or "The auth module uses JWTs with a 24 hour expiry" or "Last time we tried Redis for sessions it caused problems because of X." These are facts extracted from experience, compact enough to fit in context, useful enough to change behavior.

Building an agent with good memory means building both systems and knowing when to use which.

A simple architecture

Here's what I've been experimenting with. Nothing fancy, just the basics.

// Shapes referenced throughout; kept deliberately minimal.
interface ConversationEntry {
  timestamp: Date;
  content: string;  // raw message, file read, or command output
  summary: string;  // one-line digest used when injecting recent context
}

interface Fact {
  id: string;
  content: string;
}

interface AgentMemory {
  episodic: EpisodicStore;
  semantic: SemanticStore;
}

interface EpisodicStore {
  append(entry: ConversationEntry): Promise<void>;
  search(query: string, limit: number): Promise<ConversationEntry[]>;
  getRecent(hours: number): Promise<ConversationEntry[]>;
}

interface SemanticStore {
  remember(fact: Fact): Promise<void>;
  forget(factId: string): Promise<void>;
  recall(query: string): Promise<Fact[]>;
  getAllFacts(): Promise<Fact[]>;
}

The episodic store is append-only. Every interaction gets logged with a timestamp. When the agent needs to remember something specific ("what did we decide about the database schema last Tuesday?"), it searches the episodic store. Vector embeddings work well here. You embed the query, find the closest matches, and inject those specific entries into the context.
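
Here's roughly what that search path can look like. This is a sketch, assuming an embed() helper (whatever embeddings API you use) and that each entry kept its embedding when it was appended:

declare function embed(text: string): Promise<number[]>;

type EmbeddedEntry = ConversationEntry & { embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

async function searchEpisodic(
  entries: EmbeddedEntry[],
  query: string,
  limit: number
): Promise<ConversationEntry[]> {
  // Embed the query, rank stored entries by similarity, keep the top matches.
  const queryVector = await embed(query);
  return entries
    .map(entry => ({ entry, score: cosineSimilarity(queryVector, entry.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map(({ entry }) => entry);
}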

The semantic store is different. It's a curated set of facts that get loaded into every session. These are things the agent should always know. User preferences, project context, key decisions, lessons learned. The list should stay small, maybe a few hundred facts at most, enough to fit in the system prompt without eating too much of the context budget.

const semanticFacts = await memory.semantic.getAllFacts();
const recentEpisodic = await memory.episodic.getRecent(24);

const systemPrompt = `
You are an assistant helping with the project.

## What you know about this project:
${semanticFacts.map(f => `- ${f.content}`).join('\n')}

## Recent context:
${recentEpisodic.map(e => e.summary).join('\n')}
`;

The trick is keeping the semantic store updated. After each session, the agent (or a separate process) should review what happened and extract anything worth remembering long term. Did the user correct a misunderstanding? That's a fact. Did a particular approach fail? That's a fact. Did the user express a preference? Fact.
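
A sketch of that extraction pass, with an llm() helper standing in for whatever completion API you use; the prompt and the JSON shape are just one way to do it:

declare function llm(prompt: string): Promise<string>;

async function extractFacts(
  sessionEntries: ConversationEntry[],
  semantic: SemanticStore
): Promise<void> {
  const transcript = sessionEntries.map(e => e.content).join('\n');
  const response = await llm(
    `Review this session transcript and list any facts worth remembering ` +
    `long term: corrections, failed approaches, stated preferences. ` +
    `Return a JSON array of strings.\n\n${transcript}`
  );
  const facts: string[] = JSON.parse(response);
  for (const content of facts) {
    // crypto.randomUUID() is a Node/browser global; any id scheme works.
    await semantic.remember({ id: crypto.randomUUID(), content });
  }
}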

The compression problem

Even with this split, you run into the compression problem. A week of work generates a lot of episodic data. You can't search through all of it every time. And even semantic facts accumulate. After months of use, you might have thousands.

The solution is hierarchical summarization. Think of it like the way humans remember. You don't recall every word of a conversation from last year. You remember the gist, the key points, maybe one or two vivid details.

// Assumed helpers: groupByDay buckets entries by calendar day, and
// summarize asks the model for a short digest of whatever it's given.
declare function groupByDay(entries: ConversationEntry[]): ConversationEntry[][];
declare function summarize(items: ConversationEntry[] | string[]): Promise<string>;

interface Summary {
  summaries: string[];
}

async function compressEpisodicMemory(entries: ConversationEntry[]): Promise<Summary> {
  const grouped = groupByDay(entries);

  // One digest per day, generated in parallel.
  const dailySummaries = await Promise.all(
    grouped.map(day => summarize(day))
  );

  // Past a week of digests, collapse them into a single weekly summary.
  if (dailySummaries.length > 7) {
    return { summaries: [await summarize(dailySummaries)] };
  }

  return { summaries: dailySummaries };
}

Old conversations get summarized into daily digests. Old daily digests get summarized into weekly digests. The raw data can be archived or deleted. What remains is a compressed representation that captures the important stuff without the noise.

This is exactly how file systems have solved the same problem for decades. Hot data stays in fast storage. Cold data gets compressed and archived. Very old data gets summarized into metadata and eventually deleted. We're just applying the same pattern to conversation history.
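
Sketched over the stores above, a retention pass might look like this, with hypothetical archiveDigest() and deleteRaw() helpers standing in for the cold path and an illustrative one-week threshold:

declare function archiveDigest(digest: Summary): Promise<void>;
declare function deleteRaw(entries: ConversationEntry[]): Promise<void>;

const HOT_WINDOW_MS = 7 * 24 * 60 * 60 * 1000; // last week stays raw and searchable

async function tierEpisodicMemory(entries: ConversationEntry[]): Promise<void> {
  const now = Date.now();
  const cold = entries.filter(e => now - e.timestamp.getTime() > HOT_WINDOW_MS);
  if (cold.length === 0) return;

  // Cold data: compress into digests, archive the digest, drop the raw entries.
  const digest = await compressEpisodicMemory(cold);
  await archiveDigest(digest);
  await deleteRaw(cold);
}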

When retrieval beats loading

One insight that took me a while to internalize: you almost never want to load memory. You want to retrieve it.

Loading means dumping a bunch of context into the prompt upfront, hoping the agent will need it. This wastes tokens and dilutes attention. The agent has to read through stuff that might not be relevant to the current task.

Retrieval means giving the agent a tool to fetch memories when it needs them. The agent decides when to look something up based on the task at hand.

const tools = [
  {
    name: "recall_memory",
    description: "Search your memory for relevant context. Use when you need to remember something about the project, the user, or past conversations.",
    parameters: {
      query: { type: "string", description: "What to search for" }
    },
    execute: async ({ query }) => {
      const episodic = await memory.episodic.search(query, 5);
      const semantic = await memory.semantic.recall(query);
      return { episodic, semantic };
    }
  }
];

This flips the mental model. Instead of trying to predict what the agent will need and preloading it, you trust the agent to ask for what it needs. It's the difference between packing everything in a suitcase versus knowing you can buy things when you arrive.

Of course, some context should always be loaded. The user's name, the project they're working on, critical preferences. But the long tail of memory should be on-demand.
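
One way to express that split is a pinned flag on facts. It's not part of the interface above, just an illustration:

// Hypothetical extension: pinned facts always ride along in the system prompt;
// everything else stays in the store, reachable through recall_memory.
type PinnedFact = Fact & { pinned?: boolean };

async function buildPreloadedContext(memory: AgentMemory): Promise<string> {
  const facts = (await memory.semantic.getAllFacts()) as PinnedFact[];
  const alwaysLoaded = facts.filter(f => f.pinned);
  return alwaysLoaded.map(f => `- ${f.content}`).join('\n');
}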

Why this matters

I spent years building systems that manage state, caches, and data lifecycles. What strikes me now is how similar agent memory is to problems we've solved before. LRU caches, write-ahead logs, tiered storage, summarization and compaction. The primitives are the same.

The difference is that the "process" consuming this memory is a language model with fuzzy retrieval and limited attention. That changes the tradeoffs but not the fundamental patterns.

If you're building agents that need to remember things across sessions, don't think of it as an AI problem. Think of it as a storage and retrieval problem. What's the access pattern? What's the retention policy? How do you keep hot data hot and cold data compressed? How do you balance preloading versus on-demand retrieval?

The answers will look a lot like the answers we've given for databases and caches and file systems for the past fifty years. Just with embeddings instead of B-trees.