Memory

The memory domain (@thought-fabric/core/memory) gives agents structured recall across three tiers: working memory for the current conversation, episodic memory for significant experiences across sessions, and semantic memory for distilled, stable knowledge. Each tier has its own retention model. Together they form a pipeline where observations flow in, get classified, and settle into the right store based on how durable they are.

Quick Start

The fastest way to add the full memory system is memory.system(). It wires up all three tiers, gives you a capture pipeline, a cross-store recall function, and a context formatter for injecting memories into LLM prompts:

import { system as memorySystem } from '@thought-fabric/core/memory'
import { sequencer } from '@flow-state-dev/core'

const mem = memorySystem({
  model: 'gpt-5-mini',
  working: { capacity: 7 },
  episodic: true,
  semantic: true,
})

const pipeline = sequencer({ name: 'chat', inputSchema })
  .then(chatGenerator)
  .work(mem.captureFromItems)

mem.captureFromItems runs in the background via .work() after the generator. It reads the last user message and a truncated assistant response from session items, then runs the full capture pipeline: observe, classify, route to the right stores, advance decay, and (when enough evidence accumulates) consolidate into semantic facts. One line to add to a pipeline.

The capture block declares its own resources. The framework installs them automatically when the flow runs. No manual resource setup needed.

If you need to capture from explicit string input instead of session items, use mem.capture with a connector:

const pipeline = sequencer({ name: 'chat', inputSchema })
  .work((input) => input.message, mem.capture)
  .then(chatGenerator)

If you only need working memory, you can still use the standalone workingMemoryCapture block:

import { workingMemoryCapture } from '@thought-fabric/core/memory'

const memoryCapture = workingMemoryCapture({ model: 'gpt-5-mini' })

const pipeline = sequencer({ name: 'chat', inputSchema })
  .work((input) => input.message, memoryCapture)
  .then(chatGenerator)

How the Tiers Work

Each tier serves a different purpose:

Tier | Scope | Retention | What it stores
--- | --- | --- | ---
Working | Session | Decays over turns | Active context: what the agent is tracking right now
Episodic | User or Project | Persistent | Significant experiences: facts, events, preferences worth remembering across sessions
Semantic | User or Project | Stable | Distilled knowledge: patterns, preferences, and facts extracted from repeated episodic evidence

Information flows upward. A user message enters as working memory. If the observer classifies it as persistent or permanent, it also goes to episodic memory. If it's a stable category (any semantic category — identity, profession, preference, belief, relationship, attribute, or pattern) with persistent/permanent durability, it goes directly to semantic memory too, tagged with a subject identifying who the fact is about. Over time, the consolidation pipeline reviews unconsolidated episodes and distills them into semantic facts via an LLM call.

Working memory is bounded and ephemeral. Episodic memory is an append-only log. Semantic memory is a curated knowledge base where facts get reinforced, updated, or invalidated as new evidence arrives.

The Unified System

memory.system() is the primary API. It returns an object with everything you need:

const mem = memorySystem({
  model: 'gpt-5-mini',
  working: { capacity: 7 },
  episodic: { scope: 'user', significanceThreshold: 0.6 },
  semantic: { consolidation: { episodicThreshold: 5 } },
})

What you get back:

Property | Type | Purpose
--- | --- | ---
mem.capture | Sequencer | Full pipeline: observe → reflect → tick (+ consolidation + prune)
mem.captureFromItems | Block | Self-serving capture: reads from session items (no input needed)
mem.consolidate | Sequencer | Standalone consolidation (when semantic configured)
mem.prune | Sequencer | Standalone prune (when semantic configured)
mem.recall(ctx, cue?) | Function | Cross-store recall, ranked by relevance
mem.contextFormatter | Context fn | Drop into a generator's context array
mem.working | Object | Resource + helpers for direct manipulation
mem.episodic | Object | Resource + helpers (if configured)
mem.semantic | Object | Resource + helpers (if configured)
mem.capability | Capability | Composed capability for uses: [mem.capability] (see below)
mem.workingMemoryCapability | Capability | Working memory tier capability
mem.episodicMemoryCapability | Capability | Episodic tier capability (if configured)
mem.semanticMemoryCapability | Capability | Semantic tier capability (if configured)

Pass true for any tier to use defaults. Pass an object to customize:

// Defaults for everything
const mem = memorySystem({ model: 'gpt-5-mini', working: true, episodic: true, semantic: true })

// Custom episodic, default semantic
const mem = memorySystem({
  model: 'gpt-5-mini',
  working: { capacity: 10 },
  episodic: { scope: 'project', significanceThreshold: 0.5, maxEpisodes: 500 },
  semantic: true,
})

Semantic requires episodic: consolidation draws from the episodic store, so you can't configure semantic without it.

Capability Surface

Every memory.system() instance exposes a capability field that wraps the memory system's resources, context formatting, and helper functions into a single defineCapability() surface. Declare it in uses and the framework installs everything automatically.

const mem = memorySystem({
  model: 'preset/fast',
  working: { capacity: 7 },
  episodic: true,
  semantic: true,
})

// Generators: resources + context formatter auto-installed
const chat = generator({
  name: 'chat',
  model: 'preset/fast',
  uses: [mem.capability],
  user: (input) => input,
})

The composed capability includes a context preset (on by default) that injects unified cross-store recall into the generator's prompt. For non-generator blocks, disable the preset:

const myHandler = handler({
  name: 'remember',
  uses: [mem.capability.presets({ context: false })],
  execute: async (input, ctx) => {
    // Typed helpers via ctx.cap
    await ctx.cap.workingMemory.add({ content: 'User likes pizza', importance: 0.8 })
    const entries = ctx.cap.workingMemory.items()
    const results = ctx.cap.memory.recall('pizza')
  },
})

Individual tier capabilities

If you don't need the full system, individual tier capabilities are available as standalone exports:

import {
  workingMemoryCapability,
  episodicMemoryCapability,
  semanticMemoryCapability,
} from '@thought-fabric/core/memory'

// Just working memory on a handler
const block = handler({
  name: 'wm-only',
  uses: [workingMemoryCapability],
  execute: async (input, ctx) => {
    await ctx.cap.workingMemory.add({ content: 'fact', importance: 0.7 })
  },
})

Custom config via factory functions:

import { createWorkingMemoryCapability, createEpisodicMemoryCapability } from '@thought-fabric/core/memory'

const wmCap = createWorkingMemoryCapability({ capacity: 10, decay: { strategy: 'exponential', rate: 0.3 } })
const epCap = createEpisodicMemoryCapability({ scope: 'project', maxEpisodes: 500 })

The Capture Pipeline

mem.capture is a sequencer: observe → reflect → tick, with consolidation and pruning running as background work when semantic is configured.

Observe is a generator block. It sends recent conversation items to an LLM and gets back classified observations:

// Each observation has:
{
  subject: string     // Who this is about ('user', 'jennifer', etc.)
  content: string     // What to remember
  importance: number  // 0–1 score
  durability: 'transient' | 'session' | 'persistent' | 'permanent'
  category: 'identity' | 'event' | 'preference' | 'task' | 'relationship'
    | 'profession' | 'belief' | 'attribute' | 'pattern'
  replaces: string    // ID of existing entry this supersedes, or ''
}

The observer checks existing working memory for contradictions. If a user says "I joined Stripe" and working memory has "works at Google," the observer marks the new entry with replaces pointing to the old one. Stale memories are worse than missing memories.
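
Concretely, the superseding observation might come back like this (an illustrative instance, not library output; the wm-003 ID is hypothetical and stands in for whatever entry held the outdated fact):

{
  subject: 'user',
  content: 'Works at Stripe',
  importance: 0.9,
  durability: 'permanent',
  category: 'profession',
  replaces: 'wm-003', // hypothetical ID of the stale "works at Google" entry
}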

Reflect is a handler that routes observations to the right stores:

  • All items → working memory (with auto-eviction at capacity)
  • persistent/permanent items above the significance threshold → episodic memory
  • persistent/permanent items with stable categories (all semantic categories — everything except event and task) → semantic memory directly, scoped by subject
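
Reduced to pseudocode, the routing comes down to a few comparisons. A minimal sketch, assuming a simplified Observation type; the function and type names here are illustrative, not the library's internals:

// Sketch of reflect's routing rules.
type Durability = 'transient' | 'session' | 'persistent' | 'permanent'

interface Observation {
  subject: string
  content: string
  importance: number
  durability: Durability
  category: string
}

declare function addToWorking(obs: Observation): void
declare function writeEpisode(obs: Observation): void
declare function writeSemanticFact(obs: Observation): void

const STABLE_CATEGORIES = new Set([
  'identity', 'profession', 'preference', 'belief',
  'relationship', 'attribute', 'pattern',
])

function route(obs: Observation, significanceThreshold: number) {
  addToWorking(obs) // every observation enters working memory

  const durable = obs.durability === 'persistent' || obs.durability === 'permanent'
  if (durable && obs.importance >= significanceThreshold) {
    writeEpisode(obs) // significant enough to persist across sessions
  }
  if (durable && STABLE_CATEGORIES.has(obs.category)) {
    writeSemanticFact(obs) // stable category: direct semantic write, scoped by obs.subject
  }
}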

Tick advances the working memory decay clock and recomputes salience scores.

Consolidation (when semantic is configured) runs as .work() — background processing that doesn't block the pipeline. It checks whether enough episodic evidence has accumulated, and if so, calls an LLM to distill patterns into semantic facts.

Pruning also runs as .work() after consolidation. Once the semantic fact store grows past a threshold (default: 20 facts), an LLM evaluates the full fact set and removes redundant, noisy, or low-value facts — and merges facts that cover the same topic with complementary information.

Capturing Agent Responses

mem.capture takes a string input — typically the user's message. But the agent's response often contains valuable context too: corrections, inferred facts, commitments. mem.captureFromItems captures both sides of the conversation by reading directly from session items.

const pipeline = sequencer({ name: 'chat', inputSchema })
  .then(analyzeInput)
  .then(chatGenerator)
  .work(mem.captureFromItems) // runs after the generator, sees both user + assistant
  .then(postProcess)

captureFromItems is built using connectInput — it's the same capture pipeline, but with a connector that reads the last user message (in full) and the assistant's response (truncated to ~500 characters). The truncation keeps LLM cost low while still catching high-value content like corrections, clarifications, and inferred facts.

Position it after your generator block so it sees the full exchange. It runs as .work() (background), so it doesn't block the pipeline.

To customize the truncation limit:

const mem = memorySystem({
  model: 'gpt-5-mini',
  working: { capacity: 7 },
  episodic: true,
  semantic: true,
  maxAssistantChars: 1000, // default: 500
})

When to use which:

  • mem.capture — when you have explicit string input (e.g., early in a pipeline before the generator)
  • mem.captureFromItems — after the generator, to capture both sides of the conversation

Injecting Memory into Prompts

Use mem.contextFormatter in a generator's context array:

import { generator } from '@flow-state-dev/core'

const chat = generator({
  name: 'chat',
  model: 'gpt-5',
  inputSchema: z.string(),
  context: [mem.contextFormatter],
  user: (input) => input,
})

The formatter calls recall() internally and organizes memories into sections. Semantic facts are grouped by subject. When there's only one subject (user), it renders as a flat list:

Known facts:
- [profession] Works at Stripe
- [preference] Prefers TypeScript

Current focus:
- Working on a REST API migration

When multiple subjects exist, they're grouped:

About user:
- [identity] Name is Jake
- [profession] Works at Fixpoint Labs

About jennifer:
- [relationship] Spouse, goes by Moni

Current focus:
- Working on a REST API migration

Semantic facts appear first (highest authority), then working memory entries, then recent episodic memories. Duplicates across stores are filtered — if semantic memory has "Works at Stripe," the same entry won't appear again from working memory.

For direct access, use mem.recall(ctx, cue?):

const memories = mem.recall(ctx)
// Returns: RankedMemoryItem[] sorted by relevance

const focused = mem.recall(ctx, 'TypeScript preferences')
// Token overlap with cue boosts relevance

Consolidation

Consolidation is how episodic memories become semantic facts. It runs automatically as part of the capture pipeline when semantic memory is configured.

When it triggers: Consolidation runs when episodicWritesSinceLastConsolidation reaches the threshold (default: 5), or when a persistent/permanent entry is evicted from working memory. There's also a minimum turn interval to prevent rapid re-triggering (default: 4 turns).
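
Condensed to a predicate, the trigger looks roughly like this. A sketch only: the state field names are illustrative, the defaults match the Configuration Defaults table below, and the exact interaction between the two triggers is simplified:

// Rough shape of the consolidation trigger check.
interface ConsolidationState {
  episodicWritesSinceLastConsolidation: number
  turnsSinceLastRun: number
  durableEntryEvicted: boolean // a persistent/permanent entry left working memory
}

function shouldConsolidate(s: ConsolidationState, episodicThreshold = 5, minInterval = 4): boolean {
  if (s.turnsSinceLastRun < minInterval) return false // rate limit: prevent rapid re-triggering
  return s.episodicWritesSinceLastConsolidation >= episodicThreshold || s.durableEntryEvicted
}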

What it does: The consolidation pipeline has three stages, gated so the LLM call is skipped entirely when conditions aren't met:

  1. Guard — Checks trigger conditions. If not met, returns early. If met, reads unconsolidated episodes and existing semantic facts.
  2. Generate — LLM call that synthesizes facts from episodes. Can create new facts, reinforce existing ones, update contradicted facts, or invalidate stale ones.
  3. Persist — Writes the results to the semantic store, marks episodes as consolidated, resets counters.

Contradiction handling is central. If episodic evidence contradicts an existing semantic fact, the LLM should update or invalidate it. The prompt emphasizes this: stale facts are worse than missing facts.

// Consolidation output per fact:
{
  subject: string       // Who this is about
  content: string
  confidence: number    // 0–1, based on evidence strength
  category: 'identity' | 'relationship' | 'preference' | 'belief'
    | 'profession' | 'attribute' | 'pattern'
  action: 'new' | 'reinforce' | 'update' | 'invalidate'
  targetFactId: string  // For reinforce/update/invalidate
  sourceEpisodeIds: string[]
}

The consolidation LLM sees existing facts grouped by subject, making it easier to detect contradictions and reinforcements within an entity's knowledge.

Direct extraction vs consolidation: Not everything waits for consolidation. During the reflect step, items classified as persistent or permanent with stable categories (all semantic categories) go directly to semantic memory, tagged with the observer's subject field. This means a user saying "My name is Jake" gets stored as a semantic fact immediately, without waiting for the consolidation threshold. Dedup is subject-scoped: "born in May" about user only deduplicates against other user facts, not against facts about other entities. Consolidation is for finding patterns across multiple episodes — things no single observation makes obvious.
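
A sketch of what subject-scoped dedup means in practice (illustrative only; the real matching is fuzzier than string equality):

// Dedup candidates compete solely within their subject.
interface FactLike { subject: string; content: string }

function isDuplicate(candidate: FactLike, existing: FactLike[]): boolean {
  return existing
    .filter((f) => f.subject === candidate.subject) // "born in May" about user ignores jennifer's facts
    .some((f) => f.content === candidate.content)   // simplified: real comparison is semantic, not exact
}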

Pruning

As the semantic fact store grows, noise accumulates. Near-duplicates slip through dedup guards, session artifacts leak past classification, and related facts fragment across multiple entries. Pruning is an LLM-backed maintenance step that evaluates the full fact set and cleans it up.

When it triggers: Pruning runs when the semantic fact count reaches the threshold (default: 20). Like consolidation, it uses a guard → generate → persist pattern and runs as .work() in the capture pipeline.

What it does:

  1. Guard — Reads all semantic facts. If the count is below threshold, returns early.
  2. Generate — LLM call that reviews the full fact set and identifies:
    • Removals: Facts that are redundant, noisy (session artifacts), contradicted by newer facts, or too vague to be useful.
    • Merges: Groups of 2+ facts that cover the same topic with complementary information. For example, "User was born in Maryland" + "User was born in May" → "User was born in May in Maryland."
  3. Persist — Removes identified facts. For merges, updates the first source fact with the merged content and removes the rest, preserving provenance.

The LLM is instructed to be conservative. High-reinforcement facts (≥5) are protected unless clearly contradicted. High-confidence facts (≥0.8) require strong justification. When in doubt, facts are kept.
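
Those protection rules amount to something like the following check before a removal is accepted. A sketch under stated assumptions: the thresholds come from the prose above, the rest is illustrative:

// Illustrative guard mirroring the protection rules.
interface PruneCandidate { reinforcementCount: number; confidence: number }

function isProtected(fact: PruneCandidate): boolean {
  return fact.reinforcementCount >= 5 || fact.confidence >= 0.8
}
// A protected fact is only removed when the LLM cites a clear contradiction
// or strong justification; anything ambiguous stays in the store.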

// Prune output:
{
  removals: [{ factId: string, reason: string }]
  merges: [{ sourceFactIds: string[], mergedContent: string, reason: string }]
}

Configuration:

const mem = memorySystem({
  model: 'gpt-5-mini',
  working: { capacity: 7 },
  episodic: true,
  semantic: { pruneThreshold: 30 }, // default: 20, set 0 to disable
})

You can also run pruning standalone via mem.prune if you want to trigger it outside the capture pipeline.

Working Memory

Working memory is a bounded, salience-scored store scoped to a session. Entries decay over time. When capacity is reached, the lowest-salience unpinned entry is evicted.

Model

  • Capacity: Default 7 entries (Miller's number). Configurable.
  • Pinned slots: Default 2. Pinned entries survive eviction; unpinned low-salience entries go first.
  • Decay: Salience = importance × decay(elapsed). Default strategy is power-law (ACT-R style): (1 + elapsed)^(-rate).
  • Eviction: When at capacity, the lowest-salience unpinned entry is removed before adding a new one.
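
The eviction rule is simple enough to sketch directly (illustrative types, not library internals):

// At capacity, drop the lowest-salience unpinned entry; pinned entries never compete.
interface WMEntry { id: string; salience: number; pinned: boolean }

function pickEviction(entries: WMEntry[]): WMEntry | undefined {
  return entries
    .filter((e) => !e.pinned)
    .sort((a, b) => a.salience - b.salience)[0]
}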

Standalone Blocks

If you're not using the unified system, these blocks give you fine-grained control:

Block | Kind | Purpose
--- | --- | ---
workingMemoryCapture | Sequencer | Bundled: observe → remember → tick
workingMemoryObserve | Generator | LLM extraction of observations
workingMemoryRemember | Handler | Persists observations to resource
workingMemoryTick | Handler | Advances decay clock
workingMemorySnapshot | Handler | Returns current state sorted by salience
workingMemoryAdd | Handler | Direct entry addition (no LLM)

import {
  workingMemoryObserve,
  workingMemoryRemember,
  workingMemoryTick,
} from '@thought-fabric/core/memory'

const pipeline = sequencer({ name: 'chat', inputSchema })
  .work(
    (input) => input.message,
    sequencer({ name: 'memory', inputSchema: z.string() })
      .then(workingMemoryObserve({ model: 'preset/fast' }))
      .then(workingMemoryRemember())
      .tap(workingMemoryTick())
  )
  .then(chatGenerator)

Helpers

For direct resource manipulation outside blocks:

Helper | Purpose
--- | ---
addWorkingMemory(ref, entry, config?) | Add entry with auto-eviction at capacity
evictWorkingMemory(ref, id) | Remove by ID (overrides pin)
pinWorkingMemory(ref, id, config?) | Pin to protect from eviction
unpinWorkingMemory(ref, id) | Remove pin
refreshWorkingMemory(ref, id, config?) | Reset access time (access boost)
advanceWorkingMemory(ref, config?) | Advance turn, recompute salience
workingMemoryItems(ref) | Entries sorted by salience
formatWorkingMemoryEntries(ref) | Bullet list for LLM context
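
A typical direct-manipulation sequence might look like the following. This is a sketch: it assumes you already hold a ref to the working memory resource, and it assumes addWorkingMemory returns the created entry and that these helpers are async, neither of which is documented above:

import {
  addWorkingMemory,
  pinWorkingMemory,
  advanceWorkingMemory,
  workingMemoryItems,
} from '@thought-fabric/core/memory'

// `ref` is a working memory resource reference obtained elsewhere.
const entry = await addWorkingMemory(ref, { content: 'Debugging the auth flow', importance: 0.9 })
await pinWorkingMemory(ref, entry.id)  // protect while the task is active
await advanceWorkingMemory(ref)        // next turn: recompute salience
const items = workingMemoryItems(ref)  // sorted by salience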

Decay Strategies

Strategy | Formula | Use case
--- | --- | ---
power-law (default) | (1 + elapsed)^(-rate) | ACT-R style; fast initial drop, long tail
exponential | exp(-rate × elapsed) | Steeper, more aggressive decay
none | 1 | No decay; salience = importance forever. Good for testing.
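
The difference is easy to see by evaluating the formulas directly (a standalone sketch, not library code; rate is the default 0.5):

// Decay multipliers after `elapsed` turns, matching the formulas above.
const powerLaw = (elapsed: number, rate = 0.5) => Math.pow(1 + elapsed, -rate)
const exponential = (elapsed: number, rate = 0.5) => Math.exp(-rate * elapsed)
const none = () => 1

powerLaw(4)    // ≈ 0.45, and powerLaw(20) is still ≈ 0.22: the long tail
exponential(4) // ≈ 0.14, and exponential(20) is ≈ 0.00005: far more aggressive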

Episodic Memory

Episodic memory records significant experiences across sessions. It's an append-only log of episodes scoped to either user or project. Episodes are written during the reflect step when items have persistent or permanent durability and meet the significance threshold.

Resource

Episodic memory uses a resource factory because the scope varies:

import { createEpisodicMemoryResource } from '@thought-fabric/core/memory'

const epResource = createEpisodicMemoryResource('user') // or 'project'

When using memory.system(), this is handled for you.

Helpers

Helper | Purpose
--- | ---
encodeEpisode(ref, input, maxEpisodes) | Write a new episode
recentEpisodes(ref, limit?) | Get recent episodes (default: 10)
markEpisodesConsolidated(ref, ids) | Mark episodes as processed by consolidation
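
For example, under the assumption that `ref` is an episodic resource reference; the episode input shape and the async-ness of these helpers are assumptions, not documented signatures:

import { encodeEpisode, recentEpisodes, markEpisodesConsolidated } from '@thought-fabric/core/memory'

// Write one episode, then flag the latest batch as consolidated.
await encodeEpisode(ref, { content: 'User migrated billing to Stripe', significance: 0.8 }, 200)
const recent = recentEpisodes(ref, 5)                        // newest five episodes
await markEpisodesConsolidated(ref, recent.map((e) => e.id)) // mark as processed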

Semantic Memory

Semantic memory is a curated knowledge base of stable facts. Unlike episodic memory (which records what happened), semantic memory records what's true — distilled from evidence over time. Facts have confidence scores that increase with reinforcement and can be updated or invalidated when new evidence contradicts them.

How facts arrive

Facts enter semantic memory through two paths:

  1. Direct extraction (during reflect): Items classified as persistent/permanent with a stable category (any semantic category — not event or task) go straight to semantic memory, tagged with a subject. Dedup is subject-scoped. No waiting for consolidation.
  2. Consolidation (background): After enough episodic evidence accumulates, an LLM reviews unconsolidated episodes and extracts patterns, reinforces existing facts, or corrects outdated ones.

Resource

Like episodic, semantic memory uses a resource factory:

import { createSemanticMemoryResource } from '@thought-fabric/core/memory'

const semResource = createSemanticMemoryResource('user') // or 'project'

Helpers

Helper | Purpose
--- | ---
addSemanticFact(ref, input) | Add a new fact
updateSemanticFact(ref, id, content, sourceIds?, confidence?) | Update existing fact
reinforceSemanticFact(ref, id, sourceIds?) | Increase confidence via reinforcement
removeSemanticFact(ref, id) | Remove a fact (invalidation)
semanticFacts(ref) | All facts
querySemanticFacts(ref, predicate) | Filter facts by predicate
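
A sketch of these helpers in use, assuming `ref` is a semantic resource reference and that addSemanticFact returns the created fact (an assumption; only the parameter lists above are documented):

import { addSemanticFact, reinforceSemanticFact, querySemanticFacts } from '@thought-fabric/core/memory'

// Add a fact, reinforce it on repeat evidence, then query by category.
const fact = await addSemanticFact(ref, {
  subject: 'user',
  content: 'Prefers TypeScript',
  category: 'preference',
  confidence: 0.7,
})
await reinforceSemanticFact(ref, fact.id) // bumps confidence and reinforcementCount
const prefs = querySemanticFacts(ref, (f) => f.category === 'preference')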

Fact Schema

{
  id: string                // Auto-generated
  subject: string           // Who this is about ('user', 'jennifer', etc.)
  content: string           // The fact itself
  confidence: number        // 0–1, increases with reinforcement
  category: 'identity' | 'relationship' | 'preference' | 'belief'
    | 'profession' | 'attribute' | 'pattern'
  sourceEpisodeIds: string[]
  extractedAt: string       // ISO datetime
  lastReinforced?: string   // ISO datetime
  reinforcementCount: number
}

Subject conventions:

  • 'user' — the primary user (default when omitted)
  • Lowercase first name for other people: 'jennifer', 'max'
  • Lowercase hyphenated name for organizations: 'fixpoint-labs'

Categories:

  • identity — who someone is: name, birthdate, location, background
  • profession — what someone does: job, company, role, skills
  • preference — likes, dislikes, style choices
  • belief — opinions, worldviews, values
  • relationship — connections to other named entities: spouse, pet, employer
  • attribute — properties/characteristics: possessions, abilities, circumstances
  • pattern — recurring behaviors

Configuration Defaults

All defaults are exported as constants for reference:

Setting | Default | Constant
--- | --- | ---
Working memory capacity | 7 | DEFAULT_WORKING_MEMORY_CONFIG.capacity
Max pinned slots | 2 | DEFAULT_WORKING_MEMORY_CONFIG.maxPinnedSlots
Decay strategy | power-law | DEFAULT_WORKING_MEMORY_CONFIG.decay.strategy
Decay rate | 0.5 | DEFAULT_WORKING_MEMORY_CONFIG.decay.rate
Episodic scope | user | DEFAULT_EPISODIC_CONFIG.scope
Significance threshold | 0.6 | DEFAULT_EPISODIC_CONFIG.significanceThreshold
Max episodes | 200 | DEFAULT_EPISODIC_CONFIG.maxEpisodes
Consolidation episodic threshold | 5 | DEFAULT_CONSOLIDATION_CONFIG.episodicThreshold
Consolidation on eviction | true | DEFAULT_CONSOLIDATION_CONFIG.onEviction
Consolidation min interval | 4 turns | DEFAULT_CONSOLIDATION_CONFIG.minInterval
Prune threshold | 20 facts | DEFAULT_PRUNE_CONFIG.pruneThreshold
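
Because the defaults are exported, custom config can be derived from them instead of restating magic numbers. A sketch, assuming the constants are importable from the package root under the names in the table:

import { system as memorySystem, DEFAULT_WORKING_MEMORY_CONFIG, DEFAULT_CONSOLIDATION_CONFIG } from '@thought-fabric/core/memory'

// Base custom values on the exported defaults.
const mem = memorySystem({
  model: 'gpt-5-mini',
  working: { capacity: DEFAULT_WORKING_MEMORY_CONFIG.capacity + 3 }, // 10
  episodic: true,
  semantic: { consolidation: { episodicThreshold: DEFAULT_CONSOLIDATION_CONFIG.episodicThreshold } },
})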

Naming Convention

The word order encodes the category:

  • workingMemory[Verb] — Block or item (e.g. workingMemoryCapture, workingMemoryObserve).
  • [verb]WorkingMemory — Helper (e.g. addWorkingMemory, evictWorkingMemory).
  • Same pattern for episodic (encodeEpisode, recentEpisodes) and semantic (addSemanticFact, querySemanticFacts).
  • memorySystem[Verb] — Unified system blocks (e.g. memorySystemObserve, memorySystemCapture).
