Memory
The memory domain (@thought-fabric/core/memory) gives agents structured recall across three tiers: working memory for the current conversation, episodic memory for significant experiences across sessions, and semantic memory for distilled, stable knowledge. Each tier has its own retention model. Together they form a pipeline where observations flow in, get classified, and settle into the right store based on how durable they are.
Quick Start
The fastest way to add the full memory system is memory.system(). It wires up all three tiers, gives you a capture pipeline, a cross-store recall function, and a context formatter for injecting memories into LLM prompts:
import { system as memorySystem } from '@thought-fabric/core/memory'
import { sequencer } from '@flow-state-dev/core'
const mem = memorySystem({
model: 'gpt-5-mini',
working: { capacity: 7 },
episodic: true,
semantic: true,
})
const pipeline = sequencer({ name: 'chat', inputSchema })
.then(chatGenerator)
.work(mem.captureFromItems)
mem.captureFromItems runs in the background via .work() after the generator. It reads the last user message and a truncated assistant response from session items, then runs the full capture pipeline: observe, classify, route to the right stores, advance decay, and (when enough evidence accumulates) consolidate into semantic facts. One line to add to a pipeline.
The capture block declares its own resources. The framework installs them automatically when the flow runs. No manual resource setup needed.
If you need to capture from explicit string input instead of session items, use mem.capture with a connector:
const pipeline = sequencer({ name: 'chat', inputSchema })
.work((input) => input.message, mem.capture)
.then(chatGenerator)
If you only need working memory, you can still use the standalone workingMemoryCapture block:
import { workingMemoryCapture } from '@thought-fabric/core/memory'
const memoryCapture = workingMemoryCapture({ model: 'gpt-5-mini' })
const pipeline = sequencer({ name: 'chat', inputSchema })
.work((input) => input.message, memoryCapture)
.then(chatGenerator)
How the Tiers Work
Each tier serves a different purpose:
| Tier | Scope | Retention | What it stores |
|---|---|---|---|
| Working | Session | Decays over turns | Active context: what the agent is tracking right now |
| Episodic | User or Project | Persistent | Significant experiences: facts, events, preferences worth remembering across sessions |
| Semantic | User or Project | Stable | Distilled knowledge: patterns, preferences, and facts extracted from repeated episodic evidence |
Information flows upward. A user message enters as working memory. If the observer classifies it as persistent or permanent, it also goes to episodic memory. If it's a stable category (any semantic category — identity, profession, preference, belief, relationship, attribute, or pattern) with persistent/permanent durability, it goes directly to semantic memory too, tagged with a subject identifying who the fact is about. Over time, the consolidation pipeline reviews unconsolidated episodes and distills them into semantic facts via an LLM call.
Working memory is bounded and ephemeral. Episodic memory is an append-only log. Semantic memory is a curated knowledge base where facts get reinforced, updated, or invalidated as new evidence arrives.
The Unified System
memory.system() is the primary API. It returns an object with everything you need:
const mem = memorySystem({
model: 'gpt-5-mini',
working: { capacity: 7 },
episodic: { scope: 'user', significanceThreshold: 0.6 },
semantic: { consolidation: { episodicThreshold: 5 } },
})
What you get back:
| Property | Type | Purpose |
|---|---|---|
| `mem.capture` | Sequencer | Full pipeline: observe → reflect → tick (+ consolidation + prune) |
| `mem.captureFromItems` | Block | Self-serving capture: reads from session items (no input needed) |
| `mem.consolidate` | Sequencer | Standalone consolidation (when semantic configured) |
| `mem.prune` | Sequencer | Standalone prune (when semantic configured) |
| `mem.recall(ctx, cue?)` | Function | Cross-store recall, ranked by relevance |
| `mem.contextFormatter` | Context fn | Drop into a generator's context array |
| `mem.working` | Object | Resource + helpers for direct manipulation |
| `mem.episodic` | Object | Resource + helpers (if configured) |
| `mem.semantic` | Object | Resource + helpers (if configured) |
| `mem.capability` | Capability | Composed capability for `uses: [mem.capability]` (see below) |
| `mem.workingMemoryCapability` | Capability | Working memory tier capability |
| `mem.episodicMemoryCapability` | Capability | Episodic tier capability (if configured) |
| `mem.semanticMemoryCapability` | Capability | Semantic tier capability (if configured) |
Pass true for any tier to use defaults. Pass an object to customize:
// Defaults for everything
const mem = memorySystem({ model: 'gpt-5-mini', working: true, episodic: true, semantic: true })
// Custom episodic, default semantic
const mem = memorySystem({
model: 'gpt-5-mini',
working: { capacity: 10 },
episodic: { scope: 'project', significanceThreshold: 0.5, maxEpisodes: 500 },
semantic: true,
})
Semantic requires episodic: consolidation draws its evidence from the episodic store, so you can't enable semantic on its own.
Capability Surface
Every memory.system() instance exposes a capability field that wraps the memory system's resources, context formatting, and helper functions into a single defineCapability() surface. Declare it in uses and the framework installs everything automatically.
const mem = memorySystem({
model: 'preset/fast',
working: { capacity: 7 },
episodic: true,
semantic: true,
})
// Generators: resources + context formatter auto-installed
const chat = generator({
name: 'chat',
model: 'preset/fast',
uses: [mem.capability],
user: (input) => input,
})
The composed capability includes a context preset (on by default) that injects unified cross-store recall into the generator's prompt. For non-generator blocks, disable the preset:
const myHandler = handler({
name: 'remember',
uses: [mem.capability.presets({ context: false })],
execute: async (input, ctx) => {
// Typed helpers via ctx.cap
await ctx.cap.workingMemory.add({ content: 'User likes pizza', importance: 0.8 })
const entries = ctx.cap.workingMemory.items()
const results = ctx.cap.memory.recall('pizza')
},
})
Individual tier capabilities
If you don't need the full system, individual tier capabilities are available as standalone exports:
import {
workingMemoryCapability,
episodicMemoryCapability,
semanticMemoryCapability,
} from '@thought-fabric/core/memory'
// Just working memory on a handler
const block = handler({
name: 'wm-only',
uses: [workingMemoryCapability],
execute: async (input, ctx) => {
await ctx.cap.workingMemory.add({ content: 'fact', importance: 0.7 })
},
})
Custom config via factory functions:
import { createWorkingMemoryCapability, createEpisodicMemoryCapability } from '@thought-fabric/core/memory'
const wmCap = createWorkingMemoryCapability({ capacity: 10, decay: { strategy: 'exponential', rate: 0.3 } })
const epCap = createEpisodicMemoryCapability({ scope: 'project', maxEpisodes: 500 })
The Capture Pipeline
mem.capture is a sequencer: observe → reflect → tick, with consolidation and pruning running as background work when semantic is configured.
Observe is a generator block. It sends recent conversation items to an LLM and gets back classified observations:
// Each observation has:
{
subject: string // Who this is about ('user', 'jennifer', etc.)
content: string // What to remember
importance: number // 0–1 score
durability: 'transient' | 'session' | 'persistent' | 'permanent'
category: 'identity' | 'event' | 'preference' | 'task' | 'relationship'
| 'profession' | 'belief' | 'attribute' | 'pattern'
replaces: string // ID of existing entry this supersedes, or ''
}
The observer checks existing working memory for contradictions. If a user says "I joined Stripe" and working memory has "works at Google," the observer marks the new entry with replaces pointing to the old one. Stale memories are worse than missing memories.
Reflect is a handler that routes observations to the right stores:
- All items → working memory (with auto-eviction at capacity)
- `persistent`/`permanent` items above the significance threshold → episodic memory
- `persistent`/`permanent` items with stable categories (all semantic categories — everything except `event` and `task`) → semantic memory directly, scoped by subject
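The routing rules above can be sketched as a plain function. This is an illustrative reimplementation, not the library's code; the type names and the 0.6 threshold default mirror the documented behavior.

```typescript
type Durability = 'transient' | 'session' | 'persistent' | 'permanent'
type Category =
  | 'identity' | 'event' | 'preference' | 'task' | 'relationship'
  | 'profession' | 'belief' | 'attribute' | 'pattern'

interface Observation {
  durability: Durability
  category: Category
  importance: number // 0–1
}

// The seven semantic categories; `event` and `task` are excluded.
const STABLE_CATEGORIES: Category[] = [
  'identity', 'profession', 'preference', 'belief',
  'relationship', 'attribute', 'pattern',
]

function routeObservation(
  obs: Observation,
  significanceThreshold = 0.6,
): Array<'working' | 'episodic' | 'semantic'> {
  // Every observation lands in working memory.
  const stores: Array<'working' | 'episodic' | 'semantic'> = ['working']
  const durable = obs.durability === 'persistent' || obs.durability === 'permanent'
  // Durable + significant → episodic.
  if (durable && obs.importance >= significanceThreshold) stores.push('episodic')
  // Durable + stable category → semantic, directly.
  if (durable && STABLE_CATEGORIES.includes(obs.category)) stores.push('semantic')
  return stores
}
```

A durable, significant preference therefore fans out to all three stores, while a transient task stays in working memory only.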
Tick advances the working memory decay clock and recomputes salience scores.
Consolidation (when semantic is configured) runs as .work() — background processing that doesn't block the pipeline. It checks whether enough episodic evidence has accumulated, and if so, calls an LLM to distill patterns into semantic facts.
Pruning also runs as .work() after consolidation. Once the semantic fact store grows past a threshold (default: 20 facts), an LLM evaluates the full fact set and removes redundant, noisy, or low-value facts — and merges facts that cover the same topic with complementary information.
Capturing Agent Responses
mem.capture takes a string input — typically the user's message. But the agent's response often contains valuable context too: corrections, inferred facts, commitments. mem.captureFromItems captures both sides of the conversation by reading directly from session items.
const pipeline = sequencer({ name: 'chat', inputSchema })
.then(analyzeInput)
.then(chatGenerator)
.work(mem.captureFromItems) // runs after the generator, sees both user + assistant
.then(postProcess)
captureFromItems is built using connectInput — it's the same capture pipeline, but with a connector that reads the last user message (in full) and the assistant's response (truncated to ~500 characters). The truncation keeps LLM cost low while still catching high-value content like corrections, clarifications, and inferred facts.
Position it after your generator block so it sees the full exchange. It runs as .work() (background), so it doesn't block the pipeline.
To customize the truncation limit:
const mem = memorySystem({
model: 'gpt-5-mini',
working: { capacity: 7 },
episodic: true,
semantic: true,
maxAssistantChars: 1000, // default: 500
})
When to use which:
- `mem.capture` — when you have explicit string input (e.g., early in a pipeline before the generator)
- `mem.captureFromItems` — after the generator, to capture both sides of the conversation
Injecting Memory into Prompts
Use mem.contextFormatter in a generator's context array:
import { generator } from '@flow-state-dev/core'
const chat = generator({
name: 'chat',
model: 'gpt-5',
inputSchema: z.string(),
context: [mem.contextFormatter],
user: (input) => input,
})
The formatter calls recall() internally and organizes memories into sections. Semantic facts are grouped by subject. When there's only one subject (user), it renders as a flat list:
Known facts:
- [profession] Works at Stripe
- [preference] Prefers TypeScript
Current focus:
- Working on a REST API migration
When multiple subjects exist, they're grouped:
About user:
- [identity] Name is Jake
- [profession] Works at Fixpoint Labs
About jennifer:
- [relationship] Spouse, goes by Moni
Current focus:
- Working on a REST API migration
Semantic facts appear first (highest authority), then working memory entries, then recent episodic memories. Duplicates across stores are filtered — if semantic memory has "Works at Stripe," the same entry won't appear again from working memory.
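The ordering and dedup behavior just described can be sketched as a small merge function. This is illustrative only: the item shape and key normalization are assumptions, not the formatter's actual internals.

```typescript
interface MemoryItem {
  store: 'semantic' | 'working' | 'episodic'
  content: string
}

// Semantic first (highest authority), then working, then episodic;
// later duplicates of the same content are dropped.
function mergeStores(items: MemoryItem[]): MemoryItem[] {
  const priority = { semantic: 0, working: 1, episodic: 2 }
  const seen = new Set<string>()
  return [...items]
    .sort((a, b) => priority[a.store] - priority[b.store])
    .filter((item) => {
      const key = item.content.trim().toLowerCase()
      if (seen.has(key)) return false // already covered by a higher-authority store
      seen.add(key)
      return true
    })
}
```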
For direct access, use mem.recall(ctx, cue?):
const memories = mem.recall(ctx)
// Returns: RankedMemoryItem[] sorted by relevance
const focused = mem.recall(ctx, 'TypeScript preferences')
// Token overlap with cue boosts relevance
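A minimal version of the cue boost might look like the token-overlap scorer below. This is a sketch of the idea, not the library's ranking code, which may combine additional signals such as salience and recency.

```typescript
// Fraction of cue tokens that also appear in the memory's content.
function cueBoost(content: string, cue: string): number {
  const tokenize = (s: string) =>
    new Set(s.toLowerCase().split(/\W+/).filter(Boolean))
  const contentTokens = tokenize(content)
  const cueTokens = tokenize(cue)
  if (cueTokens.size === 0) return 0
  let overlap = 0
  for (const t of cueTokens) if (contentTokens.has(t)) overlap++
  return overlap / cueTokens.size
}
```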
Consolidation
Consolidation is how episodic memories become semantic facts. It runs automatically as part of the capture pipeline when semantic memory is configured.
When it triggers: Consolidation runs when episodicWritesSinceLastConsolidation reaches the threshold (default: 5), or when a persistent/permanent entry is evicted from working memory. There's also a minimum turn interval to prevent rapid re-triggering (default: 4 turns).
What it does: The consolidation pipeline has three stages, gated so the LLM call is skipped entirely when conditions aren't met:
- Guard — Checks trigger conditions. If not met, returns early. If met, reads unconsolidated episodes and existing semantic facts.
- Generate — LLM call that synthesizes facts from episodes. Can create new facts, reinforce existing ones, update contradicted facts, or invalidate stale ones.
- Persist — Writes the results to the semantic store, marks episodes as consolidated, resets counters.
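The trigger conditions and the guard stage together amount to a predicate like the one below. The state shape and field names are hypothetical; the thresholds match the documented defaults (5 writes, 4-turn minimum interval).

```typescript
interface ConsolidationState {
  episodicWritesSinceLastConsolidation: number
  turnsSinceLastConsolidation: number
  persistentEntryEvicted: boolean // a persistent/permanent WM entry was evicted
}

function shouldConsolidate(
  s: ConsolidationState,
  episodicThreshold = 5,
  minInterval = 4,
): boolean {
  // Minimum turn interval prevents rapid re-triggering.
  if (s.turnsSinceLastConsolidation < minInterval) return false
  // Either enough episodic writes accumulated, or an eviction forced it.
  return (
    s.episodicWritesSinceLastConsolidation >= episodicThreshold ||
    s.persistentEntryEvicted
  )
}
```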
Contradiction handling is central. If episodic evidence contradicts an existing semantic fact, the LLM should update or invalidate it. The prompt emphasizes this: stale facts are worse than missing facts.
// Consolidation output per fact:
{
subject: string // Who this is about
content: string
confidence: number // 0–1, based on evidence strength
category: 'identity' | 'relationship' | 'preference' | 'belief'
| 'profession' | 'attribute' | 'pattern'
action: 'new' | 'reinforce' | 'update' | 'invalidate'
targetFactId: string // For reinforce/update/invalidate
sourceEpisodeIds: string[]
}
The consolidation LLM sees existing facts grouped by subject, making it easier to detect contradictions and reinforcements within an entity's knowledge.
Direct extraction vs consolidation: Not everything waits for consolidation. During the reflect step, items classified as persistent or permanent with stable categories (all semantic categories) go directly to semantic memory, tagged with the observer's subject field. This means a user saying "My name is Jake" gets stored as a semantic fact immediately, without waiting for the consolidation threshold. Dedup is subject-scoped: "born in May" about user only deduplicates against other user facts, not against facts about other entities. Consolidation is for finding patterns across multiple episodes — things no single observation makes obvious.
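Subject-scoped dedup can be expressed as a simple predicate: a candidate fact only collides with existing facts about the same subject. This is a hypothetical helper for illustration, not the library's API.

```typescript
interface Fact {
  subject: string // 'user', 'jennifer', etc.
  content: string
}

// A fact is a duplicate only if the same subject already has the same content.
function isDuplicate(existing: Fact[], candidate: Fact): boolean {
  const norm = (s: string) => s.trim().toLowerCase()
  return existing.some(
    (f) =>
      f.subject === candidate.subject &&
      norm(f.content) === norm(candidate.content),
  )
}
```

So "Born in May" about `jennifer` is stored even when the same sentence already exists about `user`.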
Pruning
As the semantic fact store grows, noise accumulates. Near-duplicates slip through dedup guards, session artifacts leak past classification, and related facts fragment across multiple entries. Pruning is an LLM-backed maintenance step that evaluates the full fact set and cleans it up.
When it triggers: Pruning runs when the semantic fact count reaches the threshold (default: 20). Like consolidation, it uses a guard → generate → persist pattern and runs as .work() in the capture pipeline.
What it does:
- Guard — Reads all semantic facts. If the count is below threshold, returns early.
- Generate — LLM call that reviews the full fact set and identifies:
- Removals: Facts that are redundant, noisy (session artifacts), contradicted by newer facts, or too vague to be useful.
- Merges: Groups of 2+ facts that cover the same topic with complementary information. For example, "User was born in Maryland" + "User was born in May" → "User was born in May in Maryland."
- Persist — Removes identified facts. For merges, updates the first source fact with the merged content and removes the rest, preserving provenance.
The LLM is instructed to be conservative. High-reinforcement facts (≥5) are protected unless clearly contradicted. High-confidence facts (≥0.8) require strong justification. When in doubt, facts are kept.
// Prune output:
{
removals: [{ factId: string, reason: string }]
merges: [{ sourceFactIds: string[], mergedContent: string, reason: string }]
}
Configuration:
const mem = memorySystem({
model: 'gpt-5-mini',
working: { capacity: 7 },
episodic: true,
semantic: { pruneThreshold: 30 }, // default: 20, set 0 to disable
})
You can also run pruning standalone via mem.prune if you want to trigger it outside the capture pipeline.
Working Memory
Working memory is a bounded, salience-scored store scoped to a session. Entries decay over time. When capacity is reached, the lowest-salience unpinned entry is evicted.
Model
- Capacity: Default 7 entries (Miller's number). Configurable.
- Pinned slots: Default 2. Pinned entries survive eviction; unpinned low-salience entries go first.
- Decay: Salience = `importance × decay(elapsed)`. Default strategy is power-law (ACT-R style): `(1 + elapsed)^(-rate)`.
- Eviction: When at capacity, the lowest-salience unpinned entry is removed before adding a new one.
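The model above can be sketched in a few lines. Names and the exact tie-breaking are assumptions; only the formula and the "lowest-salience unpinned entry goes first" rule come from the documentation.

```typescript
interface WMEntry {
  id: string
  importance: number // 0–1
  addedAtTurn: number
  pinned: boolean
}

// Power-law decay (ACT-R style): importance × (1 + elapsed)^(-rate).
function salience(e: WMEntry, currentTurn: number, rate = 0.5): number {
  const elapsed = currentTurn - e.addedAtTurn
  return e.importance * Math.pow(1 + elapsed, -rate)
}

// Before adding at capacity, evict the lowest-salience unpinned entry.
function evictIfFull(entries: WMEntry[], capacity: number, turn: number): WMEntry[] {
  if (entries.length < capacity) return entries
  const unpinned = entries.filter((e) => !e.pinned)
  if (unpinned.length === 0) return entries // everything pinned: nothing to evict
  const victim = unpinned.reduce((lowest, e) =>
    salience(e, turn) < salience(lowest, turn) ? e : lowest)
  return entries.filter((e) => e.id !== victim.id)
}
```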
Standalone Blocks
If you're not using the unified system, these blocks give you fine-grained control:
| Block | Kind | Purpose |
|---|---|---|
| `workingMemoryCapture` | Sequencer | Bundled: observe → remember → tick |
| `workingMemoryObserve` | Generator | LLM extraction of observations |
| `workingMemoryRemember` | Handler | Persists observations to resource |
| `workingMemoryTick` | Handler | Advances decay clock |
| `workingMemorySnapshot` | Handler | Returns current state sorted by salience |
| `workingMemoryAdd` | Handler | Direct entry addition (no LLM) |
import {
workingMemoryObserve,
workingMemoryRemember,
workingMemoryTick,
} from '@thought-fabric/core/memory'
const pipeline = sequencer({ name: 'chat', inputSchema })
.work(
(input) => input.message,
sequencer({ name: 'memory', inputSchema: z.string() })
.then(workingMemoryObserve({ model: 'preset/fast' }))
.then(workingMemoryRemember())
.tap(workingMemoryTick())
)
.then(chatGenerator)
Helpers
For direct resource manipulation outside blocks:
| Helper | Purpose |
|---|---|
| `addWorkingMemory(ref, entry, config?)` | Add entry with auto-eviction at capacity |
| `evictWorkingMemory(ref, id)` | Remove by ID (overrides pin) |
| `pinWorkingMemory(ref, id, config?)` | Pin to protect from eviction |
| `unpinWorkingMemory(ref, id)` | Remove pin |
| `refreshWorkingMemory(ref, id, config?)` | Reset access time (access boost) |
| `advanceWorkingMemory(ref, config?)` | Advance turn, recompute salience |
| `workingMemoryItems(ref)` | Entries sorted by salience |
| `formatWorkingMemoryEntries(ref)` | Bullet list for LLM context |
Decay Strategies
| Strategy | Formula | Use case |
|---|---|---|
| `power-law` (default) | `(1 + elapsed)^(-rate)` | ACT-R style; fast initial drop, long tail |
| `exponential` | `exp(-rate × elapsed)` | Steeper, more aggressive decay |
| `none` | `1` | No decay; salience = importance forever. Good for testing. |
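As plain functions, the three formulas in the table look like this (a sketch of the math, not the library's internals):

```typescript
type DecayFn = (elapsed: number, rate: number) => number

const decay: Record<'power-law' | 'exponential' | 'none', DecayFn> = {
  'power-law': (elapsed, rate) => Math.pow(1 + elapsed, -rate),
  exponential: (elapsed, rate) => Math.exp(-rate * elapsed),
  none: () => 1,
}
```

At `elapsed = 4` and `rate = 0.5`, power-law retains about 0.45 of the original salience while exponential retains only about 0.14, which is why exponential is described as the more aggressive option.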
Episodic Memory
Episodic memory records significant experiences across sessions. It's an append-only log of episodes scoped to either user or project. Episodes are written during the reflect step when items have persistent or permanent durability and meet the significance threshold.
Resource
Episodic memory uses a resource factory because the scope varies:
import { createEpisodicMemoryResource } from '@thought-fabric/core/memory'
const epResource = createEpisodicMemoryResource('user') // or 'project'
When using memory.system(), this is handled for you.
Helpers
| Helper | Purpose |
|---|---|
| `encodeEpisode(ref, input, maxEpisodes)` | Write a new episode |
| `recentEpisodes(ref, limit?)` | Get recent episodes (default: 10) |
| `markEpisodesConsolidated(ref, ids)` | Mark episodes as processed by consolidation |
Semantic Memory
Semantic memory is a curated knowledge base of stable facts. Unlike episodic memory (which records what happened), semantic memory records what's true — distilled from evidence over time. Facts have confidence scores that increase with reinforcement and can be updated or invalidated when new evidence contradicts them.
How facts arrive
Facts enter semantic memory through two paths:
- Direct extraction (during reflect): Items classified as `persistent`/`permanent` with a stable category (any semantic category — not `event` or `task`) go straight to semantic memory, tagged with a `subject`. Dedup is subject-scoped. No waiting for consolidation.
- Consolidation (background): After enough episodic evidence accumulates, an LLM reviews unconsolidated episodes and extracts patterns, reinforces existing facts, or corrects outdated ones.
Resource
Like episodic, semantic memory uses a resource factory:
import { createSemanticMemoryResource } from '@thought-fabric/core/memory'
const semResource = createSemanticMemoryResource('user') // or 'project'
Helpers
| Helper | Purpose |
|---|---|
| `addSemanticFact(ref, input)` | Add a new fact |
| `updateSemanticFact(ref, id, content, sourceIds?, confidence?)` | Update existing fact |
| `reinforceSemanticFact(ref, id, sourceIds?)` | Increase confidence via reinforcement |
| `removeSemanticFact(ref, id)` | Remove a fact (invalidation) |
| `semanticFacts(ref)` | All facts |
| `querySemanticFacts(ref, predicate)` | Filter facts by predicate |
Fact Schema
{
id: string // Auto-generated
subject: string // Who this is about ('user', 'jennifer', etc.)
content: string // The fact itself
confidence: number // 0–1, increases with reinforcement
category: 'identity' | 'relationship' | 'preference' | 'belief'
| 'profession' | 'attribute' | 'pattern'
sourceEpisodeIds: string[]
extractedAt: string // ISO datetime
lastReinforced?: string // ISO datetime
reinforcementCount: number
}
Subject conventions:
- `'user'` — the primary user (default when omitted)
- Lowercase first name for other people: `'jennifer'`, `'max'`
- Lowercase hyphenated name for organizations: `'fixpoint-labs'`
Categories:
- `identity` — who someone is: name, birthdate, location, background
- `profession` — what someone does: job, company, role, skills
- `preference` — likes, dislikes, style choices
- `belief` — opinions, worldviews, values
- `relationship` — connections to other named entities: spouse, pet, employer
- `attribute` — properties/characteristics: possessions, abilities, circumstances
- `pattern` — recurring behaviors
Configuration Defaults
All defaults are exported as constants for reference:
| Setting | Default | Constant |
|---|---|---|
| Working memory capacity | 7 | DEFAULT_WORKING_MEMORY_CONFIG.capacity |
| Max pinned slots | 2 | DEFAULT_WORKING_MEMORY_CONFIG.maxPinnedSlots |
| Decay strategy | power-law | DEFAULT_WORKING_MEMORY_CONFIG.decay.strategy |
| Decay rate | 0.5 | DEFAULT_WORKING_MEMORY_CONFIG.decay.rate |
| Episodic scope | user | DEFAULT_EPISODIC_CONFIG.scope |
| Significance threshold | 0.6 | DEFAULT_EPISODIC_CONFIG.significanceThreshold |
| Max episodes | 200 | DEFAULT_EPISODIC_CONFIG.maxEpisodes |
| Consolidation episodic threshold | 5 | DEFAULT_CONSOLIDATION_CONFIG.episodicThreshold |
| Consolidation on eviction | true | DEFAULT_CONSOLIDATION_CONFIG.onEviction |
| Consolidation min interval | 4 turns | DEFAULT_CONSOLIDATION_CONFIG.minInterval |
| Prune threshold | 20 facts | DEFAULT_PRUNE_CONFIG.pruneThreshold |
Naming Convention
The word order encodes the category:
- `workingMemory[Verb]` — Block or item (e.g. `workingMemoryCapture`, `workingMemoryObserve`).
- `[verb]WorkingMemory` — Helper (e.g. `addWorkingMemory`, `evictWorkingMemory`).
- Same pattern for episodic (`encodeEpisode`, `recentEpisodes`) and semantic (`addSemanticFact`, `querySemanticFacts`).
- `memorySystem[Verb]` — Unified system blocks (e.g. `memorySystemObserve`, `memorySystemCapture`).
Further Reading
- API Reference — Full export list
- Attention — Salience scoring and relevance filtering
- Introduction — Thought Fabric overview and import paths