Memory

The memory domain (@thought-fabric/core/memory) gives agents structured recall across three tiers: working memory for the current conversation, episodic memory for significant experiences across sessions, and semantic memory for distilled, stable knowledge. Each tier has its own retention model. Together they form a pipeline where observations flow in, get classified, and settle into the right store based on how durable they are.

Quick Start

The fastest way to add the full memory system is memory.system(). It wires up all three tiers, gives you a capture pipeline, a cross-store recall function, and a context formatter for injecting memories into LLM prompts:

import { system as memorySystem } from '@thought-fabric/core/memory'
import { sequencer } from '@flow-state-dev/core'

const mem = memorySystem({
  model: 'gpt-5-mini',
  working: { capacity: 7 },
  episodic: true,
  semantic: true,
})

const pipeline = sequencer({ name: 'chat', inputSchema })
  .then(chatGenerator)
  .work(mem.captureFromItems)

mem.captureFromItems runs in the background via .work() after the generator. It reads the last user message and a truncated assistant response from session items, then runs the full capture pipeline: observe, classify, route to the right stores, advance decay, and (when enough evidence accumulates) consolidate into semantic facts. One line to add to a pipeline.

The capture block declares its own resources. The framework installs them automatically when the flow runs. No manual resource setup needed.

If you need to capture from explicit string input instead of session items, use mem.capture with a connector:

const pipeline = sequencer({ name: 'chat', inputSchema })
  .work((input) => input.message, mem.capture)
  .then(chatGenerator)

If you only need working memory, you can still use the standalone workingMemoryCapture block:

import { workingMemoryCapture } from '@thought-fabric/core/memory'

const memoryCapture = workingMemoryCapture({ model: 'gpt-5-mini' })

const pipeline = sequencer({ name: 'chat', inputSchema })
  .work((input) => input.message, memoryCapture)
  .then(chatGenerator)

How the Tiers Work

Each tier serves a different purpose:

Tier | Scope | Retention | What it stores
--- | --- | --- | ---
Working | Session | Decays over turns | Active context: what the agent is tracking right now
Episodic | User or Project | Persistent | Significant experiences: facts, events, preferences worth remembering across sessions
Semantic | User or Project | Stable | Distilled knowledge: patterns, preferences, and facts extracted from repeated episodic evidence

Information flows upward. A user message enters as working memory. If the observer classifies it as persistent or permanent, it also goes to episodic memory. If it's a stable category (any semantic category — identity, profession, preference, belief, relationship, attribute, or pattern) with persistent/permanent durability, it goes directly to semantic memory too, tagged with a subject identifying who the fact is about. Over time, the consolidation pipeline reviews unconsolidated episodes and distills them into semantic facts via an LLM call.

Working memory is bounded and ephemeral. Episodic memory is an append-only log. Semantic memory is a curated knowledge base where facts get reinforced, updated, or invalidated as new evidence arrives.

The Unified System

memory.system() is the primary API. It returns an object with everything you need:

const mem = memorySystem({
  model: 'gpt-5-mini',
  working: { capacity: 7 },
  episodic: { scope: 'user', significanceThreshold: 0.6 },
  semantic: { consolidation: { episodicThreshold: 5 } },
})

What you get back:

Property | Type | Purpose
--- | --- | ---
mem.capture | Sequencer | Full pipeline: observe → reflect → tick (+ consolidation + prune)
mem.captureFromItems | Block | Self-serving capture: reads from session items (no input needed)
mem.consolidate | Sequencer | Standalone consolidation (when semantic configured)
mem.prune | Sequencer | Standalone prune (when semantic configured)
mem.recall(ctx, cue?) | Function | Cross-store recall, ranked by relevance
mem.contextFormatter | Context fn | Drop into a generator's context array
mem.working | Object | Resource + helpers for direct manipulation
mem.episodic | Object | Resource + helpers (if configured)
mem.semantic | Object | Resource + helpers (if configured)
mem.capability | Capability | Composed capability for uses: [mem.capability] (see below)
mem.workingMemoryCapability | Capability | Working memory tier capability
mem.episodicMemoryCapability | Capability | Episodic tier capability (if configured)
mem.semanticMemoryCapability | Capability | Semantic tier capability (if configured)

Pass true for any tier to use defaults. Pass an object to customize:

// Defaults for everything
const mem = memorySystem({ model: 'gpt-5-mini', working: true, episodic: true, semantic: true })

// Custom episodic, default semantic
const mem = memorySystem({
  model: 'gpt-5-mini',
  working: { capacity: 10 },
  episodic: { scope: 'project', significanceThreshold: 0.5, maxEpisodes: 500 },
  semantic: true,
})

Semantic requires episodic: consolidation draws from the episodic store, so you can't configure semantic without it.

Capability Surface

Every memory.system() instance exposes a capability field that wraps the memory system's resources, context formatting, and helper functions into a single defineCapability() surface. Declare it in uses and the framework installs everything automatically.

const mem = memorySystem({
  model: 'preset/fast',
  working: { capacity: 7 },
  episodic: true,
  semantic: true,
})

// Generators: resources + context formatter auto-installed
const chat = generator({
  name: 'chat',
  model: 'preset/fast',
  uses: [mem.capability],
  user: (input) => input,
})

The composed capability includes a context preset (on by default) that injects unified cross-store recall into the generator's prompt. For non-generator blocks, disable the preset:

const myHandler = handler({
  name: 'remember',
  uses: [mem.capability.presets({ context: false })],
  execute: async (input, ctx) => {
    // Typed helpers via ctx.cap
    await ctx.cap.workingMemory.add({ content: 'User likes pizza', importance: 0.8 })
    const entries = ctx.cap.workingMemory.items()
    const results = ctx.cap.memory.recall('pizza')
  },
})

Individual tier capabilities

If you don't need the full system, individual tier capabilities are available as standalone exports:

import {
  workingMemoryCapability,
  episodicMemoryCapability,
  semanticMemoryCapability,
} from '@thought-fabric/core/memory'

// Just working memory on a handler
const block = handler({
  name: 'wm-only',
  uses: [workingMemoryCapability],
  execute: async (input, ctx) => {
    await ctx.cap.workingMemory.add({ content: 'fact', importance: 0.7 })
  },
})

Custom config via factory functions:

import { createWorkingMemoryCapability, createEpisodicMemoryCapability } from '@thought-fabric/core/memory'

const wmCap = createWorkingMemoryCapability({ capacity: 10, decay: { strategy: 'exponential', rate: 0.3 } })
const epCap = createEpisodicMemoryCapability({ scope: 'project', maxEpisodes: 500 })

The Capture Pipeline

mem.capture is a sequencer: observe → reflect → tick, with consolidation and pruning running as background work when semantic is configured.

Observe is a generator block. It sends recent conversation items to an LLM and gets back classified observations:

// Each observation has:
{
  subject: string     // Who this is about ('user', 'jennifer', etc.)
  content: string     // What to remember
  importance: number  // 0–1 score
  durability: 'transient' | 'session' | 'persistent' | 'permanent'
  category: 'identity' | 'event' | 'preference' | 'task' | 'relationship'
    | 'profession' | 'belief' | 'attribute' | 'pattern'
  replaces: string    // ID of existing entry this supersedes, or ''
}

The observer checks existing working memory for contradictions. If a user says "I joined Stripe" and working memory has "works at Google," the observer marks the new entry with replaces pointing to the old one. Stale memories are worse than missing memories.
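
Concretely, the superseding observation might come back like this (an illustrative instance, not library output; the wm-003 ID is hypothetical and stands in for whatever entry held the outdated fact):

{
  subject: 'user',
  content: 'Works at Stripe',
  importance: 0.9,
  durability: 'permanent',
  category: 'profession',
  replaces: 'wm-003', // hypothetical ID of the stale "works at Google" entry
}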

Reflect is a handler that routes observations to the right stores:

  • All items → working memory (with auto-eviction at capacity)
  • persistent/permanent items above the significance threshold → episodic memory
  • persistent/permanent items with stable categories (all semantic categories — everything except event and task) → semantic memory directly, scoped by subject
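
Reduced to pseudocode, the routing comes down to a few comparisons. A minimal sketch, assuming a simplified Observation type; the function and type names here are illustrative, not the library's internals:

// Sketch of reflect's routing rules.
type Durability = 'transient' | 'session' | 'persistent' | 'permanent'

interface Observation {
  subject: string
  content: string
  importance: number
  durability: Durability
  category: string
}

declare function addToWorking(obs: Observation): void
declare function writeEpisode(obs: Observation): void
declare function writeSemanticFact(obs: Observation): void

const STABLE_CATEGORIES = new Set([
  'identity', 'profession', 'preference', 'belief',
  'relationship', 'attribute', 'pattern',
])

function route(obs: Observation, significanceThreshold: number) {
  addToWorking(obs) // every observation enters working memory

  const durable = obs.durability === 'persistent' || obs.durability === 'permanent'
  if (durable && obs.importance >= significanceThreshold) {
    writeEpisode(obs) // significant enough to persist across sessions
  }
  if (durable && STABLE_CATEGORIES.has(obs.category)) {
    writeSemanticFact(obs) // stable category: direct semantic write, scoped by obs.subject
  }
}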

Tick advances the working memory decay clock and recomputes salience scores.

Consolidation (when semantic is configured) runs as .work() — background processing that doesn't block the pipeline. It checks whether enough episodic evidence has accumulated, and if so, calls an LLM to distill patterns into semantic facts.

Pruning also runs as .work() after consolidation. Once the semantic fact store grows past a threshold (default: 20 facts), an LLM evaluates the full fact set and removes redundant, noisy, or low-value facts — and merges facts that cover the same topic with complementary information.

Capturing Agent Responses

mem.capture takes a string input — typically the user's message. But the agent's response often contains valuable context too: corrections, inferred facts, commitments. mem.captureFromItems captures both sides of the conversation by reading directly from session items.

const pipeline = sequencer({ name: 'chat', inputSchema })
  .then(analyzeInput)
  .then(chatGenerator)
  .work(mem.captureFromItems) // runs after the generator, sees both user + assistant
  .then(postProcess)

captureFromItems is built using connectInput — it's the same capture pipeline, but with a connector that reads the last user message (in full) and the assistant's response (truncated to ~500 characters). The truncation keeps LLM cost low while still catching high-value content like corrections, clarifications, and inferred facts.

Position it after your generator block so it sees the full exchange. It runs as .work() (background), so it doesn't block the pipeline.

To customize the truncation limit:

const mem = memorySystem({
  model: 'gpt-5-mini',
  working: { capacity: 7 },
  episodic: true,
  semantic: true,
  maxAssistantChars: 1000, // default: 500
})

When to use which:

  • mem.capture — when you have explicit string input (e.g., early in a pipeline before the generator)
  • mem.captureFromItems — after the generator, to capture both sides of the conversation

Injecting Memory into Prompts

Use mem.contextFormatter in a generator's context array:

import { generator } from '@flow-state-dev/core'

const chat = generator({
  name: 'chat',
  model: 'gpt-5',
  inputSchema: z.string(),
  context: [mem.contextFormatter],
  user: (input) => input,
})

The formatter calls recall() internally and organizes memories into sections. Semantic facts are grouped by subject. When there's only one subject (user), it renders as a flat list:

Known facts:
- [profession] Works at Stripe
- [preference] Prefers TypeScript

Current focus:
- Working on a REST API migration

When multiple subjects exist, they're grouped:

About user:
- [identity] Name is Jake
- [profession] Works at Fixpoint Labs

About jennifer:
- [relationship] Spouse, goes by Moni

Current focus:
- Working on a REST API migration

Semantic facts appear first (highest authority), then working memory entries, then recent episodic memories. Duplicates across stores are filtered — if semantic memory has "Works at Stripe," the same entry won't appear again from working memory.

For direct access, use mem.recall(ctx, cue?):

const memories = mem.recall(ctx)
// Returns: RankedMemoryItem[] sorted by relevance

const focused = mem.recall(ctx, 'TypeScript preferences')
// Token overlap with cue boosts relevance

Consolidation

Consolidation is how episodic memories become semantic facts. It runs automatically as part of the capture pipeline when semantic memory is configured.

When it triggers: Consolidation runs when episodicWritesSinceLastConsolidation reaches the threshold (default: 5), or when a persistent/permanent entry is evicted from working memory. There's also a minimum turn interval to prevent rapid re-triggering (default: 4 turns).
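
Condensed to a predicate, the trigger looks roughly like this. A sketch only: the state field names are illustrative, the defaults match the Configuration Defaults table below, and the exact interaction between the two triggers is simplified:

// Rough shape of the consolidation trigger check.
interface ConsolidationState {
  episodicWritesSinceLastConsolidation: number
  turnsSinceLastRun: number
  durableEntryEvicted: boolean // a persistent/permanent entry left working memory
}

function shouldConsolidate(s: ConsolidationState, episodicThreshold = 5, minInterval = 4): boolean {
  if (s.turnsSinceLastRun < minInterval) return false // rate limit: prevent rapid re-triggering
  return s.episodicWritesSinceLastConsolidation >= episodicThreshold || s.durableEntryEvicted
}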

What it does: The consolidation pipeline has three stages, gated so the LLM call is skipped entirely when conditions aren't met:

  1. Guard — Checks trigger conditions. If not met, returns early. If met, reads unconsolidated episodes and existing semantic facts.
  2. Generate — LLM call that synthesizes facts from episodes. Can create new facts, reinforce existing ones, update contradicted facts, or invalidate stale ones.
  3. Persist — Writes the results to the semantic store, marks episodes as consolidated, resets counters.

Contradiction handling is central. If episodic evidence contradicts an existing semantic fact, the LLM should update or invalidate it. The prompt emphasizes this: stale facts are worse than missing facts.

// Consolidation output per fact:
{
  subject: string       // Who this is about
  content: string
  confidence: number    // 0–1, based on evidence strength
  category: 'identity' | 'relationship' | 'preference' | 'belief'
    | 'profession' | 'attribute' | 'pattern'
  action: 'new' | 'reinforce' | 'update' | 'invalidate'
  targetFactId: string  // For reinforce/update/invalidate
  sourceEpisodeIds: string[]
}

The consolidation LLM sees existing facts grouped by subject, making it easier to detect contradictions and reinforcements within an entity's knowledge.

Direct extraction vs consolidation: Not everything waits for consolidation. During the reflect step, items classified as persistent or permanent with stable categories (all semantic categories) go directly to semantic memory, tagged with the observer's subject field. This means a user saying "My name is Jake" gets stored as a semantic fact immediately, without waiting for the consolidation threshold. Dedup is subject-scoped: "born in May" about user only deduplicates against other user facts, not against facts about other entities. Consolidation is for finding patterns across multiple episodes — things no single observation makes obvious.
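
A sketch of what subject-scoped dedup means in practice (illustrative only; the real matching is fuzzier than string equality):

// Dedup candidates compete solely within their subject.
interface FactLike { subject: string; content: string }

function isDuplicate(candidate: FactLike, existing: FactLike[]): boolean {
  return existing
    .filter((f) => f.subject === candidate.subject) // "born in May" about user ignores jennifer's facts
    .some((f) => f.content === candidate.content)   // simplified: real comparison is semantic, not exact
}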

Pruning

As the semantic fact store grows, noise accumulates. Near-duplicates slip through dedup guards, session artifacts leak past classification, and related facts fragment across multiple entries. Pruning is an LLM-backed maintenance step that evaluates the full fact set and cleans it up.

When it triggers: Pruning runs when the semantic fact count reaches the threshold (default: 20). Like consolidation, it uses a guard → generate → persist pattern and runs as .work() in the capture pipeline.

What it does:

  1. Guard — Reads all semantic facts. If the count is below threshold, returns early.
  2. Generate — LLM call that reviews the full fact set and identifies:
    • Removals: Facts that are redundant, noisy (session artifacts), contradicted by newer facts, or too vague to be useful.
    • Merges: Groups of 2+ facts that cover the same topic with complementary information. For example, "User was born in Maryland" + "User was born in May" → "User was born in May in Maryland."
  3. Persist — Removes identified facts. For merges, updates the first source fact with the merged content and removes the rest, preserving provenance.

The LLM is instructed to be conservative. High-reinforcement facts (≥5) are protected unless clearly contradicted. High-confidence facts (≥0.8) require strong justification. When in doubt, facts are kept.
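
Those protection rules amount to something like the following check before a removal is accepted. A sketch under stated assumptions: the thresholds come from the prose above, the rest is illustrative:

// Illustrative guard mirroring the protection rules.
interface PruneCandidate { reinforcementCount: number; confidence: number }

function isProtected(fact: PruneCandidate): boolean {
  return fact.reinforcementCount >= 5 || fact.confidence >= 0.8
}
// A protected fact is only removed when the LLM cites a clear contradiction
// or strong justification; anything ambiguous stays in the store.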

// Prune output:
{
  removals: [{ factId: string, reason: string }]
  merges: [{ sourceFactIds: string[], mergedContent: string, reason: string }]
}

Configuration:

const mem = memorySystem({
  model: 'gpt-5-mini',
  working: { capacity: 7 },
  episodic: true,
  semantic: { pruneThreshold: 30 }, // default: 20, set 0 to disable
})

You can also run pruning standalone via mem.prune if you want to trigger it outside the capture pipeline.

Working Memory

Working memory is a bounded, salience-scored store scoped to a session. Entries decay over time. When capacity is reached, the lowest-salience unpinned entry is evicted.

Model

  • Capacity: Default 7 entries (Miller's number). Configurable.
  • Pinned slots: Default 2. Pinned entries survive eviction; unpinned low-salience entries go first.
  • Decay: Salience = importance × decay(elapsed). Default strategy is power-law (ACT-R style): (1 + elapsed)^(-rate).
  • Eviction: When at capacity, the lowest-salience unpinned entry is removed before adding a new one.
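
The eviction rule is simple enough to sketch directly (illustrative types, not library internals):

// At capacity, drop the lowest-salience unpinned entry; pinned entries never compete.
interface WMEntry { id: string; salience: number; pinned: boolean }

function pickEviction(entries: WMEntry[]): WMEntry | undefined {
  return entries
    .filter((e) => !e.pinned)
    .sort((a, b) => a.salience - b.salience)[0]
}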

Standalone Blocks

If you're not using the unified system, these blocks give you fine-grained control:

Block | Kind | Purpose
--- | --- | ---
workingMemoryCapture | Sequencer | Bundled: observe → remember → tick
workingMemoryObserve | Generator | LLM extraction of observations
workingMemoryRemember | Handler | Persists observations to resource
workingMemoryTick | Handler | Advances decay clock
workingMemorySnapshot | Handler | Returns current state sorted by salience
workingMemoryAdd | Handler | Direct entry addition (no LLM)

import {
  workingMemoryObserve,
  workingMemoryRemember,
  workingMemoryTick,
} from '@thought-fabric/core/memory'

const pipeline = sequencer({ name: 'chat', inputSchema })
  .work(
    (input) => input.message,
    sequencer({ name: 'memory', inputSchema: z.string() })
      .then(workingMemoryObserve({ model: 'preset/fast' }))
      .then(workingMemoryRemember())
      .tap(workingMemoryTick())
  )
  .then(chatGenerator)

Helpers

For direct resource manipulation outside blocks:

Helper | Purpose
--- | ---
addWorkingMemory(ref, entry, config?) | Add entry with auto-eviction at capacity
evictWorkingMemory(ref, id) | Remove by ID (overrides pin)
pinWorkingMemory(ref, id, config?) | Pin to protect from eviction
unpinWorkingMemory(ref, id) | Remove pin
refreshWorkingMemory(ref, id, config?) | Reset access time (access boost)
advanceWorkingMemory(ref, config?) | Advance turn, recompute salience
workingMemoryItems(ref) | Entries sorted by salience
formatWorkingMemoryEntries(ref) | Bullet list for LLM context
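
A typical direct-manipulation sequence might look like the following. This is a sketch: it assumes you already hold a ref to the working memory resource, and it assumes addWorkingMemory returns the created entry and that these helpers are async, neither of which is documented above:

import {
  addWorkingMemory,
  pinWorkingMemory,
  advanceWorkingMemory,
  workingMemoryItems,
} from '@thought-fabric/core/memory'

// `ref` is a working memory resource reference obtained elsewhere.
const entry = await addWorkingMemory(ref, { content: 'Debugging the auth flow', importance: 0.9 })
await pinWorkingMemory(ref, entry.id)  // protect while the task is active
await advanceWorkingMemory(ref)        // next turn: recompute salience
const items = workingMemoryItems(ref)  // sorted by salience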

Decay Strategies

Strategy | Formula | Use case
--- | --- | ---
power-law (default) | (1 + elapsed)^(-rate) | ACT-R style; fast initial drop, long tail
exponential | exp(-rate × elapsed) | Steeper, more aggressive decay
none | 1 | No decay; salience = importance forever. Good for testing.
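
The difference is easy to see by evaluating the formulas directly (a standalone sketch, not library code; rate is the default 0.5):

// Decay multipliers after `elapsed` turns, matching the formulas above.
const powerLaw = (elapsed: number, rate = 0.5) => Math.pow(1 + elapsed, -rate)
const exponential = (elapsed: number, rate = 0.5) => Math.exp(-rate * elapsed)
const none = () => 1

powerLaw(4)    // ≈ 0.45, and powerLaw(20) is still ≈ 0.22: the long tail
exponential(4) // ≈ 0.14, and exponential(20) is ≈ 0.00005: far more aggressive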

Episodic Memory

Episodic memory records significant experiences across sessions. It's an append-only log of episodes scoped to either user or project. Episodes are written during the reflect step when items have persistent or permanent durability and meet the significance threshold.

Resource

Episodic memory uses a resource factory because the scope varies:

import { createEpisodicMemoryResource } from '@thought-fabric/core/memory'

const epResource = createEpisodicMemoryResource('user') // or 'project'

When using memory.system(), this is handled for you.

Helpers

Helper | Purpose
--- | ---
encodeEpisode(ref, input, maxEpisodes) | Write a new episode
recentEpisodes(ref, limit?) | Get recent episodes (default: 10)
markEpisodesConsolidated(ref, ids) | Mark episodes as processed by consolidation
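
For example, under the assumption that `ref` is an episodic resource reference; the episode input shape and the async-ness of these helpers are assumptions, not documented signatures:

import { encodeEpisode, recentEpisodes, markEpisodesConsolidated } from '@thought-fabric/core/memory'

// Write one episode, then flag the latest batch as consolidated.
await encodeEpisode(ref, { content: 'User migrated billing to Stripe', significance: 0.8 }, 200)
const recent = recentEpisodes(ref, 5)                        // newest five episodes
await markEpisodesConsolidated(ref, recent.map((e) => e.id)) // mark as processed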

Semantic Memory

Semantic memory is a curated knowledge base of stable facts. Unlike episodic memory (which records what happened), semantic memory records what's true — distilled from evidence over time. Facts have confidence scores that increase with reinforcement and can be updated or invalidated when new evidence contradicts them.

How facts arrive

Facts enter semantic memory through two paths:

  1. Direct extraction (during reflect): Items classified as persistent/permanent with a stable category (any semantic category — not event or task) go straight to semantic memory, tagged with a subject. Dedup is subject-scoped. No waiting for consolidation.
  2. Consolidation (background): After enough episodic evidence accumulates, an LLM reviews unconsolidated episodes and extracts patterns, reinforces existing facts, or corrects outdated ones.

Resource

Like episodic, semantic memory uses a resource factory:

import { createSemanticMemoryResource } from '@thought-fabric/core/memory'

const semResource = createSemanticMemoryResource('user') // or 'project'

Helpers

Helper | Purpose
--- | ---
addSemanticFact(ref, input) | Add a new fact
updateSemanticFact(ref, id, content, sourceIds?, confidence?) | Update existing fact
reinforceSemanticFact(ref, id, sourceIds?) | Increase confidence via reinforcement
removeSemanticFact(ref, id) | Remove a fact (invalidation)
semanticFacts(ref) | All facts
querySemanticFacts(ref, predicate) | Filter facts by predicate
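
A sketch of these helpers in use, assuming `ref` is a semantic resource reference and that addSemanticFact returns the created fact (an assumption; only the parameter lists above are documented):

import { addSemanticFact, reinforceSemanticFact, querySemanticFacts } from '@thought-fabric/core/memory'

// Add a fact, reinforce it on repeat evidence, then query by category.
const fact = await addSemanticFact(ref, {
  subject: 'user',
  content: 'Prefers TypeScript',
  category: 'preference',
  confidence: 0.7,
})
await reinforceSemanticFact(ref, fact.id) // bumps confidence and reinforcementCount
const prefs = querySemanticFacts(ref, (f) => f.category === 'preference')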

Fact Schema

{
  id: string                // Auto-generated
  subject: string           // Who this is about ('user', 'jennifer', etc.)
  content: string           // The fact itself
  confidence: number        // 0–1, increases with reinforcement
  category: 'identity' | 'relationship' | 'preference' | 'belief'
    | 'profession' | 'attribute' | 'pattern'
  sourceEpisodeIds: string[]
  extractedAt: string       // ISO datetime
  lastReinforced?: string   // ISO datetime
  reinforcementCount: number
}

Subject conventions:

  • 'user' — the primary user (default when omitted)
  • Lowercase first name for other people: 'jennifer', 'max'
  • Lowercase hyphenated name for organizations: 'fixpoint-labs'

Categories:

  • identity — who someone is: name, birthdate, location, background
  • profession — what someone does: job, company, role, skills
  • preference — likes, dislikes, style choices
  • belief — opinions, worldviews, values
  • relationship — connections to other named entities: spouse, pet, employer
  • attribute — properties/characteristics: possessions, abilities, circumstances
  • pattern — recurring behaviors

Configuration Defaults

All defaults are exported as constants for reference:

Setting | Default | Constant
--- | --- | ---
Working memory capacity | 7 | DEFAULT_WORKING_MEMORY_CONFIG.capacity
Max pinned slots | 2 | DEFAULT_WORKING_MEMORY_CONFIG.maxPinnedSlots
Decay strategy | power-law | DEFAULT_WORKING_MEMORY_CONFIG.decay.strategy
Decay rate | 0.5 | DEFAULT_WORKING_MEMORY_CONFIG.decay.rate
Episodic scope | user | DEFAULT_EPISODIC_CONFIG.scope
Significance threshold | 0.6 | DEFAULT_EPISODIC_CONFIG.significanceThreshold
Max episodes | 200 | DEFAULT_EPISODIC_CONFIG.maxEpisodes
Consolidation episodic threshold | 5 | DEFAULT_CONSOLIDATION_CONFIG.episodicThreshold
Consolidation on eviction | true | DEFAULT_CONSOLIDATION_CONFIG.onEviction
Consolidation min interval | 4 turns | DEFAULT_CONSOLIDATION_CONFIG.minInterval
Prune threshold | 20 facts | DEFAULT_PRUNE_CONFIG.pruneThreshold
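
Because the defaults are exported, custom config can be derived from them instead of restating magic numbers. A sketch, assuming the constants are importable from the package root under the names in the table:

import { system as memorySystem, DEFAULT_WORKING_MEMORY_CONFIG, DEFAULT_CONSOLIDATION_CONFIG } from '@thought-fabric/core/memory'

// Base custom values on the exported defaults.
const mem = memorySystem({
  model: 'gpt-5-mini',
  working: { capacity: DEFAULT_WORKING_MEMORY_CONFIG.capacity + 3 }, // 10
  episodic: true,
  semantic: { consolidation: { episodicThreshold: DEFAULT_CONSOLIDATION_CONFIG.episodicThreshold } },
})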

Naming Convention

The word order encodes the category:

  • workingMemory[Verb] — Block or item (e.g. workingMemoryCapture, workingMemoryObserve).
  • [verb]WorkingMemory — Helper (e.g. addWorkingMemory, evictWorkingMemory).
  • Same pattern for episodic (encodeEpisode, recentEpisodes) and semantic (addSemanticFact, querySemanticFacts).
  • memorySystem[Verb] — Unified system blocks (e.g. memorySystemObserve, memorySystemCapture).
