Models

Every generator block needs a model. The framework gives you multiple ways to specify one, from a single model string to a named intent that handles fallback across providers automatically.

Model Strings

The simplest form. A slash-separated provider and model ID:

import { generator } from "@flow-state-dev/core";

const chat = generator({
  name: "chat",
  model: "openai/gpt-5.5",
  prompt: "You are a helpful assistant.",
});

Supported formats:

Format	Example	What it does
`provider/model`	`"anthropic/claude-sonnet-4.6"`	Direct provider call
`intent/<name>`	`"intent/chat"`	Resolves to the first available model in the named intent
`gateway/provider/model`	`"vercel/openai/gpt-5.5"`	Routes through a gateway

Intents

An intent is a named routing group. When you write model: "intent/utility", the framework picks the first candidate in that intent's list whose provider has a working API key configured. If that model fails at runtime, the framework retries it and then falls back to the next candidate. If every candidate is unreachable, it falls back to the resolver's defaultModel.

One line of config gives you multi-provider redundancy:

const chat = generator({
  name: "chat",
  model: "intent/chat",
  prompt: "You are a helpful assistant.",
});
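The selection rule described above can be sketched in a few lines. This is an illustrative model, not the framework's resolver: pickInitialModel, IntentTable, and the available set are hypothetical names, and "has a working API key" is reduced to a provider-name lookup. Runtime retry and fallback are not modeled here.

```typescript
// Illustrative sketch only; not the framework's resolver.
type IntentTable = Record<string, string[]>;

// Assumption: the provider is the first segment of a "provider/model" string.
// Gateway-prefixed strings are ignored here for brevity.
function providerOf(model: string): string {
  return model.split("/")[0] ?? "";
}

// Pick the first candidate whose provider is available; otherwise
// (unknown intent, empty list, nothing available) use defaultModel.
function pickInitialModel(
  intentTable: IntentTable,
  intentName: string,
  available: Set<string>,
  defaultModel: string,
): string {
  const candidates = intentTable[intentName] ?? [];
  const usable = candidates.find((m) => available.has(providerOf(m)));
  return usable ?? defaultModel;
}

const exampleIntents: IntentTable = {
  chat: ["anthropic/claude-sonnet-4.6", "openai/gpt-5.5"],
};

// Only an OpenAI key is configured, so the Anthropic candidate is skipped.
const picked = pickInitialModel(
  exampleIntents,
  "chat",
  new Set(["openai"]),
  "openai/gpt-5.4",
);
```

With only OpenAI available, picked is the second candidate in the list. With no providers available at all, the function returns the default instead.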

Canonical intent names

The framework documents six intent names. Apps configure their own model lists per intent — there are no built-in defaults. Pick names from this list when you can; add your own when you need to.

Intent	Cognitive shape
`utility`	Bounded utility tasks: classification, routing, extraction, summarization, titles. Smallest reliable model.
`chat`	User-facing assistant turns. App-tunable.
`plan`	Goal decomposition, task graphs, supervisor planning. Errors compound — point this at your strongest model.
`synthesize`	Combining intermediate artifacts; structured-JSON heavy work. Doubles as the structured-JSON intent — apps should point it at JSON-reliable models (Sonnet/GPT-class), not the cheapest tier.
`code`	Code generation, review, debugging.
`reason`	Open-ended deliberation that doesn't fit the other names.

Configuring intents

Intents are configured on the model resolver:

import { createModelResolver } from "@flow-state-dev/core/models";

const resolver = createModelResolver({
  defaultModel: "anthropic/claude-sonnet-4.6",
  intents: {
    utility: ["anthropic/claude-haiku-4.5", "openai/gpt-5.4-nano"],
    chat: ["anthropic/claude-sonnet-4.6", "openai/gpt-5.5"],
    plan: ["anthropic/claude-opus-4.7"],
    synthesize: ["anthropic/claude-sonnet-4.6", "openai/gpt-5.5"],
  },
});

defaultModel is required when intents is non-empty. It must be a provider/model or gateway/provider/model string, never another intent/*. Both rules are enforced at construction: a missing defaultModel throws "createModelResolver: defaultModel is required when intents are configured", and an intent/* default throws "createModelResolver: defaultModel must not be an intent/* string". If a generator references an intent name that isn't configured, the resolver logs a dev warning ('[flow-state-dev] Unknown or empty intent "<name>"; falling back to defaultModel.') and uses defaultModel.

See Custom Model Resolver for the full options reference.
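To make the two construction rules concrete, here is a small stand-alone check that mirrors them. It is a sketch, not the real createModelResolver: checkResolverConfig and ResolverConfigSketch are hypothetical names, the error strings are copied from the messages quoted above, and applying the intent/* rule even when no intents are declared is an assumption.

```typescript
// Hypothetical stand-in for the checks createModelResolver is described as running.
interface ResolverConfigSketch {
  defaultModel?: string;
  intents?: Record<string, string[]>;
}

function checkResolverConfig(config: ResolverConfigSketch): void {
  const hasIntents = Object.keys(config.intents ?? {}).length > 0;

  // Rule 1: intents without a defaultModel are rejected.
  if (hasIntents && !config.defaultModel) {
    throw new Error(
      "createModelResolver: defaultModel is required when intents are configured",
    );
  }

  // Rule 2: the default must be a concrete model, never another intent.
  // (Assumption: enforced whether or not intents are declared.)
  if (config.defaultModel !== undefined && config.defaultModel.startsWith("intent/")) {
    throw new Error(
      "createModelResolver: defaultModel must not be an intent/* string",
    );
  }
}

// A valid configuration passes silently.
checkResolverConfig({
  defaultModel: "anthropic/claude-sonnet-4.6",
  intents: { chat: ["anthropic/claude-sonnet-4.6", "openai/gpt-5.5"] },
});
```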

Env-var overrides

You can replace which model a declared intent (or defaultModel) resolves to per environment, without touching code. The motivating case is debugging AI flows against real LLMs cheaply in dev or CI. Production wiring stays the source of truth.

Variable naming. FSDEV_INTENT_<NAME> overrides intent <name>'s candidate list. <NAME> is the intent name uppercased with hyphens replaced by underscores, so chat → FSDEV_INTENT_CHAT and my-custom → FSDEV_INTENT_MY_CUSTOM. If two declared intents normalize to the same env-var name (my-custom and my_custom both → FSDEV_INTENT_MY_CUSTOM), construction throws.
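The naming rule is mechanical enough to sketch directly. The helper names below (intentEnvVarName, assertNoEnvNameCollisions) and the collision error text are hypothetical; only the uppercase-and-underscore normalization and the "collisions throw" behavior come from the description above.

```typescript
// Normalize an intent name to its override variable:
// uppercase, hyphens replaced by underscores, FSDEV_INTENT_ prefix.
function intentEnvVarName(intentName: string): string {
  return "FSDEV_INTENT_" + intentName.toUpperCase().replace(/-/g, "_");
}

// Reject declared intents that normalize to the same variable name.
// The error wording here is illustrative, not the framework's message.
function assertNoEnvNameCollisions(intentNames: string[]): void {
  const seen = new Map<string, string>();
  for (const name of intentNames) {
    const envName = intentEnvVarName(name);
    const previous = seen.get(envName);
    if (previous !== undefined) {
      throw new Error(
        `Intents "${previous}" and "${name}" both map to ${envName}`,
      );
    }
    seen.set(envName, name);
  }
}

const chatVar = intentEnvVarName("chat");
const customVar = intentEnvVarName("my-custom");
```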

Value shape. A single provider/model or gateway/provider/model string. intent/*, preset/*, empty values, and malformed strings all throw at construction. Comma-separated multi-value lists are deliberately not supported in this phase.
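A validation pass along these lines would enforce the value shape. This is a sketch under stated assumptions: parseOverrideValue and OverrideTarget are hypothetical, "malformed" is taken to mean anything other than two or three non-empty slash-separated segments, and the error wording (apart from "Invalid model format") is invented for illustration.

```typescript
type OverrideTarget =
  | { kind: "direct"; provider: string; model: string }
  | { kind: "gateway"; gateway: string; provider: string; model: string };

function parseOverrideValue(
  envName: string,
  raw: string | undefined,
): OverrideTarget {
  const value = (raw ?? "").trim();

  function fail(reason: string): never {
    throw new Error(`${envName}: ${reason}: "${value}"`);
  }

  if (value === "") fail("empty value");
  // Comma-separated lists are not supported, so reject them explicitly.
  if (value.includes(",")) fail("multi-value lists are not supported");
  if (value.startsWith("intent/") || value.startsWith("preset/")) {
    fail("override must be a concrete model string");
  }

  const parts = value.split("/");
  if (parts.some((p) => p === "")) fail("Invalid model format");

  if (parts.length === 2) {
    return { kind: "direct", provider: parts[0], model: parts[1] };
  }
  if (parts.length === 3) {
    return {
      kind: "gateway",
      gateway: parts[0],
      provider: parts[1],
      model: parts[2],
    };
  }
  return fail("Invalid model format");
}

const directTarget = parseOverrideValue("FSDEV_INTENT_CHAT", "openai/gpt-5.4-mini");
const gatewayTarget = parseOverrideValue("FSDEV_DEFAULT_MODEL", "vercel/openai/gpt-5.5");
```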

defaultModel override. FSDEV_DEFAULT_MODEL replaces defaultModel. This covers the fallback path for empty, unknown, or fully-unavailable intents.

When it's read. Construction time only, once. Set the env var before createModelResolver is called; changing it later in a running process has no effect.

What happens if it's wrong. Most failure modes are construction-time errors, not silent fallbacks:

- a malformed value for a declared intent (intent/foo, preset/fast, empty, garbage)
- FSDEV_DEFAULT_MODEL set when no intents are declared (the override would have no effect)

One case is deliberately not fatal: an FSDEV_INTENT_<NAME> that names an intent the resolver doesn't declare produces a warning and is ignored rather than throwing. Env vars are ambient: a shared or CI environment may pin an intent override for some other app, and your app must not crash because of it. A typo in an intent you do declare still surfaces as a warning and falls back to defaultModel.

Confirming it took effect. Each applied override emits one dev-only log at construction (suppressed by NODE_ENV=production and FSD_QUIET_WARNINGS=1). Example: [flow-state-dev] Intent "chat" overridden by FSDEV_INTENT_CHAT; resolves to "openai/gpt-5.4-mini".

Worked example. Same resolver, two environments:

// server.ts — unchanged across environments
const resolver = createModelResolver({
  defaultModel: "anthropic/claude-sonnet-4.6",
  intents: {
    chat: ["anthropic/claude-sonnet-4.6", "openai/gpt-5.5"],
    utility: ["anthropic/claude-haiku-4.5"],
  },
});

# .env.test — point dev/CI at the cheap tier
FSDEV_INTENT_CHAT=openai/gpt-5.4-mini
FSDEV_INTENT_UTILITY=openai/gpt-5.4-mini
FSDEV_DEFAULT_MODEL=openai/gpt-5.4-mini

In CI the resolver runs against the cheap models; in production the env vars are unset and the declared candidates win.

Construction-time error example.

FSDEV_INTENT_CHAT=garbage pnpm dev
# Error: createModelResolver: FSDEV_INTENT_CHAT: Invalid model format: "garbage". ...

For the resolver-author view (the env injection seam used by tests, and the precedence table), see Env-var overrides in Custom Model Resolver.

Worked examples

One generator per intent, wired to a realistic block shape. The point is to show what each intent looks like in practice; production blocks would add the usual prompt and schema details.

utility — a small classifier with a structured output:

const classifyIntent = generator({
  name: "classify-intent",
  model: "intent/utility",
  outputSchema: z.object({
    intent: z.enum(["greeting", "question", "complaint", "other"]),
    confidence: z.number(),
  }),
  prompt: "Classify the user's message. Be conservative on confidence.",
});

chat — a user-facing assistant turn:

const assistant = generator({
  name: "assistant",
  model: "intent/chat",
  prompt: "You are a helpful assistant. Keep replies under three sentences.",
});

plan — a planner producing a list of subtasks:

const planTasks = generator({
  name: "plan-tasks",
  model: "intent/plan",
  outputSchema: z.object({
    steps: z.array(z.object({
      title: z.string(),
      rationale: z.string(),
    })),
  }),
  prompt: "Break the goal into 3-6 concrete steps with a one-line rationale each.",
});

synthesize — combining prior outputs into a structured result. This intent doubles as the structured-JSON intent; apps should point it at JSON-reliable models (Sonnet/GPT-class), not the cheapest tier.

const synthesize = generator({
  name: "synthesize-findings",
  model: "intent/synthesize",
  outputSchema: z.object({
    summary: z.string(),
    keyPoints: z.array(z.string()),
  }),
  prompt: "Combine the research and analysis into a single structured report.",
});

code — a code-review generator returning structured findings:

const reviewCode = generator({
  name: "review-code",
  model: "intent/code",
  outputSchema: z.object({
    findings: z.array(z.object({
      severity: z.enum(["info", "warn", "error"]),
      file: z.string(),
      line: z.number(),
      message: z.string(),
    })),
  }),
  prompt: "Review the diff. Flag correctness issues, not style.",
});

reason — open-ended deliberation, free-form output:

const deliberate = generator({
  name: "deliberate",
  model: "intent/reason",
  prompt: "Work through the tradeoffs out loud. End with a recommendation.",
});

Examples use intent/* strings, not the underlying provider/model. That's the point of intents: blocks declare what they need; the resolver decides which model fills the role.

A note on structured outputs: a generator's outputSchema must be compatible with OpenAI's strict structured-output mode (no open-keyed z.record(), no unions of differently-shaped variants). The framework checks this when you define the generator and throws with the offending path, so a bad schema fails immediately rather than on the first model call. See assertStrictCompatible for the details and the fixes.

Array Fallback

Don't need a named intent? Pass an array directly. The framework tries each model in order:

const chat = generator({
  name: "chat",
  model: ["openai/gpt-5.4", "anthropic/claude-sonnet-4.6"],
  prompt: "You are a helpful assistant.",
});

This gives you the same retry-and-fallback behavior as named intents, without defining one. Useful for one-off blocks where defining an intent would be overkill.

Dynamic Selection

The model field accepts a function. It receives the block's input and context, so you can pick models based on what's happening at runtime:

const adaptive = generator({
  name: "adaptive",
  model: (input, ctx) => {
    if (input.message.length > 2000) return "intent/reason";
    return "intent/chat";
  },
  prompt: "You are a helpful assistant.",
});

The function can return any valid model value: a string, an intent string, an array, or a resolved model instance.

`selectModel`

Inline model functions work, but they tend to accumulate type casts and get hard to scan. selectModel is a declarative alternative. You give it a default and a list of rules:

import { generator, selectModel } from "@flow-state-dev/core";

const assistant = generator({
  name: "assistant",
  model: selectModel("intent/chat", [
    { preferProvider: (_input, ctx) => ctx.user?.state.preferredProvider },
  ]),
  prompt: "You are a helpful assistant.",
});

Rules are evaluated in two phases. preferProvider rules run first and collect: every non-null result contributes a provider-name preference ("anthropic", "openai", etc.) that the resolver uses to reorder the intent's candidate list. when rules run second; each has a boolean condition and a fixed model to use when the condition is true.

model: selectModel("intent/chat", [
  // Phase 1: preferProvider — provider-name preference, collected into the resolver
  { preferProvider: (_input, ctx) => ctx.user?.state.preferredProvider },

  // Phase 2: when — condition-based model swaps
  { when: (input) => input.message.length > 5000, use: "intent/reason" },
  { when: (_input, ctx) => ctx.session.state.mode === "create", use: "intent/plan" },
])

If no when matches, the default is returned. preferProvider returns a provider name (or array of names), not a model string — the two compose: when chooses the intent, preferProvider shapes ordering inside it. Both callbacks can be async.
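The two-phase evaluation can be modeled in plain TypeScript. Treat this as a simplified sketch rather than selectModel's implementation: evaluateRules, reorderCandidates, and the trimmed-down context type are hypothetical, callbacks are synchronous here even though the real ones can be async, and "the first matching when wins" is an assumption the text above does not confirm.

```typescript
// Trimmed-down stand-ins for the block input and context.
type InputSketch = { message: string };
type CtxSketch = {
  user?: { state: { preferredProvider?: string } };
  session: { state: { mode?: string } };
};

type Rule =
  | { preferProvider: (input: InputSketch, ctx: CtxSketch) => string | string[] | null | undefined }
  | { when: (input: InputSketch, ctx: CtxSketch) => boolean; use: string };

function evaluateRules(
  defaultModel: string,
  rules: Rule[],
  input: InputSketch,
  ctx: CtxSketch,
): { model: string; preferredProviders: string[] } {
  // Phase 1: every non-null preferProvider result is collected.
  const preferredProviders: string[] = [];
  for (const rule of rules) {
    if ("preferProvider" in rule) {
      const result = rule.preferProvider(input, ctx);
      if (result != null) {
        preferredProviders.push(...(Array.isArray(result) ? result : [result]));
      }
    }
  }
  // Phase 2: assumption: the first `when` rule whose condition holds wins.
  let model = defaultModel;
  for (const rule of rules) {
    if ("when" in rule && rule.when(input, ctx)) {
      model = rule.use;
      break;
    }
  }
  return { model, preferredProviders };
}

// Move candidates from preferred providers to the front, keeping the
// original order among candidates with equal preference.
function reorderCandidates(candidates: string[], preferred: string[]): string[] {
  const rank = (m: string): number => {
    const i = preferred.indexOf(m.split("/")[0] ?? "");
    return i === -1 ? preferred.length : i;
  };
  return candidates
    .map((m, i) => ({ m, i }))
    .sort((a, b) => rank(a.m) - rank(b.m) || a.i - b.i)
    .map((entry) => entry.m);
}

const exampleRules: Rule[] = [
  { preferProvider: (_input, ctx) => ctx.user?.state.preferredProvider },
  { when: (input) => input.message.length > 5000, use: "intent/reason" },
  { when: (_input, ctx) => ctx.session.state.mode === "create", use: "intent/plan" },
];
```

The split mirrors the composition described above: the when phase picks which intent to resolve, and the collected provider preferences only change the order of candidates inside it.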

User-Facing Model Selection

For apps where end users should control which model runs, the pattern is:

1. Store the user's choice in user state (persists across sessions).
2. Expose a flow action that updates it.
3. Read the choice in the generator's model function.

Here's the flow-level setup:

const userStateSchema = z.object({
  selectedModel: z.string().default("anthropic/claude-sonnet-4.6"),
});

const setSelectedModel = handler({
  name: "set-selected-model",
  inputSchema: z.object({ selectedModel: z.string() }),
  userStateSchema,
  execute: async (input, ctx) => {
    await ctx.user!.patchState({ selectedModel: input.selectedModel });
  },
});

const kitchenSink = defineFlow({
  kind: "my-app",
  actions: {
    chat: { block: chatPipeline, inputSchema },
    setSelectedModel: { block: setSelectedModel, inputSchema: z.object({ selectedModel: z.string() }) },
  },
  user: { stateSchema: userStateSchema },
});
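The setup above covers storing and updating the choice; reading it in the generator's model function could look roughly like the following. The context types are minimal local stand-ins, and the allow-list (ALLOWED_MODELS) is an addition of this sketch rather than something the framework is known to require. It is included because selectedModel is an arbitrary user-supplied string.

```typescript
// Minimal local stand-ins for the context shape used in the examples above.
interface UserStateSketch {
  selectedModel?: string;
}
interface ModelCtxSketch {
  user?: { state: UserStateSketch };
}

// Hypothetical allow-list: only these user-selectable models are honored.
const ALLOWED_MODELS: string[] = [
  "anthropic/claude-sonnet-4.6",
  "anthropic/claude-opus-4.7",
  "openai/gpt-5.5",
];

// A model function in the (input, ctx) shape described under Dynamic Selection.
// It returns the stored choice when it is allowed, and an intent otherwise.
const modelFromUserState = (_input: unknown, ctx: ModelCtxSketch): string => {
  const choice = ctx.user?.state.selectedModel;
  if (choice !== undefined && ALLOWED_MODELS.indexOf(choice) !== -1) {
    return choice;
  }
  return "intent/chat";
};

// Intended use (not executed here):
//   generator({ name: "chat", model: modelFromUserState, prompt: "..." })
```

Falling back to an intent rather than a hard-coded model keeps the resolver's redundancy in play when the user has made no valid choice.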

On the client side, call the action when the user picks a new model. The change takes effect on the next generation — no restart needed.

await session.sendAction("setSelectedModel", { selectedModel: "anthropic/claude-opus-4.7" });

Surface the current selection through the user scope's client block so the UI stays in sync:

user: {
  stateSchema: userStateSchema,
  client: {
    derived: {
      preferences: (ctx) => ({
        selectedModel: ctx.state.selectedModel,
      }),
    },
  },
},

Gateways and fallback

Gateways let you route provider calls through a single proxy. The resolver supports gateway-prefixed model strings explicitly, and also falls back to a configured gateway when a bare provider/model can't be loaded directly. This section covers both, plus the env-var detection rules that drive provider availability.

Gateway model strings

A gateway/provider/model string routes the call through the named gateway:

const chat = generator({
  name: "chat",
  model: "vercel/openai/gpt-5.5",
  prompt: "You are a helpful assistant.",
});

const chat = generator({
  name: "chat",
  model: "openrouter/anthropic/claude-sonnet-4.6",
  prompt: "You are a helpful assistant.",
});

Gateway strings work anywhere a model string works: directly on a generator, inside selectModel, and inside intent candidate lists.

Provider detection

The resolver figures out which providers are available by checking environment variables:

Provider	Variable
Anthropic	`ANTHROPIC_API_KEY`
OpenAI	`OPENAI_API_KEY`
Google	`GOOGLE_GENERATIVE_AI_API_KEY`
Vercel Gateway	`AI_GATEWAY_API_KEY`
OpenRouter	`OPENROUTER_API_KEY`

A gateway key makes all providers available through that gateway. Direct keys take priority over gateways when both exist for the same provider.
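As a rough model of the detection rules, the function below answers "how would this provider be reached?" for a given environment. It is a sketch: detectRoute and the Route type are hypothetical, the variable names come from the table above, and the order in which gateways are checked (Vercel before OpenRouter) is an assumption.

```typescript
type EnvSketch = Record<string, string | undefined>;

// Direct provider keys, from the table above.
const DIRECT_KEYS: Record<string, string> = {
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
  google: "GOOGLE_GENERATIVE_AI_API_KEY",
};

// Gateway keys, checked in this order (the order is an assumption).
const GATEWAY_KEYS: Array<[string, string]> = [
  ["vercel", "AI_GATEWAY_API_KEY"],
  ["openrouter", "OPENROUTER_API_KEY"],
];

type Route =
  | { via: "direct" }
  | { via: "gateway"; gateway: string }
  | { via: "none" };

function detectRoute(provider: string, env: EnvSketch): Route {
  // A direct key takes priority over any gateway for the same provider.
  const directVar = DIRECT_KEYS[provider];
  if (directVar !== undefined && env[directVar]) {
    return { via: "direct" };
  }
  // A gateway key makes every provider reachable through that gateway.
  for (const [gateway, envVar] of GATEWAY_KEYS) {
    if (env[envVar]) return { via: "gateway", gateway };
  }
  return { via: "none" };
}

const exampleRoute = detectRoute("openai", {
  OPENAI_API_KEY: "sk-example",
  AI_GATEWAY_API_KEY: "gw-example",
});
```

With both keys set, exampleRoute is the direct route. Remove OPENAI_API_KEY from the environment object and the same call routes through the Vercel gateway.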

Zero-config setup (auto-detects from env):

import { createModelResolver } from "@flow-state-dev/core/models";

const resolver = createModelResolver();

Direct-then-gateway fallback

Bare provider/model strings (no gateway prefix) have a two-tier resolution:

1. Direct first. If the provider package is installed and a direct API key is configured, the resolver loads the direct provider and calls it.
2. Gateway fallback. If the direct package fails to load (not installed, can't be required in a bundled Next.js context, or its factory throws), the resolver walks configured gateways: explicit options.gateways entries first, then gateways auto-detected via env vars. The first gateway that covers this provider is used to route the call.

This is what makes "openai/gpt-5.5" keep working on Vercel even when @ai-sdk/openai isn't in the bundle, as long as AI_GATEWAY_API_KEY is set. The behavior is intentional, not a hidden quirk.
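The two tiers can be expressed as a short function. Everything here is a stand-in: resolveBareModel, Loader, and GatewaySketch are hypothetical, a "loader" is just a function that throws when the provider package can't be loaded, and the final error message is invented. The returned shape borrows the actual and gateway fields used for model identity elsewhere on this page.

```typescript
// A loader throws if the direct provider package cannot be loaded.
type Loader = () => void;

interface GatewaySketch {
  name: string;
  covers: (provider: string) => boolean;
}

function resolveBareModel(
  model: string,
  directLoaders: Record<string, Loader | undefined>,
  explicitGateways: GatewaySketch[],
  detectedGateways: GatewaySketch[],
): { actual: string; gateway?: string } {
  const provider = model.split("/")[0] ?? "";

  // Tier 1: try the direct provider when one is configured.
  const loadDirect = directLoaders[provider];
  if (loadDirect) {
    try {
      loadDirect();
      return { actual: model };
    } catch {
      // Load failed (not installed, not bundled, factory threw): fall through.
    }
  }

  // Tier 2: explicit gateways first, then auto-detected ones.
  for (const gateway of [...explicitGateways, ...detectedGateways]) {
    if (gateway.covers(provider)) {
      return { actual: model, gateway: gateway.name };
    }
  }

  throw new Error(`No direct provider or gateway available for "${model}"`);
}

// Simulates a bundle where the OpenAI package is missing.
const missingPackage: Loader = () => {
  throw new Error("Cannot find module");
};
const vercelGateway: GatewaySketch = { name: "vercel", covers: () => true };

const routed = resolveBareModel(
  "openai/gpt-5.5",
  { openai: missingPackage },
  [vercelGateway],
  [],
);
```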

A worked resolution trace:

import { createGateway } from "@ai-sdk/gateway";
import { createModelResolver } from "@flow-state-dev/core/models";

// App config
const resolver = createModelResolver({
  gateways: { vercel: createGateway({ apiKey: process.env.AI_GATEWAY_API_KEY }) },
  defaultModel: "anthropic/claude-sonnet-4.6",
  intents: {
    utility: ["openai/gpt-5.4-nano", "anthropic/claude-haiku-4.5"],
    chat: ["openai/gpt-5.5", "anthropic/claude-sonnet-4.6"],
  },
});

// In Next.js production where @ai-sdk/openai isn't in the bundle:
generator({ name: "chat", model: "intent/chat", prompt: "..." });

// Resolution trace:
// 1. "intent/chat" → candidates: ["openai/gpt-5.5", "anthropic/claude-sonnet-4.6"]
// 2. Try "openai/gpt-5.5":
//      - Direct openai package: load fails (not in bundle)
//      - Gateway fallback: vercel gateway covers openai → use it
//      - Wraps as gateway-routed openai/gpt-5.5
// 3. Request succeeds.
//    item.model = { actual: "openai/gpt-5.5", gateway: "vercel" }

Retry and Fallback

When a model call fails:

- Retryable errors (429, 500-503, network) retry the same model with exponential backoff.
- After retries are exhausted, the framework moves to the next model in the list.
- Auth errors and bad requests skip immediately to the next model.
- If every model fails, the call throws with a summary of what was tried.

For streaming, fallback only works before the first chunk arrives. Once a stream starts producing tokens, a mid-stream failure propagates to the caller. There's no transparent way to resume a stream from a different model.

Configure retry behavior:

const resolver = createModelResolver({
  retryPolicy: {
    maxAttemptsPerModel: 3,  // default: 2
    baseDelayMs: 500,        // default: 1000
    maxDelayMs: 15000,       // default: 10000
  },
});
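The decision logic behind these rules can be simulated synchronously. This is a sketch, not the framework's retry loop: runWithFallback, Outcome, and backoffDelayMs are hypothetical; the backoff is assumed to double per attempt from baseDelayMs, capped at maxDelayMs, with no jitter; and maxAttemptsPerModel is assumed to count total attempts per model. Delays are recorded rather than slept so the example stays deterministic.

```typescript
interface RetryPolicySketch {
  maxAttemptsPerModel: number;
  baseDelayMs: number;
  maxDelayMs: number;
}

type Outcome =
  | { ok: true; value: string }
  | { ok: false; status: number | "network" };

// 429, 500-503, and network failures are retried on the same model.
function isRetryable(status: number | "network"): boolean {
  return status === "network" || status === 429 || (status >= 500 && status <= 503);
}

// Assumption: delay = baseDelayMs * 2^(attempt - 1), capped at maxDelayMs.
function backoffDelayMs(attempt: number, policy: RetryPolicySketch): number {
  return Math.min(policy.maxDelayMs, policy.baseDelayMs * Math.pow(2, attempt - 1));
}

function runWithFallback(
  models: string[],
  attempt: (model: string) => Outcome,
  policy: RetryPolicySketch,
): { value: string; model: string; delaysMs: number[] } {
  const tried: string[] = [];
  const delaysMs: number[] = [];
  for (const model of models) {
    for (let n = 1; n <= policy.maxAttemptsPerModel; n++) {
      const outcome = attempt(model);
      if (outcome.ok) return { value: outcome.value, model, delaysMs };
      tried.push(`${model} (${outcome.status})`);
      // Auth errors and bad requests skip straight to the next model.
      if (!isRetryable(outcome.status)) break;
      // Record the wait before the next attempt on the same model.
      if (n < policy.maxAttemptsPerModel) delaysMs.push(backoffDelayMs(n, policy));
    }
  }
  throw new Error(`All models failed: ${tried.join(", ")}`);
}

const examplePolicy: RetryPolicySketch = {
  maxAttemptsPerModel: 3,
  baseDelayMs: 500,
  maxDelayMs: 15000,
};
```

Under this policy a model that keeps returning 429 is attempted three times, with waits of 500 ms and 1000 ms between attempts, before the next model in the list is tried.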

Prompt Caching

Every generator opts into prompt caching by default. For Anthropic models that means the adapter stamps providerOptions.anthropic.cacheControl on the last system message, so tools + system get cached together. OpenAI, Google, and DeepSeek cache implicitly and are left alone. OpenRouter is treated like Anthropic, since its API proxies cache_control through unchanged. If you're routing through the Vercel AI Gateway, the adapter sets providerOptions.gateway.caching: 'auto' instead and lets the gateway mark breakpoints for the underlying provider.

You don't have to configure anything to get the win. When it matters, tune it:

const chat = generator({
  name: "chat",
  model: "anthropic/claude-sonnet-4.6",
  prompt: LONG_SYSTEM_PROMPT,
  caching: {
    enabled: true,         // default true
    breakpoints: "auto",   // "auto" (default) or "manual"
    ttl: "5m",             // "5m" (default) or "1h"
  },
});

What the modes do:

Mode	Behavior
`enabled: false`	No cache markers emitted, regardless of provider.
`breakpoints: "auto"`	Adapter decides placement per provider. Skips Anthropic marking when the cacheable prefix is below ~1024 tokens (the API activation floor).
`breakpoints: "manual"`	Adapter passes your `providerOptions` through untouched. Use this when you want to place multiple breakpoints (e.g., system + end-of-history for long multi-turn agents) or different TTLs per part.

caching can be a function of (input, ctx) when the decision depends on per-call state.

Observing cache hits

The adapter threads Anthropic's cache counters into GeneratorModelUsage:

result.usage = {
  promptTokens: 1200,
  completionTokens: 48,
  totalTokens: 1248,
  cacheCreationInputTokens: 1100,  // first turn
  cacheReadInputTokens: 0,
}
// ...subsequent turn on the same stable prefix:
result.usage = {
  promptTokens: 1200,
  completionTokens: 52,
  totalTokens: 1252,
  cacheCreationInputTokens: 0,
  cacheReadInputTokens: 1100,       // ~90% cheaper than a fresh input
}

The DevTool's token usage panel surfaces the same numbers per call and aggregated per session.

Cost model in one line

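Cache write is ~1.25× the input rate; cache read is ~0.1×. A single read saves ~0.9× of the input rate, which more than repays the ~0.25× write premium. For any generator called more than once with a stable system prompt while the cache entry is still live (see ttl above), default-on is cheaper.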
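The arithmetic is easy to check with a small helper. This sketch uses the approximate multipliers quoted above and assumes every call after the first hits the cache (same prefix, within the TTL, above the minimum-prefix threshold); relativePrefixCost is a hypothetical name, and costs are expressed in units of "one uncached read of the prefix".

```typescript
// Approximate multipliers relative to the normal input rate.
const CACHE_WRITE_MULTIPLIER = 1.25;
const CACHE_READ_MULTIPLIER = 0.1;

// Relative cost of sending the same cacheable prefix `calls` times.
// Assumption: with caching, the first call writes and every later call reads.
function relativePrefixCost(calls: number, cached: boolean): number {
  if (calls <= 0) return 0;
  if (!cached) return calls;
  return CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * (calls - 1);
}

// Compare the two modes for a generator called twice on a stable prompt.
const uncachedTwoCalls = relativePrefixCost(2, false);
const cachedTwoCalls = relativePrefixCost(2, true);
```

For a single call caching costs more (1.25 versus 1.0). From the second call onward it is ahead: 1.25 + 0.1 = 1.35 versus 2.0, and the gap widens by roughly 0.9 with each further call.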

For a fuller treatment — including the audit of call paths that existed before default-on, the minimum-prefix threshold, and manual-mode placement patterns — see docs/PROMPT_CACHING.md.

Thinking and reasoning

Different providers expose "thinking" or "reasoning" in different shapes. The framework doesn't normalize them yet (see the note at the end of this section), but it does two things that make the surface usable today: it streams reasoning output as items automatically, and it passes providerOptions straight through to the underlying AI SDK provider.

Streaming reasoning output

Whenever the resolved model produces reasoning chunks, the generator emits them as ReasoningItem items on the stream. No configuration needed. See streaming/items.md for the item shape.

The `providerOptions` escape hatch

To turn thinking on for Anthropic, set a budget on providerOptions.anthropic.thinking:

const reasoner = generator({
  name: "reasoner",
  model: "anthropic/claude-opus-4.7",
  providerOptions: { anthropic: { thinking: { type: "enabled", budgetTokens: 10000 } } },
  prompt: "Work through the problem step by step.",
});

For OpenAI, use reasoningEffort:

const reasoner = generator({
  name: "reasoner",
  model: "openai/gpt-5.5",
  providerOptions: { openai: { reasoningEffort: "high" } },
  prompt: "Work through the problem step by step.",
});

For Google, use thinkingConfig:

const reasoner = generator({
  name: "reasoner",
  model: "google/gemini-3.1-pro",
  providerOptions: { google: { thinkingConfig: { thinkingBudget: 8000 } } },
  prompt: "Work through the problem step by step.",
});

The shapes above match what the AI SDK accepts for each provider. Verify against the SDK docs if you're targeting a newer model; the field names occasionally shift.

Intent defaults for thinking

When you want every generator that resolves through intent/plan (or any thinking-shaped intent) to send a budget without spelling it out at each call site, attach it to the intent itself via intentDefaults:

const resolver = createModelResolver({
  defaultModel: "openai/gpt-5.4",
  intents: {
    plan: ["anthropic/claude-opus-4.7", "openai/gpt-5.5"],
  },
  intentDefaults: {
    plan: {
      providerOptions: {
        anthropic: { thinking: { type: "enabled", budgetTokens: 16000 } },
      },
    },
  },
});

When Anthropic wins resolution, the thinking budget is applied. When the fallback runs the OpenAI candidate instead, the anthropic.* block is dropped — only the resolved provider's keys travel with the request. A generator that sets its own providerOptions at the call site still wins on key collisions.
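The merge behavior described above can be sketched as a pure function. The name effectiveProviderOptions is hypothetical, and two details are assumptions rather than documented behavior: the merge is shallow within a provider's block, and call-site options for providers other than the resolved one are dropped in the same way as intent defaults.

```typescript
type ProviderOptionsSketch = Record<string, Record<string, unknown>>;

// Combine intent-level defaults with call-site options for the provider
// that actually won resolution. Call-site keys win on collision.
function effectiveProviderOptions(
  intentDefaults: ProviderOptionsSketch,
  callSite: ProviderOptionsSketch,
  resolvedProvider: string,
): ProviderOptionsSketch {
  const merged: Record<string, unknown> = {
    ...(intentDefaults[resolvedProvider] ?? {}),
    ...(callSite[resolvedProvider] ?? {}),
  };
  // Only the resolved provider's block travels with the request.
  return Object.keys(merged).length > 0 ? { [resolvedProvider]: merged } : {};
}

const planDefaults: ProviderOptionsSketch = {
  anthropic: { thinking: { type: "enabled", budgetTokens: 16000 } },
};

// Anthropic won resolution: the thinking budget is applied.
const whenAnthropicWins = effectiveProviderOptions(planDefaults, {}, "anthropic");

// Fallback to the OpenAI candidate: the anthropic block is dropped.
const whenOpenAiWins = effectiveProviderOptions(planDefaults, {}, "openai");
```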

See Custom Model Resolver for the full configuration surface.

Forward note: normalized reasoning levels

A normalized reasoning: 'low' | 'medium' | 'high' level — orthogonal to model choice, with per-model clamp behavior — is on the roadmap but not yet implemented. Until it ships, the providerOptions escape hatch above is the supported surface.

Observable model identity

When a generator runs, the resolved model identity flows out on every emitted item (message, reasoning, source, tool_output, and the transient tool_call_progress) and on the generator's block_trace. The shape is the same in both places:

type ModelIdentity = {
  actual: string;       // the concrete model that ran
  requested?: string;   // present when different (intents, fallback, substitution)
  gateway?: string;     // present when a gateway routed the call
};

A chat UI can read this directly from any message item to render a per-message model badge:

import { ModelBadge } from "@flow-state-dev/react";

<ModelBadge model={item.model} />

Items emitted by handlers do not carry model. See streaming/items.md for the full surface, including block_trace.model semantics.

Reading the resolved model at completion time

Items on the stream carry model, but sometimes you need to write the resolved identity into server-side state as the turn completes. Generators pass it to onCompleted as a third meta argument:

const responder = generator({
  name: "responder",
  model: "intent/chat",
  prompt: "You are a helpful assistant.",
  itemVisibility: { client: true, history: true },
  onCompleted: async (output, ctx, meta) => {
    // Write the concrete model into session state for display or audit.
    await ctx.session.patchState({ lastModel: meta.model.actual });

    // Detect fallback: requested differs from actual when an intent
    // resolved to a different model or the provider substituted.
    if (meta.model.requested && meta.model.requested !== meta.model.actual) {
      await ctx.request.patchState({ fellBack: true });
    }
  },
});

meta.model is a ModelIdentity — the same shape stamped on emitted items. It is always populated on the success path (the generator seeds it before calling the model). The same value can be written to clientData if you need it on the client.

This is the supported path for completion-time state projection. The meta argument is typed per block kind: only generators receive GeneratorCompletedMeta today. Handlers, sequencers, and routers keep the existing two-argument onCompleted(output, ctx) signature.

For the general hook mechanism, see lifecycle hooks. For the per-item form, see streaming/items.md.

Migration from presets

The preset/* API was removed as part of the intents rollout. Any preset/* string now throws at construction time with this mapping:

preset/fast, preset/tiny, preset/small  → intent/utility
preset/medium                           → intent/chat
preset/large                            → intent/code or intent/reason
preset/thinking-*                       → intent/reason or intent/plan
                                          with reasoning enabled (FIX-517)

The runtime error still references FIX-517 (normalized reasoning levels). That feature was deferred and never shipped; for callers migrating from preset/thinking-*, see Thinking and reasoning above for the providerOptions escape hatch that ships today.
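If you are sweeping a codebase for leftover preset/* strings, a lookup like the one below can turn each into a list of candidate intents. It is a hypothetical migration helper, not part of the framework: suggestIntents and PRESET_TO_INTENTS are invented names, and the table only restates the mapping above. Where the mapping offers two options, both are returned and the choice stays with you.

```typescript
// Each entry pairs a preset pattern with the intents the mapping suggests.
const PRESET_TO_INTENTS: Array<[RegExp, string[]]> = [
  [/^preset\/(fast|tiny|small)$/, ["intent/utility"]],
  [/^preset\/medium$/, ["intent/chat"]],
  [/^preset\/large$/, ["intent/code", "intent/reason"]],
  [/^preset\/thinking-.+$/, ["intent/reason", "intent/plan"]],
];

// Returns candidate replacements, or an empty list for anything that is
// not a recognized preset string.
function suggestIntents(model: string): string[] {
  for (const [pattern, suggestions] of PRESET_TO_INTENTS) {
    if (pattern.test(model)) return suggestions;
  }
  return [];
}

const forFast = suggestIntents("preset/fast");
const forLarge = suggestIntents("preset/large");
```

For the preset/thinking-* cases, remember that switching the string alone does not enable reasoning; the providerOptions settings described under Thinking and reasoning still need to be added.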
