Fetch
@flow-state-dev/tools — Fetch a single web page and return its content as clean, LLM-ready markdown.
Why this exists
Agents need to read web pages. Documentation, articles, user-shared links, search results worth reading in full. The raw HTML is noisy and wastes context tokens. tools.fetch handles the fetching, content extraction, and HTML-to-markdown conversion so you don't have to.
Three providers, auto-selected by what's available:
| Provider | How it works | JS rendering | Anti-bot | Env var |
|---|---|---|---|---|
| Firecrawl | Managed API, best quality | Yes | Yes | FIRECRAWL_API_KEY |
| Jina Reader | HTTP API via r.jina.ai | Yes (ReaderLM) | Partial | JINA_API_KEY (optional) |
| Built-in | Node.js fetch + Readability + Turndown | No | No | None needed |
The built-in fallback always works. No API keys, no external services. It handles static HTML well — documentation sites, blog posts, articles. For JS-rendered SPAs or pages behind anti-bot protection, you'll want Firecrawl or Jina.
Basic usage
import { generator } from "@flow-state-dev/core";
import { tools } from "@flow-state-dev/tools";

const reader = generator({
  name: "reader",
  model: "anthropic/claude-sonnet-4-6",
  prompt: "Read URLs the user provides and summarize them.",
  tools: [tools.fetch()],
});
The LLM calls fetch with a URL and gets back markdown content, a title, and metadata.
Configuration
tools.fetch({
  // Force a specific provider instead of auto-detection
  provider: "firecrawl", // "firecrawl" | "jina" | "builtin"

  // Enable JS rendering (Firecrawl and Jina only)
  waitForJS: true,

  // Explicit API keys (overrides env vars)
  keys: {
    firecrawl: "fc-...",
    jina: "jina_...",
  },
})
All options are optional. With no config, the tool auto-detects providers from environment variables and falls back to built-in.
Provider resolution
The tool checks for available providers in this order:
1. FIRECRAWL_API_KEY set? → Firecrawl (deterministic, JS rendering, anti-bot)
2. JINA_API_KEY set? → Jina Reader (deterministic, ReaderLM markdown)
3. Otherwise → Built-in (static HTML only, but always works)
Unlike tools.search(), fetch never throws "no provider available". The built-in fallback covers the zero-config case.
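The resolution order can be sketched as a small pure function. This is an illustration, not the package's implementation: `ProviderName`, `FetchConfig`, `Env`, and `resolveProvider` are hypothetical names, and the sketch assumes that explicit `keys` take part in auto-detection the same way environment variables do.

```typescript
// Hypothetical names: none of these are exported by @flow-state-dev/tools.
type ProviderName = "firecrawl" | "jina" | "builtin";

interface FetchConfig {
  provider?: ProviderName;
  keys?: { firecrawl?: string; jina?: string };
}

type Env = Record<string, string | undefined>;

function resolveProvider(config: FetchConfig = {}, env: Env = {}): ProviderName {
  // An explicit provider skips auto-detection entirely.
  if (config.provider) return config.provider;

  // Assumption: explicit keys take precedence over environment variables.
  const firecrawlKey = config.keys?.firecrawl ?? env.FIRECRAWL_API_KEY;
  const jinaKey = config.keys?.jina ?? env.JINA_API_KEY;

  if (firecrawlKey) return "firecrawl";
  if (jinaKey) return "jina";

  // No keys found: the built-in provider needs none, so resolution never fails.
  return "builtin";
}
```

In a Node.js process you would pass `process.env` as the second argument. Because the last branch always returns `"builtin"`, there is no "no provider available" error path.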
Output shape
Every provider returns the same normalized result:
{
  url: "https://example.com/article",
  title: "Article Title",
  markdown: "# Article Title\n\nClean markdown content...",
  metadata: {
    statusCode: 200,
    contentType: "text/html",
    description: "Meta description if available",
    publishedDate: "2026-01-15",
    wordCount: 1247,
  },
  source: "firecrawl" // which provider was used
}
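If you consume results in your own code, a type along these lines may help. It is inferred from the sample above rather than taken from the package: the name `FetchResult`, the optional markers on `description` and `publishedDate`, and the `describeResult` helper are all assumptions.

```typescript
// Inferred from the sample result; not the package's published types.
interface FetchResult {
  url: string;
  title: string;
  markdown: string;
  metadata: {
    statusCode: number;
    contentType: string;
    description?: string; // assumed optional ("if available")
    publishedDate?: string; // assumed optional
    wordCount: number;
  };
  source: "firecrawl" | "jina" | "builtin";
}

// Hypothetical helper: builds a one-line label for logs.
function describeResult(result: FetchResult): string {
  const { title, source, metadata } = result;
  return `${title} [${source}, HTTP ${metadata.statusCode}, ${metadata.wordCount} words]`;
}

const sample: FetchResult = {
  url: "https://example.com/article",
  title: "Article Title",
  markdown: "# Article Title\n\nClean markdown content...",
  metadata: { statusCode: 200, contentType: "text/html", wordCount: 1247 },
  source: "firecrawl",
};

console.log(describeResult(sample));
// prints "Article Title [firecrawl, HTTP 200, 1247 words]"
```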
How the built-in fallback works
The built-in provider uses a three-step pipeline:
- Fetch: standard fetch() with a browser-like User-Agent
- Extract: @mozilla/readability strips navigation, ads, sidebars, and boilerplate, keeping just the article content (same library behind Firefox Reader View)
- Convert: turndown converts the cleaned HTML to markdown with ATX-style headings and fenced code blocks
This is the same pipeline Jina Reader uses internally. The difference is Jina also handles JavaScript-rendered pages via their ReaderLM model.
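The shape of that pipeline can be sketched with the extract and convert steps passed in as functions, which keeps the example free of third-party imports. All names here (`Article`, `PipelineSteps`, `toMarkdown`, `builtinFetchSketch`) are hypothetical, and the User-Agent string is a placeholder. In a real setup, `extractArticle` would wrap @mozilla/readability and `htmlToMarkdown` would wrap turndown.

```typescript
// Hypothetical sketch of the three-step pipeline; not the package's source.
interface Article {
  title: string;
  content: string; // cleaned article HTML
}

interface PipelineSteps {
  extractArticle: (html: string) => Article | null;
  htmlToMarkdown: (html: string) => string;
}

// Steps 2 and 3: extract the article, then convert it to markdown.
// If extraction finds no article, convert the raw body instead.
function toMarkdown(
  html: string,
  steps: PipelineSteps,
): { title: string; markdown: string } {
  const article = steps.extractArticle(html);
  if (article === null) {
    return { title: "", markdown: steps.htmlToMarkdown(html) };
  }
  return { title: article.title, markdown: steps.htmlToMarkdown(article.content) };
}

// Step 1: fetch with a browser-like User-Agent, then run steps 2 and 3.
async function builtinFetchSketch(url: string, steps: PipelineSteps) {
  const response = await fetch(url, {
    headers: { "User-Agent": "Mozilla/5.0 (compatible; example-reader)" },
  });
  if (!response.ok) {
    throw new Error(`Request to ${url} failed with status ${response.status}`);
  }
  return toMarkdown(await response.text(), steps);
}
```

Keeping extraction and conversion behind plain function types makes the raw-body fallback easy to exercise with stubs, without touching the network.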
Direct provider constructors
If you want to skip auto-detection and lock to a specific provider:
import { firecrawlFetch, jinaFetch, builtinFetch } from "@flow-state-dev/tools";

// Always use Firecrawl (throws if no API key)
const firecrawlOnly = firecrawlFetch({ keys: { firecrawl: "fc-..." } });

// Always use Jina (works without a key at 20 RPM)
const jinaOnly = jinaFetch();

// Always use built-in (never calls external services)
const builtinOnly = builtinFetch();
Composing with search
A natural pattern: search first, then fetch the best results for full content.
const researcher = generator({
  name: "researcher",
  model: "anthropic/claude-sonnet-4-6",
  prompt: "Search for information, then read the most relevant pages to give thorough answers.",
  tools: [tools.search(), tools.fetch()],
});
The LLM will call search, scan the snippets, then fetch the pages that look most useful. You don't need to wire this up — the LLM figures out the workflow.
Error handling
| Scenario | Behavior |
|---|---|
| URL returns 404/500 | Throws an error. Generator retry handles transient failures. |
| URL redirects | Follows redirects automatically (standard fetch behavior) |
| Page has no readable content | Returns best-effort markdown (raw body conversion if Readability can't extract an article) |
| Firecrawl API error | Throws with the Firecrawl error message. Generator can retry. |
| Jina rate limited (429) | Throws. Generator retry will back off. |
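If you call a provider outside a generator and need your own retry logic, the table suggests a split between transient and permanent failures. The helpers below are a hypothetical sketch: the generator's actual retry policy is not specified here, and the status classification and delay values are assumptions.

```typescript
// Hypothetical helpers; not part of @flow-state-dev/tools.

// Assumption: rate limits (429) and server errors (5xx) are worth retrying,
// while other client errors such as 404 are not.
function isTransient(statusCode: number): boolean {
  return statusCode === 429 || (statusCode >= 500 && statusCode <= 599);
}

// Exponential backoff: baseMs * 2^attempt, capped at maxMs.
// The attempt counter starts at 0.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

With these defaults, the wait doubles from 500 ms on each attempt until it reaches the 8-second cap.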
Next steps
- Crawl tool — for multi-page site crawling
- Tools overview — all available tools