TypeScript browser agent library. AI drives the browser once — replay the navigation with zero LLM tokens.
@lightfeed/browser-agent is a TypeScript browser agent library built for cutting LLM token use on every rerun.
Most browser-agent work has two parts:
- Navigation — many clicks / types / scrolls to reach a target page. Most of the steps, most of the tokens, usually the same every run if the page structure is stable. Today's agents pay for these tokens every single time.
- Extraction — pull typed data out of whatever is on screen. Must re-run AI each time because the content is live.
This library lets you run navigation once with AI, save it as a plan, and replay it with zero LLM calls — no screenshots, no DOM map, no tokens. Then run a cheap .extract() on the result page for the dynamic tail. If the DOM drifts, optional aiFallback re-plans only the broken step, so you still pay tokens for a fraction of the flow instead of all of it.
Runs anywhere your browser lives — the same BrowserAgent API drives a local Chromium for dev, a serverless Chromium (AWS Lambda via @sparticuz/chromium) for scheduled jobs, or a remote CDP endpoint (Brightdata Scraping Browser, any browser farm, or your own). Swap backends by changing one config field; prompts, plans, and .extract() calls stay identical.
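As a rough sketch of that one-field swap, the helper below picks a backend config from environment variables. The config shapes mirror the Local and Remote examples in this README; the BROWSER_BACKEND and BROWSER_WS_ENDPOINT variable names and the pickBackend helper are hypothetical, not part of the library.

```typescript
// Config shapes follow the Local / Remote backend examples in this README.
// BROWSER_BACKEND and BROWSER_WS_ENDPOINT are hypothetical variable names.
type BackendConfig =
  | { browserProvider: "Local" }
  | { browserProvider: "Remote"; remoteConfig: { browserWSEndpoint: string } };

type Env = Record<string, string | undefined>;

function pickBackend(env: Env): BackendConfig {
  if (env["BROWSER_BACKEND"] === "Remote") {
    const endpoint = env["BROWSER_WS_ENDPOINT"];
    if (!endpoint) {
      throw new Error("BROWSER_WS_ENDPOINT is required for the Remote backend");
    }
    return {
      browserProvider: "Remote",
      remoteConfig: { browserWSEndpoint: endpoint },
    };
  }
  // Anything else falls back to a local Chromium for development.
  return { browserProvider: "Local" };
}
```

The result could then be spread into the constructor, for example new BrowserAgent({ ...pickBackend(process.env), llm }). The Serverless backend is left out of the sketch because its executablePath has to be awaited first.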
npm install @lightfeed/browser-agent
Go to the Hacker News Show section, click through to the next page, and grab the top 3 posts. Navigation (open Show, paginate) is the expensive-but-stable part; extraction is the live-data part.
import { BrowserAgent } from "@lightfeed/browser-agent";
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
import { z } from "zod";
const agent = new BrowserAgent({
  browserProvider: "Local",
  llm: new ChatGoogleGenerativeAI({ model: "gemini-2.5-flash" }),
});
const page = await agent.newPage();
// 1. AI navigation — recordable, replayable.
const nav = await page.ai(
  "Go to Hacker News show section, go to next page"
);
await agent.savePlan("hn show page 2", nav, "./hn.plan.json");
// 2. AI extraction — typed by a Zod schema, runs AI every call.
const { articles } = await page.extract(
  "The top 3 articles on this page",
  z.object({
    articles: z
      .array(
        z.object({
          title: z.string(),
          url: z.string(),
          points: z.number(),
          commentsUrl: z.string(),
        })
      )
      .max(3),
  })
);
Every subsequent run — navigation is free:
await agent.replay("./hn.plan.json", { page }); // zero tokens
const { articles } = await page.extract(/* ... */); // tokens only here
Everything above is available without writing code:
# Record while running
browser-agent-cli run --save-plan ./hn.plan.json \
  -c "Go to Hacker News show section, go to next page and find top 3 articles"
# Replay: deterministic navigation (no LLM), then one fresh AI pass on the
# result page to produce an up-to-date final response. The navigation part
# is free; only the final pass spends tokens.
browser-agent-cli replay ./hn.plan.json
# Pure replay — skip the final AI pass and just get the browser onto the
# result page (zero LLM calls end-to-end).
browser-agent-cli replay ./hn.plan.json --no-ai-finish
# Use a different finishing task (e.g. ask for a custom summary of the
# current page instead of re-running the recorded task).
browser-agent-cli replay ./hn.plan.json \
  --finish-task "Return the titles of the first 3 posts as a bullet list"
# Self-heal drifted steps during replay (independent of the finish pass).
browser-agent-cli replay ./hn.plan.json --ai-fallback
The LLM is auto-detected from GOOGLE_API_KEY / GEMINI_API_KEY → OPENAI_API_KEY → ANTHROPIC_API_KEY, in that order. Override the model with --llm-model or GEMINI_MODEL / OPENAI_MODEL / ANTHROPIC_MODEL. replay needs an LLM only for the default finishing pass or for --ai-fallback; with --no-ai-finish and no fallback it makes no LLM calls. Interactive controls: ctrl+p pauses, ctrl+r resumes.
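The key precedence can be pictured as a first-match lookup. The sketch below only illustrates that ordering; detectProvider and the Provider labels are hypothetical names, and the CLI's actual detection code may differ.

```typescript
type Provider = "google" | "openai" | "anthropic";

// Checked top to bottom: Google/Gemini keys first, then OpenAI, then Anthropic.
const KEY_ORDER: ReadonlyArray<[string, Provider]> = [
  ["GOOGLE_API_KEY", "google"],
  ["GEMINI_API_KEY", "google"],
  ["OPENAI_API_KEY", "openai"],
  ["ANTHROPIC_API_KEY", "anthropic"],
];

// Returns the provider for the first key that is set and non-empty,
// or undefined when no key is present.
function detectProvider(
  env: Record<string, string | undefined>
): Provider | undefined {
  for (const [key, provider] of KEY_ORDER) {
    if (env[key]) return provider;
  }
  return undefined;
}
```

Calling detectProvider(process.env) with both OPENAI_API_KEY and ANTHROPIC_API_KEY set would pick "openai", because OpenAI comes earlier in the list.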
The same BrowserAgent API works against three backends.
Local — your machine, for dev and prototyping
const agent = new BrowserAgent({ browserProvider: "Local" });
Remote — any CDP-over-WebSocket endpoint (Brightdata Scraping Browser, browser farms, your own)
const agent = new BrowserAgent({
  browserProvider: "Remote",
  remoteConfig: {
    browserWSEndpoint: "ws://your-remote-browser:9222/devtools/browser/ws",
  },
});
Serverless — AWS Lambda etc. via @sparticuz/chromium
Version pinning: This project uses Playwright, which ships with a specific version of Chromium. You need a matching @sparticuz/chromium. We're on Playwright 1.49 (Chromium 133), so install @sparticuz/chromium@133. For AWS Lambda, ARM64 is preferred; you also need the canvas native dependencies (see lambda-layer-build.sh).
import { BrowserAgent } from "@lightfeed/browser-agent";
import chromium from "@sparticuz/chromium";
const agent = new BrowserAgent({
  browserProvider: "Serverless",
  serverlessConfig: {
    executablePath: await chromium.executablePath(),
    options: { args: chromium.args },
  },
});
All three drive the browser with AI, return the same TaskOutput, and can be recorded + replayed.
| API | Use when |
|---|---|
| page.ai(task) | You already have a page and want to mix Playwright calls (page.goto, page.clickElement) with AI steps on the same tab. Resolves when done. |
| agent.executeTask(task) | "Here's a goal, figure it out." The agent owns the page; include URLs in the prompt and it navigates itself. Resolves when done. |
| agent.executeTaskAsync(task) | Same as executeTask but returns a Task control handle immediately, with task.pause(), task.resume(), task.cancel(), and per-step event callbacks. For long-running flows, CLIs, or anything a user can interrupt. |
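For interruptible flows, the control handle can be wired to user input. The sketch below maps single keys to the pause / resume / cancel methods named in the table; TaskControls is a structural stand-in for the real Task type, the void return types are an assumption, and makeKeyHandler and the key choices are hypothetical.

```typescript
// Structural stand-in for the Task control handle. Method names come from
// the table above; the void return types are an assumption.
interface TaskControls {
  pause(): void;
  resume(): void;
  cancel(): void;
}

// Returns a handler that maps a single key to a control action and
// reports whether the key was recognized.
function makeKeyHandler(task: TaskControls): (key: string) => boolean {
  return (key) => {
    switch (key) {
      case "p":
        task.pause();
        return true;
      case "r":
        task.resume();
        return true;
      case "c":
        task.cancel();
        return true;
      default:
        return false; // unknown keys are ignored
    }
  };
}
```

In a CLI, the returned handler could be attached to whatever keypress source the program already uses, with the handle coming from agent.executeTaskAsync(task).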
- agent.savePlan(task, result, path) writes a JSON plan with the action sequence and a stable xpath + cssPath for each clicked / typed element.
- agent.replay(path, { page }) re-runs those actions with no LLM calls, no screenshots, no DOM map.
- aiFallback: true re-plans only a drifted step with the LLM; the rest stays free.
- startingUrl (option, or --url on the CLI) retargets a plan at a different URL, which is useful for staging, preview deploys, or different queries.
- Plans are human-readable and hand-editable (tweak an inputText value, reorder or delete steps).
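A minimal sketch of combining these options: replaying a recorded plan against another deployment while letting the LLM repair only drifted steps. The option names aiFallback and startingUrl come from the list above, but passing them in the second argument of replay is an assumption, and Replayer and replayAgainst are hypothetical names used so the sketch stands alone.

```typescript
// Option names come from the list above; their placement on replay's
// second argument is an assumption, not confirmed library API.
interface ReplayOptions<P> {
  page: P;
  aiFallback?: boolean;
  startingUrl?: string;
}

// Structural stand-in for the part of BrowserAgent used here.
interface Replayer<P> {
  replay(planPath: string, options: ReplayOptions<P>): unknown;
}

// Replays a recorded plan from a different starting URL, with the LLM
// re-planning only the steps that no longer match the page.
async function replayAgainst<P>(
  agent: Replayer<P>,
  page: P,
  planPath: string,
  startingUrl: string
): Promise<void> {
  await agent.replay(planPath, { page, aiFallback: true, startingUrl });
}
```

With a real agent this might be called as replayAgainst(agent, page, "./hn.plan.json", stagingUrl), where stagingUrl is whatever preview deployment the plan should target.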
The output string the model produced while recording is frozen in the plan; the programmatic agent.replay() does not regenerate it. The CLI's replay command, by default, runs one fresh AI pass (page.ai(plan.task, { maxSteps: 3 })) on the result page after navigation so every CLI run ends with an up-to-date response; pass --no-ai-finish to get pure token-free replay and fall back to the recorded output. If you're wiring this up programmatically, run your own .extract() / .ai() on the page after agent.replay() instead of relying on the recorded output.
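One way to wire this up programmatically is a record-or-replay helper: replay when a plan file already exists, otherwise navigate with AI once and save the plan. The sketch below uses only the page.ai, agent.savePlan, and agent.replay call shapes shown in this README; the NavPage and PlanAgent interfaces, the recordOrReplay helper, and the injected planExists check are hypothetical, and the return types are assumptions.

```typescript
// Structural stand-ins for the calls used in this README.
interface NavPage<R> {
  ai(task: string): Promise<R>;
}

interface PlanAgent<R, P> {
  savePlan(task: string, result: R, path: string): unknown;
  replay(path: string, options: { page: P }): unknown;
}

// Replays the plan when it exists (no LLM calls); otherwise runs the AI
// navigation once and saves the plan for later runs.
async function recordOrReplay<R, P extends NavPage<R>>(
  agent: PlanAgent<R, P>,
  page: P,
  task: string,
  planPath: string,
  planExists: (path: string) => boolean
): Promise<"replayed" | "recorded"> {
  if (planExists(planPath)) {
    await agent.replay(planPath, { page });
    return "replayed";
  }
  const result = await page.ai(task); // first run: tokens are spent here
  await agent.savePlan(task, result, planPath);
  return "recorded";
}
```

In Node, existsSync from node:fs can serve as planExists. Either way, the live data still comes from a fresh page.extract(...) call after the helper returns, not from the recorded output.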
MIT. Forked from HyperAgent (b49afe). Serverless browser support by @sparticuz/chromium.