
Build blueprint

How to Build an Agentic Browser

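Decide the shape of the product first. Strong teams do not start by building a full browser. They prove planning, execution, memory, and human control in the smallest possible configuration.

Short answer

To build an agentic browser, combine a planner, a browser runtime, persistent context, and approval rails, then decide whether to build a prototype, an extension, or a browser-native product.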

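Choose your first build route

Best for

Research, operations, and multi-tab synthesis.

What to build first

Design memory, approvals, and task state as primitives.

Minimum stack

Browser shell + memory + planner + routing

This page serves first as a build map.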

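Selected design

A practical agentic browser has four responsibilities

Whichever route you choose, the product only goes beyond plain automation once it can understand goals, act inside the session, retain context, and recover when the web changes.

Do not skip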

Task state that survives more than one page

A runtime that can observe and act, not only scrape

Approval points before sensitive steps

Memory that stores findings, not just raw logs

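Core architecture

Five blocks appear in almost every serious implementation

The technology changes, but once you move past a toy demo the structural pattern is quite stable.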

Model router

Use a fast model for page reads and a stronger model for planning, critique, or high-risk decisions.
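As a minimal sketch of this routing rule, the function below maps a kind of work to a model tier. The `TaskKind` categories and the model names `fast-model` and `strong-model` are hypothetical placeholders, not real model identifiers; substitute whatever your provider offers.

```python
from enum import Enum


class TaskKind(Enum):
    PAGE_READ = "page_read"
    PLANNING = "planning"
    CRITIQUE = "critique"
    HIGH_RISK = "high_risk"


# Hypothetical model names used only for illustration.
FAST_MODEL = "fast-model"
STRONG_MODEL = "strong-model"


def route_model(kind: TaskKind) -> str:
    """Return the model tier for a given kind of work."""
    if kind is TaskKind.PAGE_READ:
        return FAST_MODEL
    # Planning, critique, and high-risk decisions go to the stronger model.
    return STRONG_MODEL
```

Keeping the rule in one function makes it easy to change tiers later without touching the planner or the runtime.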

Planner

Turn a user goal into ordered sub-steps, then keep updating the plan as the browser state changes.
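One way to picture a plan that keeps updating is a small data structure with a `revise` hook. In the sketch below the steps are plain strings, and the `login_required` flag plus the "sign in" rule are assumptions made for illustration; in a real planner a model would propose and revise the steps.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Plan:
    goal: str
    steps: list[str] = field(default_factory=list)  # remaining steps, in order
    done: list[str] = field(default_factory=list)   # completed steps

    def next_step(self) -> Optional[str]:
        return self.steps[0] if self.steps else None

    def complete_current(self) -> None:
        if self.steps:
            self.done.append(self.steps.pop(0))

    def revise(self, page_state: dict) -> None:
        """Update the remaining steps from what the browser currently shows."""
        # Hypothetical rule: a login wall means signing in must come first.
        if page_state.get("login_required") and "sign in" not in self.steps:
            self.steps.insert(0, "sign in")
```

Calling `revise` after every observation, rather than only at the start, is what lets the plan follow the live page instead of a fixed script.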

Browser runtime

Read the DOM, inspect page state, click, type, navigate, and capture evidence from the live session.
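The runtime's responsibilities can be sketched as an interface plus an in-memory stand-in for tests. The `BrowserRuntime` protocol, its method names, and `FakeRuntime` are all hypothetical; a production version would wrap a real automation driver behind the same shape.

```python
from dataclasses import dataclass, field
from typing import Protocol


class BrowserRuntime(Protocol):
    """Hypothetical interface: observe and act on a live session."""
    def navigate(self, url: str) -> None: ...
    def read_text(self, selector: str) -> str: ...
    def click(self, selector: str) -> None: ...
    def type_text(self, selector: str, text: str) -> None: ...
    def capture_evidence(self) -> dict: ...


@dataclass
class FakeRuntime:
    """In-memory stand-in for a live session, for tests and dry runs."""
    pages: dict[str, dict[str, str]]  # url -> {selector: text}
    url: str = ""
    inputs: dict[str, str] = field(default_factory=dict)
    actions: list[str] = field(default_factory=list)

    def navigate(self, url: str) -> None:
        self.url = url
        self.actions.append(f"navigate {url}")

    def read_text(self, selector: str) -> str:
        return self.pages[self.url][selector]

    def click(self, selector: str) -> None:
        self.actions.append(f"click {selector}")

    def type_text(self, selector: str, text: str) -> None:
        self.inputs[selector] = text
        self.actions.append(f"type {selector}")

    def capture_evidence(self) -> dict:
        # A real runtime would attach a screenshot or DOM snapshot here.
        return {"url": self.url, "actions": list(self.actions)}
```

Coding the planner against the interface rather than a specific driver keeps the agent testable without launching a browser.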

Memory

Store task state, extracted facts, and open questions so the agent does not restart on every tab.
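A memory that holds findings rather than raw logs might look like the sketch below. The `Finding` and `TaskMemory` names and fields are assumptions chosen for illustration; the point is that facts carry their source and that open questions are tracked explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    fact: str
    source_url: str


@dataclass
class TaskMemory:
    task_state: str = "not_started"
    findings: list[Finding] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

    def record(self, fact: str, source_url: str) -> None:
        self.findings.append(Finding(fact, source_url))

    def ask(self, question: str) -> None:
        if question not in self.open_questions:
            self.open_questions.append(question)

    def resolve(self, question: str, fact: str, source_url: str) -> None:
        """Close an open question by recording the fact that answers it."""
        if question in self.open_questions:
            self.open_questions.remove(question)
        self.record(fact, source_url)

    def summary(self) -> str:
        """Compact context to hand the agent when it opens a new tab."""
        facts = "; ".join(f.fact for f in self.findings) or "none"
        questions = "; ".join(self.open_questions) or "none"
        return f"state={self.task_state} | facts: {facts} | open: {questions}"
```

Passing `summary()` into each new tab's prompt is one simple way to avoid restarting the task from scratch.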

Approval and recovery

Pause before risky actions, detect failures, and offer a clear retry path when the page changes.
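The pause-and-recover behavior can be sketched as a guard around each step. The set of sensitive action names, the `Outcome` type, and the choice of `LookupError` as the signal for "the element is gone" are all assumptions for this example, not a prescribed design.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical list of actions that must never run without a human decision.
SENSITIVE_ACTIONS = {"submit_payment", "send_email", "delete_record"}


@dataclass
class Outcome:
    status: str      # "done", "denied", or "failed"
    detail: str
    can_retry: bool


def guarded_step(
    action: str,
    execute: Callable[[], str],
    approve: Callable[[str], bool],
) -> Outcome:
    """Ask for approval on sensitive actions and report failures clearly."""
    if action in SENSITIVE_ACTIONS and not approve(action):
        return Outcome("denied", f"user declined {action}", can_retry=False)
    try:
        return Outcome("done", execute(), can_retry=False)
    except LookupError as exc:
        # Assumed failure signal: a target that vanished after the page changed.
        return Outcome("failed", str(exc), can_retry=True)
```

Returning a structured outcome instead of raising lets the interface show the user exactly what happened and whether a retry is on offer.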

Implementation order

Ship in four stages

Make one workflow useful

Pick a narrow task, such as comparing three vendors, collecting fields from forms, or summarizing a set of tabs.

Stabilize actions

Add retries, page checks, screenshots, and action logs before you expand to more tasks.
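A minimal sketch of this stabilization layer follows, assuming the caller supplies the action and a page check as plain callables. The helper name `act_with_retries` and the log entry fields are hypothetical; a real runtime would likely attach a screenshot to each entry as well.

```python
import time
from typing import Callable


def act_with_retries(
    name: str,
    action: Callable[[], None],
    page_ok: Callable[[], bool],
    log: list[dict],
    attempts: int = 3,
    delay_s: float = 0.0,
) -> bool:
    """Run an action, verify the page afterwards, and log every attempt."""
    for attempt in range(1, attempts + 1):
        try:
            action()
            ok = page_ok()
            error = None
        except Exception as exc:  # broad on purpose: log and retry any failure
            ok, error = False, repr(exc)
        log.append({"action": name, "attempt": attempt, "ok": ok, "error": error})
        if ok:
            return True
        time.sleep(delay_s)
    return False
```

Because every attempt is logged, including the failed ones, the action log doubles as a debugging trail when a workflow breaks on a changed page.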

Add persistent context

Save state across tabs and sessions so the agent can continue work instead of starting over.
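Saving state across sessions can start as a JSON file on disk. The sketch below assumes a hypothetical state shape (`goal`, `completed_steps`, `findings`, `open_tabs`) and writes through a temporary file so that an interrupted write does not corrupt the saved task.

```python
import json
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    # Write to a temp file first, then swap it in, so a crash mid-write
    # cannot leave a half-written state file behind.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(path)


def load_state(path: Path) -> dict:
    if not path.exists():
        # Hypothetical empty state for a task that has not started yet.
        return {"goal": None, "completed_steps": [], "findings": [], "open_tabs": []}
    return json.loads(path.read_text(encoding="utf-8"))
```

On the next launch the agent calls `load_state`, skips whatever is already in `completed_steps`, and continues from there.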

Design the browser-native UX

Expose task history, approvals, and memory where the user already works, not in a detached debug panel.

Decision matrix

Prototype vs. extension vs. native workspace

Decision                   | Prototype | Extension | Native workspace
Fastest to validate        | Excellent | Good      | Slowest
Cross-tab context          | Limited   | Medium    | Best
Trust and approvals        | Manual    | Patchy    | Product-level
Long-term differentiation  | Low       | Medium    | Highest

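What Tabbit shows

Tabbit is a reference point for the browser-native route

The hard part is not getting the agent to click. It is embedding tasks, context, and approvals naturally into the browsing experience.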

Task-first browsing

The browsing surface is organized around work, not around isolated prompts.

Multi-tab context

Context follows the workflow, so research and synthesis can span more than one page.

Agent UX, not plugin UX

The agent is part of the browsing environment instead of sitting beside it as a bolt-on.

FAQ

Questions that come up first

What is the fastest way to build an agentic browser?

Start with one workflow on top of an automation runtime, then add memory and approval points before you widen the scope.

Should I build a browser extension or a full browser?

Use an extension if you need page assistance inside an existing browser. Build a browser-native workspace if long-running tasks and cross-tab context are the product.

What makes a browser agent different from browser automation?

Automation runs fixed instructions. A browser agent interprets goals, updates plans from live page state, and carries task memory across steps.

Where does memory matter most?

Memory matters when the task spans several tabs, several minutes, or several checkpoints that require human review.

Next step

Build the foundation, then study a real product

If you want to see what the browser-native route looks like in a product, take a look at Tabbit.