PLAN v0.1 · CLOSED ALPHA · Q3 2026

Goals,
decomposed.
Safely.

The planning layer for autonomous agents. Turns goals into typed DAGs of tool calls. Every node carries a risk tier, a precondition, and a rollback. Replans on failure. Grounds in Recall.

4 tiers · Risk-typed nodes
DAG · Not a chain
MCP-native · Tool protocol
plan.synthesize · session_3f2a · SYNTHESIZING
GOAL · Schedule a call with Casey next week
goal · ROOT
recall.search · READ · recall
calendar.free · READ · mcp/cal
verify.budget · READ · recall
filter.slots · READ · pure
draft.message · WRITE · sonnet
email.send · IRREV · mcp/mail
NODES 7
PARALLEL 2
GATES 1
MAX TIER IRREV
EST · LAT 1.4s
EST · COST $0.0021
ARC LABS · Plan composes on top of Recall · Q3 2026
01 Why Plan

ReAct loops are guess-and-pray.

PROBLEM 01

Linear chains, exponential failure.

ReAct/CoT agents call one tool at a time, then re-prompt with the result. Each step is a fresh LLM gamble. A 12-step task at 92% per-step accuracy lands at 37% end-to-end. Nothing about the chain composes — every step inherits all prior uncertainty.

agent.step_1  → 0.92
agent.step_12 → 0.37
// 0.92¹² ≈ 0.37
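The figure above follows from treating each step as an independent trial with the same success rate. A minimal sketch of that arithmetic, where the helper name `endToEndSuccess` is ours and not part of Plan:

```typescript
// Hypothetical helper: probability that an n-step linear chain completes,
// assuming every step succeeds independently with the same probability.
function endToEndSuccess(perStep: number, steps: number): number {
  return Math.pow(perStep, steps);
}

const twelveStep = endToEndSuccess(0.92, 12);
console.log(twelveStep.toFixed(2)); // 0.37
```

Real chains are rarely independent, so treat this as an illustration of how fast per-step error compounds rather than a prediction.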
PROBLEM 02

No notion of risk.

The agent treats db.read and stripe.charge the same way. Irreversible actions trigger at the same confidence threshold as queries. There is no policy on which steps require review — the framework can't even express the question.

send_invoice(amount=99999) ✓
// no gate, no rollback, no log
PROBLEM 03

Failures restart everything.

When step 9 of 12 fails — API timeout, schema drift, rate limit — most agents either crash or re-run the whole prompt. There's no plan structure to mutate. No concept of "redo just this subgraph with fresh context and the rest of the work intact."

retry → full prompt replay
cost ×12, latency ×12,
wasted progress 8 nodes
02 Synthesis

From goal
to typed DAG.

Plan synthesis is a five-stage compiler. A goal becomes a directed acyclic graph of typed nodes — each with a precondition, a risk tier, an expected tool, and a rollback. Recall provides the grounding context. The plan is not text. It is structure you can inspect, diff, and version.

01

Goal parse · NL → INTENT

Goal string → typed Intent record. Extracts the verb (do / find / decide), the object, the constraints, and the success predicate. Ambiguous goals are rejected before any tool call — the planner refuses to synthesize what it can't measure.

model haiku-4.5
latency ~400ms
output Intent{}
refusal ambiguous
02

Ground · RECALL CONTEXT

Pulls relevant memories from Recall — user preferences, prior decisions, entity facts, last-N tool outcomes. Synthesis is grounded in what the agent already knows. Planning blind on unfamiliar context is the single biggest cause of bad plans.

source recall.read
budget 8 facts
cost ~$0.0002
cache 60s LRU
03

Decompose · INTENT → DAG

Recursive goal-decomposition until every leaf maps to a single tool. Edges encode data dependency, not just order — siblings can run in parallel. Cycles are rejected. Depth is bounded; over-decomposition surfaces as a synthesis error.

model sonnet-4.5
depth ≤ 6
parallel auto
cycles = error
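One common way to reject cycles is a topological sort. The sketch below uses Kahn's algorithm over a hypothetical `DepNode` shape (an id plus its dependency ids). It illustrates the check itself and is not Plan's actual implementation:

```typescript
// Hypothetical node shape: an id plus the ids it depends on.
interface DepNode {
  id: string;
  deps: string[];
}

// Returns node ids in a valid execution order, or throws if the graph
// contains a cycle or references an unknown node (Kahn's algorithm).
function topoOrder(nodes: DepNode[]): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const n of nodes) {
    indegree.set(n.id, n.deps.length);
    dependents.set(n.id, []);
  }
  for (const n of nodes) {
    for (const d of n.deps) {
      const list = dependents.get(d);
      if (!list) throw new Error(`unknown dependency ${d} on ${n.id}`);
      list.push(n.id);
    }
  }
  // Start from nodes with no dependencies and peel the graph layer by layer.
  const ready = nodes.filter((n) => n.deps.length === 0).map((n) => n.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  // Any node left unvisited sits on a cycle.
  if (order.length !== nodes.length) {
    throw new Error("cycle detected: plan is not a DAG");
  }
  return order;
}

const order = topoOrder([
  { id: "n1", deps: [] },
  { id: "n2", deps: ["n1"] },
  { id: "n3", deps: ["n1"] },
  { id: "n4", deps: ["n2", "n3"] },
]);
console.log(order); // [ 'n1', 'n2', 'n3', 'n4' ]
```

Here n2 and n3 share only the upstream n1, so a scheduler is free to run them side by side.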
04

Type · RISK + ROLLBACK

Each node is assigned a risk tier (READ / WRITE / IRREVERSIBLE / HUMAN-IN-LOOP) and a rollback action where one exists. The policy engine attaches gates — confidence thresholds, dollar limits, approval requirements. Tiering happens before binding, so policy can refuse a plan.

tiers 4
policy declarative
gates auto
rollback per-tool
05

Bind · NODE → MCP TOOL

Each leaf is bound to a concrete MCP tool. Parameters are resolved from upstream node outputs and Recall context. Type-checked against the tool schema. Missing parameters surface as planning errors before execution starts — never as runtime null-arg crashes.

protocol MCP
resolution static + LLM
unbound = error
schema JSONSchema
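To make "missing parameters surface as planning errors" concrete, here is a rough sketch of a bind-time check. `ToolSchema` is a deliberately simplified stand-in for a JSON Schema (only `required` and top-level types), and `bindingErrors` is a name we made up for illustration:

```typescript
// Hypothetical, simplified stand-in for a tool's JSON Schema: only the
// `required` list and top-level property types are modeled here.
type JsonType = "string" | "number" | "boolean" | "array" | "object";
interface ToolSchema {
  required: string[];
  properties: Record<string, { type: JsonType }>;
}

function jsonTypeOf(value: unknown): string {
  return Array.isArray(value) ? "array" : typeof value;
}

// Returns a list of planning errors; an empty list means the node binds cleanly.
function bindingErrors(
  tool: string,
  args: Record<string, unknown>,
  schema: ToolSchema,
  upstreamIds: Set<string>,
): string[] {
  const errors: string[] = [];
  for (const name of schema.required) {
    if (!(name in args)) errors.push(`${tool}: missing required parameter "${name}"`);
  }
  for (const [name, value] of Object.entries(args)) {
    const spec = schema.properties[name];
    if (!spec) {
      errors.push(`${tool}: unknown parameter "${name}"`);
    } else if (typeof value === "string" && value.startsWith("$")) {
      // "$n1.id" style reference: only check that the upstream node exists.
      const nodeId = value.slice(1).split(".")[0];
      if (!upstreamIds.has(nodeId)) {
        errors.push(`${tool}: "${name}" references unknown node ${nodeId}`);
      }
    } else if (jsonTypeOf(value) !== spec.type) {
      errors.push(`${tool}: "${name}" should be ${spec.type}, got ${jsonTypeOf(value)}`);
    }
  }
  return errors;
}

const emailSchema: ToolSchema = {
  required: ["to", "body"],
  properties: { to: { type: "string" }, body: { type: "string" } },
};
const bindErrors = bindingErrors(
  "email.send",
  { body: "$n2" },
  emailSchema,
  new Set(["n1", "n2"]),
);
console.log(bindErrors); // one error: missing required parameter "to"
```

A production binder would presumably run a full JSON Schema validator; the point here is only that the failure is reported before any tool is called.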
03 Anatomy

Every node is eleven fields.

A Plan node is not a string. It's a typed record. Eleven fields describe it, eight of which appear in the card below; every one of them is queryable, diffable, and persisted with the receipt.

node /n5 · WRITE
id        "n5"                        unique · stable
tool      "draft.message"             mcp · resolved
args      { recipient, slots, tone }  refs upstream
tier      WRITE                       policy-checked
deps      ["n1", "n2", "n4"]          data-edges
precond   slots.length > 0            checked at run
rollback  { tool: draft.delete }      reversible
gate      { conf: 0.85 }              tier default

REVERSIBILITY

Every WRITE declares its undo. If the tool can't define a rollback, the planner promotes the node to IRREV.

PARAM RESOLUTION

$n1.id is a reference, not a substitution. The executor passes the actual upstream value at runtime.

IDEMPOTENCY

Every node carries an idempotency key. The executor short-circuits on replay if the key has been seen.
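The last two notes can be sketched together. Below, `resolveArgs` swaps `$node.path` references for recorded upstream outputs at run time, and `IdempotentRunner` skips an action whose key has already been seen. All names here are illustrative assumptions, not the Plan executor's API:

```typescript
type Outputs = Map<string, unknown>;

// Resolve a "$node.path" reference against recorded upstream outputs.
function resolveRef(ref: string, outputs: Outputs): unknown {
  const [nodeId, ...path] = ref.slice(1).split(".");
  if (!outputs.has(nodeId)) throw new Error(`no output recorded for ${nodeId}`);
  let value: unknown = outputs.get(nodeId);
  for (const key of path) {
    if (typeof value !== "object" || value === null || !(key in value)) {
      throw new Error(`cannot read "${key}" while resolving ${ref}`);
    }
    value = (value as Record<string, unknown>)[key];
  }
  return value;
}

// Literals pass through untouched; strings starting with "$" are looked up.
function resolveArgs(
  args: Record<string, unknown>,
  outputs: Outputs,
): Record<string, unknown> {
  const resolved: Record<string, unknown> = {};
  for (const [name, value] of Object.entries(args)) {
    resolved[name] =
      typeof value === "string" && value.startsWith("$")
        ? resolveRef(value, outputs)
        : value;
  }
  return resolved;
}

// Remembers results by idempotency key so a replay does not repeat the action.
class IdempotentRunner {
  private seen = new Map<string, unknown>();

  run<T>(key: string, action: () => T): T {
    if (this.seen.has(key)) return this.seen.get(key) as T;
    const result = action();
    this.seen.set(key, result);
    return result;
  }
}

const outputs: Outputs = new Map([["n1", { id: "person/casey-park" }]]);
const resolvedArgs = resolveArgs({ who: "$n1.id", days: 7 }, outputs);
console.log(resolvedArgs); // { who: 'person/casey-park', days: 7 }

const runner = new IdempotentRunner();
let calls = 0;
runner.run("plan_3f2a:n2", () => ++calls);
runner.run("plan_3f2a:n2", () => ++calls); // replay: action is skipped
console.log(calls); // 1
```

The key format `plan_3f2a:n2` is an assumption; the page does not say how keys are derived.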

04 Risk tiers

Not every step
is equal.

Plan classifies every node into one of four tiers, each with its own confidence requirement and policy.

TIER 01

Read

REVERSIBLE · NO SIDE EFFECT

Queries, retrievals, computations. No external mutation. The cheapest tier — runs autonomously.

recall.search
db.select
http.get
vec.knn
▸ AUTO · NO GATE
TIER 02

Write

REVERSIBLE · INTERNAL

Internal state mutations. The agent's own database, the user's draft folder. Reversible via undo.

recall.write
draft.save
fs.write
vec.upsert
▸ CONFIDENCE ≥ 0.85
TIER 03

Irreversible

EXTERNAL · NO UNDO

Sends an email, deploys a build, charges a card. No rollback. Policy-gated and opt-in.

email.send
stripe.charge
db.delete
deploy.push
▸ POLICY-GATED
TIER 04

Human

REVIEW · APPROVAL

Pauses the plan. Surfaces the node, its inputs, and predicted effect. Blocking.

approve.send
approve.merge
approve.charge
approve.deploy
▸ HUMAN GATE · BLOCKING
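A rough sketch of how the four tiers could map to a gate decision. The 0.85 threshold comes from the Write tier above; what happens below the threshold, and the choice to hold opted-in IRREV nodes for approval, are our assumptions, as are the names `GatePolicy` and `decide`:

```typescript
type Tier = "READ" | "WRITE" | "IRREV" | "HUMAN";
type Decision = "run" | "hold_for_approval" | "refuse";

// Hypothetical policy shape; a real policy file would carry more than this.
interface GatePolicy {
  writeMinConfidence: number; // 0.85 is the Write tier default shown above
  allowIrreversible: boolean; // IRREV is opt-in
}

function decide(tier: Tier, confidence: number, policy: GatePolicy): Decision {
  switch (tier) {
    case "READ":
      return "run"; // no gate
    case "WRITE":
      // Assumption: a low-confidence write is escalated rather than dropped.
      return confidence >= policy.writeMinConfidence ? "run" : "hold_for_approval";
    case "IRREV":
      // Assumption: when opted in, irreversible nodes still wait for approval.
      return policy.allowIrreversible ? "hold_for_approval" : "refuse";
    case "HUMAN":
      return "hold_for_approval"; // always blocking
  }
}

const gatePolicy: GatePolicy = { writeMinConfidence: 0.85, allowIrreversible: false };
console.log(decide("READ", 0.2, gatePolicy));  // run
console.log(decide("WRITE", 0.9, gatePolicy)); // run
console.log(decide("WRITE", 0.6, gatePolicy)); // hold_for_approval
console.log(decide("IRREV", 0.99, gatePolicy)); // refuse
```

Keeping the decision a pure function of tier, confidence, and policy is what lets a plan be refused before anything executes.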
05 Tool binding

Plans are code,
not prose.

A Plan is a serializable object. Every node is bound to an MCP tool with concrete parameters. The agent doesn't "decide what to do next" — it executes a graph it can prove is well-formed.

PLAN OBJECT · TYPED
// goal: "schedule a call with Casey next week"
Plan {
  id: "plan_3f2a",
  goal: "schedule call with Casey",
  nodes: [
    { id: "n1", tool: "recall.search",
      args: { q: "Casey" },
      tier: READ, deps: [] },
    { id: "n2", tool: "calendar.free",
      args: { who: "$n1.id" },
      tier: READ, deps: ["n1"] },
    { id: "n3", tool: "email.send",
      args: { body: "$n2" },
      tier: IRREV,
      gate: { approval: true },
      deps: ["n2"] }
  ]
}
EXECUTION TRACE · LIVE
N1
recall.search(q: "Casey")

Returned 3 entities. Best match: person/casey-park

DONE · 142ms · READ
N2
calendar.free(who: casey-park, +7d)

4 free 30-min slots. Tue 2pm, Wed 10am, Thu 3pm, Fri 11am.

DONE · 318ms · READ
N3
email.send(body: $n2)

Blocked on human gate. N2 has completed, so the preview is ready for review.

PENDING GATE · IRREV
06 Execution

Parallel where it can,
serial where it must.

The DAG is the schedule. Nodes that do not depend on each other, directly or transitively, run in parallel automatically.

plan.execute · plan_3f2a · 6 nodes · 4 parallel · RUNNING
n1 · recall.search · READ · 240ms · 10.0%
n2 · calendar.free · READ · 420ms · 17.5%
n3 · verify.budget · READ · 180ms · 7.5%
n4 · filter.slots · READ · 90ms · 3.8%
n5 · draft.message · WRITE · 1180ms · 49.2%
n6 · email.send · IRREV · 220ms · 9.2%
WALL CLOCK 1.96s
SERIAL EQUIV. 2.33s
PARALLEL UPSIDE −16%
TOTAL COST $0.0017
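The gap between wall clock and serial time comes from the dependency structure: a node can start as soon as its slowest dependency finishes. A minimal sketch of that calculation follows, using a toy four-node graph of our own (the durations and edges are illustrative and are not the plan shown above):

```typescript
// Hypothetical node shape for timing purposes.
interface TimedNode {
  id: string;
  deps: string[];
  ms: number;
}

// Wall clock if every node starts the moment all of its deps have finished,
// compared with running the same nodes one after another.
function schedule(nodes: TimedNode[]): { wallClockMs: number; serialMs: number } {
  const byId = new Map(nodes.map((n) => [n.id, n] as const));
  const finish = new Map<string, number>();
  const visiting = new Set<string>();

  const finishOf = (id: string): number => {
    const cached = finish.get(id);
    if (cached !== undefined) return cached;
    const node = byId.get(id);
    if (!node) throw new Error(`unknown node ${id}`);
    if (visiting.has(id)) throw new Error(`cycle through ${id}`);
    visiting.add(id);
    const start = Math.max(0, ...node.deps.map((d) => finishOf(d)));
    visiting.delete(id);
    const end = start + node.ms;
    finish.set(id, end);
    return end;
  };

  const wallClockMs = Math.max(0, ...nodes.map((n) => finishOf(n.id)));
  const serialMs = nodes.reduce((sum, n) => sum + n.ms, 0);
  return { wallClockMs, serialMs };
}

// Diamond: a feeds b and c, which both feed d. b and c overlap in time.
const timing = schedule([
  { id: "a", deps: [], ms: 100 },
  { id: "b", deps: ["a"], ms: 200 },
  { id: "c", deps: ["a"], ms: 50 },
  { id: "d", deps: ["b", "c"], ms: 100 },
]);
console.log(timing); // { wallClockMs: 400, serialMs: 450 }
```

The wall clock equals the longest dependency path (a, b, d), so shortening c would change nothing while shortening b would.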
07 Replanning

Failure is a signal,
not a stack trace.

When a node fails, the planner doesn't crash and doesn't replay the whole prompt. It mutates the affected subgraph in place and resumes. Upstream work is preserved. Downstream work is invalidated and re-typed.

N1 · retrieve · READ · ✓
N2 · filter · READ · ✓
N3 · api.call · 503 · ✗
REPLAN · N3 → N3' · fallback bound
N3' · api.call(v2) · RUNNING
N4 · approve · QUEUED
SCOPE

Subgraph diff, not full replay. Only nodes whose dependency closure includes the failed node get re-typed.

COST
−84%
vs replay

Average reduction in re-execution cost across the bench suite.

LATENCY
−71%
vs replay

Time-to-recover from a single mid-plan failure.
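The scope rule can be read as a graph query: after a failure, the affected set is the failed node plus everything downstream of it. A minimal sketch, assuming the four nodes in the diagram above form a simple chain (the diagram does not state its edges) and using a function name of our own:

```typescript
// Hypothetical node shape: an id plus the ids it depends on.
interface DepNode {
  id: string;
  deps: string[];
}

// Nodes that must be re-typed after `failedId` fails: the failed node plus
// every node that depends on it directly or transitively.
function invalidated(nodes: DepNode[], failedId: string): Set<string> {
  const affected = new Set<string>([failedId]);
  let changed = true;
  while (changed) {
    changed = false;
    for (const n of nodes) {
      if (!affected.has(n.id) && n.deps.some((d) => affected.has(d))) {
        affected.add(n.id);
        changed = true;
      }
    }
  }
  return affected;
}

const chain: DepNode[] = [
  { id: "n1", deps: [] },
  { id: "n2", deps: ["n1"] },
  { id: "n3", deps: ["n2"] },
  { id: "n4", deps: ["n3"] },
];
const affected = invalidated(chain, "n3");
const preserved = chain.map((n) => n.id).filter((id) => !affected.has(id));
console.log([...affected]); // [ 'n3', 'n4' ]
console.log(preserved);     // [ 'n1', 'n2' ]
```

Everything in `preserved` keeps its recorded output, which is where the savings over a full replay would come from.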

08 Reliability

The numbers we're chasing.

The Plan v0.1 alpha targets are below. Numbers are from our internal bench suite of 240 multi-tool agentic tasks.

END-TO-END TASK ACCURACY

% tasks completed without operator intervention

Plan v0.1 · 78%
LangGraph · 54%
AutoGen · 46%
ReAct · 31%
240 task suite · sonnet-4.5 · n=5 runs · jan 2026
UNSAFE-ACTION RATE

% runs that fired an irreversible tool without gate

Plan v0.1 · 0.4%
LangGraph · 7.1%
AutoGen · 11.3%
ReAct · 17.8%
tier classification + gate enforcement · same task set
09 The stack

Same DNA
as Recall.

Plan ships under the same engineering principles as the rest of the Arc Labs cognitive stack. Rust core. Open core.

L1
Rust core
PLANNER + EXECUTOR

The synthesis compiler, the DAG executor, the policy engine, the rollback machine. Single binary. Zero-allocation in the hot path.

crate: arc-plan | size: 8.2 MB | deps: 14
L2
Recall integration
CONTEXT GROUNDING

Plan is not useful without memory. Synthesis pulls from Recall for grounding; execution writes outcomes back.

protocol: in-process | overhead: ~0ms
L3
MCP tool layer
ANY TOOL, TYPED

Tools are MCP servers. Plan reads their schemas and validates parameter resolution at synthesis time.

protocol: MCP | tools: any | schema: JSONSchema
L4
Policy engine
DECLARATIVE GATES

Policies are configuration, not prompts. A YAML or TOML file declares which tiers gate, dollar limits, and approval requirements.

format: YAML / TOML | live: hot-reload
L5
SDKs
TS · PYTHON · GO

First-class clients in Node, Python, and Go. Inspect, edit, replay plans from any runtime.

node: @arc-labs/plan | py: arc-plan | go: arclabs.ai/plan
10 Comparison

Plan vs.
the alternatives.

Most "agent frameworks" ship orchestration code, not a planner. Plan is the missing primitive — it's what should sit between LangGraph and your tools.

Feature | Plan | LangGraph | AutoGen | ReAct
Plan structure (how steps relate) | Typed DAG | Hand-coded graph | Conversation loop | Linear chain
Risk tiers (node-level policy) | 4 tiers, declarative | None | None | None
Replan on failure (subgraph-level) | In-place mutation | Manual edge re-route | Re-prompt loop | Restart prompt
Memory grounding (context at synthesis) | Recall-native | BYO retriever | BYO retriever | None
Human gates (approval flow) | Built-in, typed | Custom node | Custom message | None
11 Early access

Q3 2026.
Closed alpha.

Plan v0.1 ships to ~40 design partners building agent products in production. If you have a real workload — not a demo — we'd like to hear from you. Tell us your tools, your tiers, and the one task that has to ship.

import { Planner } from "@arc-labs/plan";
import { Recall }  from "@arc-labs/recall";

// mcpFleet is assumed to be defined elsewhere: your connected MCP tool servers.
const planner = new Planner({
  recall: new Recall(),
  tools:  mcpFleet,
  policy: "./policy.yaml",
});

const plan = await planner.synthesize({
  goal: "schedule call with Casey next week",
});

// inspect it before you run it
console.log(plan.nodes);
await planner.execute(plan);