Goals,
decomposed.
Safely.
The planning layer for autonomous agents. Turns goals into typed DAGs of tool calls. Every node carries a risk tier, a precondition, and a rollback. Replans on failure. Grounds in Recall.
ReAct loops are guess-and-pray.
Linear chains, exponential failure.
ReAct/CoT agents call one tool at a time, then re-prompt with the result. Each step is a fresh LLM gamble. A 12-step task at 92% per-step accuracy lands at 37% end-to-end. Nothing about the chain composes — every step inherits all prior uncertainty.
agent.step_1  → 0.92
agent.step_12 → 0.37   // 0.92¹² ≈ 0.37
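The compounding above is just multiplication. A quick check in TypeScript (this is the failure model from the text, not anything from the Plan SDK):

```typescript
// End-to-end success of a linear chain is the product of per-step success.
const perStep = 0.92;
const steps = 12;
const endToEnd = perStep ** steps;
console.log(endToEnd.toFixed(2)); // "0.37"
```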
No notion of risk.
The agent treats db.read and stripe.charge the same way. Irreversible actions trigger at the same confidence threshold as queries. There is no policy on which steps require review — the framework can't even express the question.
send_invoice(amount=99999) ✓ // no gate, no rollback, no log
Failures restart everything.
When step 9 of 12 fails — API timeout, schema drift, rate limit — most agents either crash or re-run the whole prompt. There's no plan structure to mutate. No concept of "redo just this subgraph with fresh context and the rest of the work intact."
retry → full prompt replay · cost ×12 · latency ×12 · 8 nodes of progress wasted
From goal
to typed DAG.
Plan synthesis is a five-stage compiler. A goal becomes a directed acyclic graph of typed nodes — each with a precondition, a risk tier, an expected tool, and a rollback. Recall provides the grounding context. The plan is not text. It is structure you can inspect, diff, and version.
Parse · GOAL → INTENT
Goal string → typed Intent record. Extracts the verb (do / find / decide), the object, the constraints, and the success predicate. Ambiguous goals are rejected before any tool call — the planner refuses to synthesize what it can't measure.
Ground · RECALL CONTEXT
Pulls relevant memories from Recall — user preferences, prior decisions, entity facts, last-N tool outcomes. Synthesis is grounded in what the agent already knows; planning blind, without that context, is the single biggest cause of bad plans.
Decompose · INTENT → DAG
Recursive goal-decomposition until every leaf maps to a single tool. Edges encode data dependency, not just order — siblings can run in parallel. Cycles are rejected. Depth is bounded; over-decomposition surfaces as a synthesis error.
Type · RISK + ROLLBACK
Each node is assigned a risk tier (READ / WRITE / IRREVERSIBLE / HUMAN-IN-LOOP) and a rollback action where one exists. The policy engine attaches gates — confidence thresholds, dollar limits, approval requirements. Tiering happens before binding, so policy can refuse a plan.
Bind · NODE → MCP TOOL
Each leaf is bound to a concrete MCP tool. Parameters are resolved from upstream node outputs and Recall context. Type-checked against the tool schema. Missing parameters surface as planning errors before execution starts — never as runtime null-arg crashes.
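The five stages compose like a small compiler pipeline. Here is a toy TypeScript skeleton of that shape, with stub stand-ins for every phase; names and return types are illustrative, not the actual @arc-labs/plan internals:

```typescript
// Toy skeleton of the five-stage synthesis compiler. Every stage is a stub.
type Intent = { verb: string; object: string };
type PlanNode = { id: string; tool: string | null; deps: string[]; tier?: string };

// 1. Parse: goal string → typed Intent; ambiguous goals are rejected.
function parse(goal: string): Intent {
  const [verb, ...rest] = goal.trim().split(/\s+/);
  if (!verb || rest.length === 0) throw new Error("ambiguous goal: rejected");
  return { verb, object: rest.join(" ") };
}

// 2. Ground: pull relevant context (stand-in for a Recall query).
function ground(intent: Intent): string[] {
  return [`memory about ${intent.object}`];
}

// 3. Decompose: intent → DAG whose leaves are tool-sized steps.
function decompose(intent: Intent): PlanNode[] {
  return [
    { id: "n1", tool: null, deps: [] },
    { id: "n2", tool: null, deps: ["n1"] },
  ];
}

// 4. Type: assign a risk tier (the real system also attaches rollbacks/gates).
function typeNodes(dag: PlanNode[]): void {
  for (const n of dag) n.tier = "READ";
}

// 5. Bind: map each leaf to a concrete tool; unresolved params would fail here.
function bind(dag: PlanNode[], ctx: string[]): void {
  const tools = ["recall.search", "calendar.free"];
  dag.forEach((n, i) => (n.tool = tools[i]));
}

function synthesize(goal: string): PlanNode[] {
  const intent = parse(goal);
  const ctx = ground(intent);
  const dag = decompose(intent);
  typeNodes(dag);
  bind(dag, ctx);
  return dag;
}

console.log(synthesize("schedule call with Casey").map(n => n.tool));
```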
Every node is eleven fields.
A Plan node is not a string. It's a typed record. Eleven fields describe it; every one of them is queryable, diffable, and persisted with the receipt.
REVERSIBILITY
Every WRITE declares its undo. If the tool can't define a rollback, the planner promotes the node to IRREV.
PARAM RESOLUTION
$n1.id is a reference, not a substitution. The executor passes the actual upstream value at runtime.
IDEMPOTENCY
Every node carries an idempotency key. The executor short-circuits on replay if the key has been seen.
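Putting the cards together, a hedged sketch of the node record in TypeScript. The page names most of these fields directly (id, tool, args, tier, deps, gate, precondition, rollback, idempotency key); `confidence` and `receiptId` are guesses added only to illustrate an eleven-field shape, not the published schema:

```typescript
// Illustrative node record; two field names are guesses (marked below).
interface PlanNode {
  id: string;
  tool: string;                   // bound MCP tool
  args: Record<string, unknown>;  // "$n1.id" stays a reference until runtime
  tier: "READ" | "WRITE" | "IRREV" | "HUMAN";
  deps: string[];                 // data dependencies, not just order
  gate?: { approval?: boolean; maxDollars?: number };
  precondition?: string;          // checked before execution
  rollback?: string;              // undo action; absent WRITE ⇒ promoted to IRREV
  idempotencyKey: string;         // executor short-circuits on replay
  confidence?: number;            // guess: synthesis-time estimate
  receiptId?: string;             // guess: link to the persisted receipt
}

const n3: PlanNode = {
  id: "n3", tool: "email.send", args: { body: "$n2" },
  tier: "IRREV", deps: ["n2"], gate: { approval: true },
  idempotencyKey: "plan_3f2a:n3",
};
console.log(n3.tier); // "IRREV"
```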
Not every step
is equal.
Plan classifies every node into one of four tiers, each with its own confidence requirement and policy.
Read
Queries, retrievals, computations. No external mutation. The cheapest tier — runs autonomously.
Write
Internal state mutations. The agent's own database, the user's draft folder. Reversible via undo.
Irreversible
Sends an email, deploys a build, charges a card. No rollback. Policy-gated and opt-in.
Human
Pauses the plan. Surfaces the node, its inputs, and predicted effect. Blocking.
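One way the four tiers might translate into gate checks, sketched with hypothetical confidence thresholds (the real gates live in policy configuration, not code):

```typescript
// Tier → policy mapping; the threshold numbers here are invented examples.
enum Tier { READ = "READ", WRITE = "WRITE", IRREV = "IRREV", HUMAN = "HUMAN" }

const policy: Record<Tier, { minConfidence: number; approval: boolean }> = {
  [Tier.READ]:  { minConfidence: 0.5,  approval: false }, // runs autonomously
  [Tier.WRITE]: { minConfidence: 0.8,  approval: false }, // reversible via undo
  [Tier.IRREV]: { minConfidence: 0.95, approval: true  }, // opt-in, gated
  [Tier.HUMAN]: { minConfidence: 1.0,  approval: true  }, // always blocks
};

function mayRun(tier: Tier, confidence: number, approved: boolean): boolean {
  const gate = policy[tier];
  return confidence >= gate.minConfidence && (!gate.approval || approved);
}

console.log(mayRun(Tier.READ, 0.7, false));   // true
console.log(mayRun(Tier.IRREV, 0.99, false)); // false: needs approval
```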
Plans are code,
not prose.
A Plan is a serializable object. Every node is bound to an MCP tool with concrete parameters. The agent doesn't "decide what to do next" — it executes a graph it can prove is well-formed.
// goal: "schedule a call with Casey next week"
Plan {
  id: "plan_3f2a",
  goal: "schedule call with Casey",
  nodes: [
    { id: "n1", tool: "recall.search",
      args: { q: "Casey" },
      tier: READ, deps: [] },
    { id: "n2", tool: "calendar.free",
      args: { who: "$n1.id" },
      tier: READ, deps: ["n1"] },
    { id: "n3", tool: "email.send",
      args: { body: "$n2" },
      tier: IRREV,
      gate: { approval: true },
      deps: ["n2"] }
  ]
}
n1 → Returned 3 entities. Best match: person/casey-park
n2 → 4 free 30-min slots. Tue 2pm, Wed 10am, Thu 3pm, Fri 11am.
n3 → Blocked on human gate. Will surface preview when n2 completes.
Parallel where it can,
serial where it must.
The DAG is the schedule. Nodes with disjoint dependency closures run in parallel automatically.
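A sketch of what "the DAG is the schedule" means in practice: repeatedly take every node whose dependencies are already satisfied and run that whole set as one parallel wave. The node shape is illustrative, not the Plan type:

```typescript
// Derive parallel execution waves from dependency edges.
type DagNode = { id: string; deps: string[] };

function waves(nodes: DagNode[]): string[][] {
  const done = new Set<string>();
  const out: string[][] = [];
  let remaining = [...nodes];
  while (remaining.length > 0) {
    // Every node whose deps are all satisfied can run now, in parallel.
    const ready = remaining.filter(n => n.deps.every(d => done.has(d)));
    if (ready.length === 0) throw new Error("cycle detected");
    out.push(ready.map(n => n.id));
    ready.forEach(n => done.add(n.id));
    remaining = remaining.filter(n => !ready.includes(n));
  }
  return out;
}

// n1 and n2 have disjoint dependency closures → same wave; n3 waits on both.
console.log(waves([
  { id: "n1", deps: [] },
  { id: "n2", deps: [] },
  { id: "n3", deps: ["n1", "n2"] },
])); // [["n1","n2"], ["n3"]]
```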
Failure is a signal,
not a stack trace.
When a node fails, the planner doesn't crash and doesn't replay the whole prompt. It mutates the affected subgraph in place and resumes. Upstream work is preserved. Downstream work is invalidated and re-typed.
SCOPE
Subgraph diff, not full replay. Only nodes whose dependency closure includes the failed node get re-typed.
COST
Average reduction in re-execution cost across the bench suite.
LATENCY
Time-to-recover from a single mid-plan failure.
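The SCOPE rule (re-type only nodes whose dependency closure includes the failure) can be sketched as a reachability walk over the dependency edges. Types here are illustrative, not the Plan API:

```typescript
// Compute the downstream closure of a failed node: that subgraph is
// invalidated and re-typed; every other node's work is preserved.
type DagNode = { id: string; deps: string[] };

function invalidated(nodes: DagNode[], failed: string): Set<string> {
  const hit = new Set([failed]);
  let grew = true;
  while (grew) {
    grew = false;
    for (const n of nodes) {
      if (!hit.has(n.id) && n.deps.some(d => hit.has(d))) {
        hit.add(n.id);
        grew = true;
      }
    }
  }
  return hit;
}

const plan: DagNode[] = [
  { id: "n1", deps: [] },
  { id: "n2", deps: ["n1"] },
  { id: "n3", deps: ["n2"] },
  { id: "n4", deps: [] }, // independent branch: survives the failure
];
console.log([...invalidated(plan, "n2")]); // ["n2", "n3"] — n1 and n4 keep their results
```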
The numbers we're chasing.
Plan v0.1 alpha targets below. Numbers are from our internal bench suite — 240 multi-tool agentic tasks.
END-TO-END TASK ACCURACY
% tasks completed without operator intervention
UNSAFE-ACTION RATE
% runs that fired an irreversible tool without gate
Same DNA
as Recall.
Plan ships under the same engineering principles as the rest of the Arc Labs cognitive stack. Rust core. Open core.
The synthesis compiler, the DAG executor, the policy engine, the rollback machine. Single binary. Zero-allocation in the hot path.
Plan is not useful without memory. Synthesis pulls from Recall for grounding; execution writes outcomes back.
Tools are MCP servers. Plan reads their schemas and validates parameter resolution at synthesis time.
Policies are configuration, not prompts. A YAML file declares which tiers gate, dollar limits, and approval requirements.
First-class clients in Node, Python, and Go. Inspect, edit, replay plans from any runtime.
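The policy-as-configuration idea might look something like this in practice. Every key name below is a guess for illustration, not the shipped schema:

```yaml
# Hypothetical policy.yaml; all keys are illustrative.
tiers:
  read:
    gate: false            # runs autonomously
  write:
    gate: false
    require_rollback: true # no undo ⇒ promoted to irreversible
  irreversible:
    gate: true
    approval: human
    max_dollars: 500       # example dollar limit
  human:
    gate: true
    blocking: true
```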
Plan vs.
the alternatives.
Most "agent frameworks" ship orchestration code, not a planner. Plan is the missing primitive — it's what should sit between LangGraph and your tools.
| Feature | Plan | LangGraph | AutoGen | ReAct |
|---|---|---|---|---|
| Plan structure (how steps relate) | Typed DAG | Hand-coded graph | Conversation loop | Linear chain |
| Risk tiers (node-level policy) | 4 tiers, declarative | None | None | None |
| Replan on failure (subgraph-level) | In-place mutation | Manual edge re-route | Re-prompt loop | Restart prompt |
| Memory grounding (context at synthesis) | Recall-native | BYO retriever | BYO retriever | None |
| Human gates (approval flow) | Built-in, typed | Custom node | Custom message | None |
Q3 2026.
Closed alpha.
Plan v0.1 ships to ~40 design partners building agent products in production. If you have a real workload — not a demo — we'd like to hear from you. Tell us your tools, your tiers, and the one task that has to ship.
import { Planner } from "@arc-labs/plan";
import { Recall } from "@arc-labs/recall";
const planner = new Planner({
  recall: new Recall(),
  tools: mcpFleet,
  policy: "./policy.yaml",
});

const plan = await planner.synthesize({
  goal: "schedule call with Casey next week",
});

// inspect it before you run it
console.log(plan.nodes);

await planner.execute(plan);