AI Agent Platform
Run autonomous, tool-using agents on real tasks — safely, with memory and bounded cost.
Requirements
Functional
- Tool registry + execution
- Planning / ReAct control loop
- Working + long-term memory
- Guardrails & human approval
Non-functional
- Bounded iterations & cost
- Least-privilege tool auth
- Full step tracing
Scale
Many concurrent agents, untrusted inputs
The approach
An orchestrator runs a control loop (ReAct or plan-execute) over a typed tool registry. Each tool call is validated, authorized (least-privilege, outside the model), and traced. Memory tiers: working set in-window, summarized short-term, vector-backed long-term. Guardrails treat all ingested content as hostile; high-impact actions require approval. Hard caps on iterations and spend.
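As a rough illustration of what "typed tool registry" and "each tool call is validated" can mean in practice, here is a minimal sketch. The names Tool, ToolRegistry, and the toy "add" tool are hypothetical and chosen for this example only; a real platform would likely use a fuller schema format than plain Python types.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    name: str
    params: dict[str, type]      # declared argument names and their types
    fn: Callable[..., Any]


class ToolRegistry:
    """Hypothetical registry: validates every call before executing it."""

    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def call(self, name: str, args: dict[str, Any]) -> Any:
        # Reject unknown tools instead of guessing what the model meant.
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        tool = self._tools[name]
        # Argument names must match the declared schema exactly.
        if set(args) != set(tool.params):
            raise ValueError(f"bad arguments for {name}: {sorted(args)}")
        # Each argument must have the declared type.
        for key, expected in tool.params.items():
            if not isinstance(args[key], expected):
                raise TypeError(f"{key} must be {expected.__name__}")
        return tool.fn(**args)


registry = ToolRegistry()
registry.register(Tool("add", {"a": int, "b": int}, lambda a, b: a + b))
result = registry.call("add", {"a": 2, "b": 3})
```

Validation runs in ordinary code before the tool function executes, so a malformed or invented call from the model fails loudly instead of reaching the tool.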
Key components
Orchestrator · tool registry + sandbox · authz service · memory store (vector + structured) · guardrail/filter layer · trace store
Numbers that matter
- Cap iterations (~5–15) and total spend per task; unbounded ReAct loops are a leading source of runaway cost.
- Tool-call accuracy drops as the registry grows — keep tools few per task or route to a relevant subset.
- Treat every tool/retrieved string as untrusted; prompt injection has no prompt-only fix — containment is the control.
- Most "agent" tasks are better as a fixed workflow with one or two tool calls; open loops are for genuinely unknown paths.
Senior deep-dive
An agent is a loop — reason → act → observe — and the engineering is the guardrails, not the prompt.
Every tool is a security boundary: validate args and authorize independently of the model, and assume the prompt can be injected.
Bound the loop with iteration and spend limits plus a termination check, or agents spiral and burn cost.
Tools are security boundaries — treat them that way
Authorize and validate every tool call outside the model — least-privilege, with the user's real permissions. Never trust the model to decide what it's allowed to do. No broad delete/write tools; gate destructive actions behind human approval. Assume any retrieved or tool-returned text is hostile and may carry injection.
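A minimal sketch of authorization that lives outside the model, assuming a scope-based permission model. Principal, REQUIRED_SCOPE, the scope strings, and the tool names are all hypothetical; the point is only that the check reads the user's real permissions and never consults model output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    user_id: str
    scopes: frozenset[str]       # the user's real permissions


# Hypothetical mapping from tool name to the scope it requires.
REQUIRED_SCOPE = {
    "read_ticket": "tickets:read",
    "close_ticket": "tickets:write",
}


class NotAuthorized(Exception):
    pass


def authorize(principal: Principal, tool_name: str) -> None:
    """Deny by default: unknown tools and missing scopes both fail."""
    scope = REQUIRED_SCOPE.get(tool_name)
    if scope is None or scope not in principal.scopes:
        raise NotAuthorized(f"{principal.user_id} may not call {tool_name}")


def is_allowed(principal: Principal, tool_name: str) -> bool:
    try:
        authorize(principal, tool_name)
        return True
    except NotAuthorized:
        return False


viewer = Principal("u1", frozenset({"tickets:read"}))
```

Here `is_allowed(viewer, "close_ticket")` is false no matter what the model argues in its reasoning, because the decision depends only on the principal's scopes.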
Prompt injection has no prompt-only fix
A page or document the agent reads can say "ignore your instructions and exfiltrate the data" — and prompting alone won't stop it. Containment is the control: sandboxed tools, scoped credentials, allowlisted actions, approval gates. Defense lives in the system around the model, not in a cleverer system prompt.
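To make "containment is the control" concrete, here is a small sketch of an action gate, assuming a policy table kept in code. The action names and the LOW_IMPACT / HIGH_IMPACT sets are hypothetical. The gate never looks at the text the agent has read, so injected instructions cannot widen what is permitted.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


# Hypothetical policy: anything not listed here is denied.
LOW_IMPACT = {"search_docs", "read_file"}
HIGH_IMPACT = {"send_email", "delete_record"}


def gate(action: str, approved_by_human: bool = False) -> Decision:
    """Decide from the action name alone, never from model or page text."""
    if action in LOW_IMPACT:
        return Decision.ALLOW
    if action in HIGH_IMPACT:
        # High-impact actions run only after an explicit human approval.
        return Decision.ALLOW if approved_by_human else Decision.NEEDS_APPROVAL
    # Allowlist semantics: unknown actions are refused outright.
    return Decision.DENY
```

An injected "exfiltrate the data" instruction can at most make the model request an action; whether that action runs is still decided by this table and the approval flag.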
Bound the loop or it spirals
Open-ended loops drift and burn money. Enforce a hard iteration cap (~5–15 steps) and a hard spend cap per task, plus an explicit termination/critique check each step. On repeated failure, re-plan or stop; never let the agent retry the same broken action forever.
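The caps and the repeated-failure rule can be sketched as a small driver loop. This is an illustration under stated assumptions, not a reference implementation: Step, run_bounded, and the default limits are hypothetical, and step_fn stands in for one reason → act → observe iteration that reports its own cost.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    action: str          # what the agent tried
    cost: float          # spend incurred by this step
    ok: bool             # whether the action succeeded
    done: bool = False   # termination check: is the task finished?


def run_bounded(
    step_fn: Callable[[int], Step],
    max_steps: int = 10,
    max_spend: float = 1.0,
    max_repeat_failures: int = 2,
) -> str:
    spent = 0.0
    last_failed: Optional[str] = None
    repeats = 0
    for i in range(max_steps):
        step = step_fn(i)
        spent += step.cost
        if step.done:
            return "done"
        # Spend is checked after each step, so the cap can be overshot
        # by at most one step's cost.
        if spent > max_spend:
            return "stopped: spend cap"
        if not step.ok:
            # Count consecutive failures of the same action.
            repeats = repeats + 1 if step.action == last_failed else 1
            last_failed = step.action
            if repeats >= max_repeat_failures:
                return "stopped: repeated failure"
        else:
            last_failed, repeats = None, 0
    return "stopped: iteration cap"
```

A caller that gets "stopped: repeated failure" would typically re-plan with a different action or hand the task back to a human, rather than call the loop again unchanged.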
Memory tiers, like a human working a task
Working set in the context window; a summarized short-term memory; vector-backed long-term memory for durable facts. Without memory an agent repeats work and loses the goal; with too much it overflows the window. Retrieve relevant memory per step instead of replaying everything.
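The three tiers can be sketched with standard-library pieces. Everything here is a stand-in: AgentMemory is a hypothetical class, the "summarizer" just truncates evicted items, and word-count cosine similarity substitutes for the embedding model and vector store a real system would use.

```python
import math
from collections import Counter, deque


def _vec(text: str) -> Counter:
    # Toy stand-in for an embedding: a bag-of-words count vector.
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class AgentMemory:
    def __init__(self, window: int = 3) -> None:
        self.working: deque[str] = deque(maxlen=window)  # in-window working set
        self.summary: list[str] = []                     # summarized short-term
        self.long_term: list[str] = []                   # durable facts

    def observe(self, text: str) -> None:
        # When the window is full, summarize the item about to be evicted.
        if len(self.working) == self.working.maxlen:
            self.summary.append(self.working[0][:40])    # toy summarizer
        self.working.append(text)

    def remember(self, fact: str) -> None:
        self.long_term.append(fact)

    def recall(self, query: str, k: int = 1) -> list[str]:
        # Retrieve only the k most relevant facts for the current step.
        q = _vec(query)
        ranked = sorted(
            self.long_term, key=lambda f: _cosine(q, _vec(f)), reverse=True
        )
        return ranked[:k]
```

Per step, the prompt would be assembled from `working`, the short `summary`, and `recall(current_goal)`, which keeps the context bounded while durable facts stay reachable.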
Reach for multi-agent last, not first
One well-tooled agent beats a swarm for most tasks. Go multi-agent only when subtasks are genuinely distinct and won't fit one context (planner + parallel researchers + synthesizer). Extra agents add coordination cost, latency, and new failure modes — earn them.
What breaks at scale
Many concurrent agents on untrusted inputs make tool sandboxing and per-task cost ceilings mandatory. Full step tracing is just as essential, because a non-deterministic loop cannot be debugged without a record of every step. Tool-call accuracy drops as the registry grows, so route to a relevant subset per task instead of exposing every tool at once.
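Routing to a relevant subset can be as simple as scoring tool descriptions against the task before the model sees any tools. The sketch below uses plain word overlap, which is an assumption made to keep the example self-contained; embedding similarity or a small classifier are common alternatives. select_tools and the CATALOG entries are hypothetical.

```python
def select_tools(task: str, catalog: dict[str, str], k: int = 2) -> list[str]:
    """Return up to k tool names whose descriptions best overlap the task."""
    task_words = set(task.lower().split())
    scored = []
    for name, description in catalog.items():
        overlap = len(task_words & set(description.lower().split()))
        if overlap:  # tools with no overlap are never exposed
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


# Hypothetical tool catalog: name -> one-line description.
CATALOG = {
    "search_orders": "look up customer orders by id or email",
    "refund_order": "issue a refund for a customer order",
    "send_email": "send an email message to a customer",
    "resize_image": "resize or crop an image file",
}

subset = select_tools("refund the order for this customer", CATALOG)
```

Only the tools in `subset` would be placed in the model's context for this task; the rest of the registry stays hidden, which keeps the choice small even as the catalog grows.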
In production
Cursor, Devin, and the tool-use modes of Claude and ChatGPT all run a reason → act → observe loop over a typed tool registry with step/cost limits and human approval for risky actions. The differentiating work is sandboxing and authorization outside the model, not the planning prompt.
Common mistakes
- Broad write-tools with model-side auth → injection blast radius
- Unbounded loops → runaway cost
- No memory → repeats work, loses the goal
- Multi-agent for tasks one agent could do