Single Agent vs. Multi-Agent: When the Complexity Is Worth It
Series 2 — AI Architecture Patterns · Post 1 of 6
Every AI project reaches the same fork in the road, usually around the third week when things start getting complicated: do we keep everything in one agent, or do we split it up?
The wrong answer is almost always the same: teams reach for multi-agent architecture because it feels more powerful, more scalable, more like the thing they read about on Hacker News. Then they spend two sprints debugging message passing between agents that could have been a single well-structured prompt.
This post gives you the framework we use at EVERSCAPE LABS to make that call — and the code to back it up either way.
The Problem With "More Agents = More Power"
Multi-agent systems are genuinely powerful. They can parallelize work, isolate concerns, and handle complexity that would overwhelm a single context window. But they come with real costs:
- Latency compounds. Every agent hop adds round-trip time. A 3-agent pipeline where each step takes 2s isn't 2s total — it's 6s minimum, plus orchestration overhead.
- Failure surfaces multiply. A single agent can fail in one way. Three agents can fail in seven: each individually, any handoff between them, and the orchestrator itself.
- Debugging becomes archaeology. When something goes wrong in a multi-agent system, reconstructing what each agent saw and decided requires deliberate logging infrastructure. It doesn't come for free.
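To make that last point concrete, here is a minimal sketch of the kind of deliberate logging it implies — a shared trace id plus a per-hop event log. The `TraceEvent` shape and helper names are illustrative, not from any particular library:

```typescript
// trace.ts — per-hop logging so multi-agent failures can be reconstructed
interface TraceEvent {
  traceId: string; // one id shared by every agent handling the same request
  agent: string;
  phase: "input" | "output" | "error";
  payload: unknown;
  seq: number; // monotonic counter, so ordering survives same-millisecond events
}

const traceLog: TraceEvent[] = [];
let nextSeq = 0;

function logHop(
  traceId: string,
  agent: string,
  phase: TraceEvent["phase"],
  payload: unknown
): void {
  traceLog.push({ traceId, agent, phase, payload, seq: nextSeq++ });
}

// Replay exactly what one request saw, across every agent, in order
function reconstruct(traceId: string): TraceEvent[] {
  return traceLog
    .filter((e) => e.traceId === traceId)
    .sort((a, b) => a.seq - b.seq);
}
```

Every agent boundary logs what crossed it; `reconstruct` then replays a single request end to end when something breaks. In production you'd ship these events to a real store, but the shape of the data is the important part.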
None of this means you should avoid multi-agent systems. It means you should earn them.
The Decision Framework
We ask three questions before splitting into multiple agents:
1. Does the task have genuinely independent subtasks?
"Independent" means: subtask A does not need the output of subtask B to proceed, and they don't share mutable state. If your subtasks are sequential and each depends on the last, you don't have a parallelization opportunity — you have a pipeline, and a single agent with a structured prompt handles that more reliably.
Multi-agent fits: Analyzing 50 customer support tickets simultaneously, each independent.
Single agent fits: Researching a topic, then summarizing it, then drafting a report.
2. Does any subtask exceed a single context window?
This is the honest technical forcing function. If processing all your data at once would blow past 200K tokens, you need to chunk it — and agents are a natural boundary for chunks. But if everything fits comfortably, adding agents is overhead without benefit.
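A rough sketch of that check, assuming ~4 characters per token (a crude heuristic — use a real tokenizer in production) and a hypothetical 150K-token working budget:

```typescript
// chunk.ts — group documents into batches that fit a context budget
const CONTEXT_BUDGET = 150_000; // tokens; headroom below a 200K window

// Crude estimate: ~4 characters per token for English text
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function chunkByBudget(docs: string[], budget: number = CONTEXT_BUDGET): string[][] {
  const chunks: string[][] = [];
  let current: string[] = [];
  let used = 0;
  for (const doc of docs) {
    const cost = estimateTokens(doc);
    // Close out the current chunk before it would overflow the budget
    if (used + cost > budget && current.length > 0) {
      chunks.push(current);
      current = [];
      used = 0;
    }
    current.push(doc);
    used += cost;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

If everything lands in one chunk, this question answers itself: a single agent will do. Each additional chunk is a natural agent boundary.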
3. Do different subtasks require fundamentally different capabilities or tools?
An agent that needs to browse the web, write code, and query a database is doing three different things with three different risk profiles. Separating these into specialized agents with scoped tool access is good systems design — it's the principle of least privilege applied to AI.
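In code, that separation can be as blunt as a per-role allowlist — a sketch with hypothetical role and tool names:

```typescript
// scoped-tools.ts — principle of least privilege for agent tool access
type ToolName = "browse_web" | "run_code" | "query_database";

// Each role gets only the tools its job requires — nothing more
const TOOL_SCOPES: Record<string, ToolName[]> = {
  researcher: ["browse_web"],
  engineer: ["run_code"],
  analyst: ["query_database"],
};

function toolsForAgent(role: string): ToolName[] {
  const scope = TOOL_SCOPES[role];
  if (!scope) throw new Error(`Unknown agent role: ${role}`);
  return scope;
}

// Enforce the scope at dispatch time, not just at prompt time
function assertAllowed(role: string, tool: ToolName): void {
  if (!toolsForAgent(role).includes(tool)) {
    throw new Error(`Agent "${role}" is not permitted to call ${tool}`);
  }
}
```

The key is that the check lives in the dispatcher, not in the prompt: even if a specialized agent hallucinates a call to a tool outside its scope, the call never executes.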
If the answer to all three is "no", start with a single agent. You can always extract later.
Single Agent: The Right Starting Point
Here's a production-grade single-agent pattern in TypeScript. This is not a toy — it includes a full tool-calling loop and structured output validation.
```typescript
// agent.ts
import Anthropic from "@anthropic-ai/sdk";
import { z } from "zod";

const client = new Anthropic();

// Define the tools this agent can use
const tools: Anthropic.Tool[] = [
  {
    name: "search_documents",
    description: "Search internal knowledge base for relevant documents",
    input_schema: {
      type: "object",
      properties: {
        query: { type: "string", description: "Search query" },
        limit: { type: "number", description: "Max results to return", default: 5 },
      },
      required: ["query"],
    },
  },
  {
    name: "create_summary",
    description: "Persist a structured summary to the database",
    input_schema: {
      type: "object",
      properties: {
        title: { type: "string" },
        key_points: { type: "array", items: { type: "string" } },
        confidence: { type: "number", minimum: 0, maximum: 1 },
      },
      required: ["title", "key_points", "confidence"],
    },
  },
];

// Structured output schema — enforced after the agent finishes
const AgentResultSchema = z.object({
  summary_id: z.string(),
  topics_covered: z.array(z.string()),
  gaps_identified: z.array(z.string()),
});
type AgentResult = z.infer<typeof AgentResultSchema>;

async function runAgent(userQuery: string): Promise<AgentResult> {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: userQuery },
  ];

  // Agentic loop — continues until the model stops calling tools
  while (true) {
    const response = await client.messages.create({
      model: "claude-opus-4-6",
      max_tokens: 4096,
      tools,
      messages,
      system: `You are a research assistant. Use the available tools to answer
the user's query thoroughly. Always create a summary when you
have enough information. Be explicit about gaps in your knowledge.
When you have finished, reply with the final result in a fenced json
code block containing summary_id, topics_covered, and gaps_identified.`,
    });

    // Append assistant turn to message history
    messages.push({ role: "assistant", content: response.content });

    // Stop once the model is no longer requesting tools — breaking only on
    // "end_turn" would loop forever on stop reasons like "max_tokens"
    if (response.stop_reason !== "tool_use") {
      break;
    }

    // Process tool calls
    const toolResults: Anthropic.ToolResultBlockParam[] = [];
    for (const block of response.content) {
      if (block.type !== "tool_use") continue;
      const result = await executeTool(block.name, block.input as Record<string, unknown>);
      toolResults.push({
        type: "tool_result",
        tool_use_id: block.id,
        content: JSON.stringify(result),
      });
    }

    // Feed results back into the loop
    messages.push({ role: "user", content: toolResults });
  }

  // Extract and validate the final structured result
  const lastMessage = messages[messages.length - 1];
  const rawOutput = extractJSON(lastMessage);
  return AgentResultSchema.parse(rawOutput);
}

// Stub — in production this dispatches to your actual tool implementations
async function executeTool(name: string, input: Record<string, unknown>): Promise<unknown> {
  console.log(`[tool] ${name}`, input);
  // Replace with real implementations
  return { status: "ok", data: [] };
}

function extractJSON(message: Anthropic.MessageParam): unknown {
  const content = Array.isArray(message.content)
    ? message.content.find((b) => b.type === "text")?.text ?? ""
    : message.content;
  const match = content.match(/```json\n([\s\S]+?)\n```/);
  if (!match) throw new Error("No JSON block found in agent response");
  return JSON.parse(match[1]);
}
```

Notice what's happening here: the agent manages its own tool loop, we enforce a schema on the output (more on this in Post 2), and the calling code doesn't need to know how the agent reached its answer — just that the answer is valid.
This pattern handles the vast majority of real-world tasks.
Multi-Agent: When You've Earned It
Now let's look at a scenario where multi-agent architecture genuinely earns its complexity: processing a large batch of documents in parallel, where each document is independent and the total token volume exceeds a single context.
Here's the orchestrator pattern in TypeScript, with a Python worker for heavy document processing:
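The core idea — bounded concurrency over independent jobs — is worth seeing in isolation first. A self-contained sketch, with a plain async function standing in for the agent call:

```typescript
// Run `fn` over `items` with at most `limit` calls in flight at once
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each lane repeatedly claims the next unprocessed index
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, lane);
  await Promise.all(lanes);
  return results;
}
```

This version preserves input order in the results; the `Promise.race` loop in the orchestrator below trades that away in exchange for handling results as they arrive.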
```typescript
// orchestrator.ts
interface DocumentJob {
  id: string;
  content: string;
  metadata: Record<string, unknown>;
}

interface DocumentResult {
  id: string;
  entities: string[];
  sentiment: "positive" | "negative" | "neutral";
  summary: string;
}

const MAX_CONCURRENT_AGENTS = 5; // Respect API rate limits

async function processDocumentBatch(
  documents: DocumentJob[]
): Promise<DocumentResult[]> {
  const results: DocumentResult[] = [];
  const queue = [...documents];
  const inFlight: Promise<void>[] = [];

  while (queue.length > 0 || inFlight.length > 0) {
    // Fill up to the concurrency limit
    while (queue.length > 0 && inFlight.length < MAX_CONCURRENT_AGENTS) {
      const job = queue.shift()!;
      const promise = processDocument(job)
        .then((result) => {
          // Collect results here, not after the race — otherwise any promise
          // that settles between loop iterations would be silently dropped
          results.push(result);
        })
        .finally(() => {
          // Remove from in-flight when done
          inFlight.splice(inFlight.indexOf(promise), 1);
        });
      inFlight.push(promise);
    }

    // Wait for at least one to finish before looping
    if (inFlight.length > 0) {
      await Promise.race(inFlight);
    }
  }

  return results;
}

async function processDocument(job: DocumentJob): Promise<DocumentResult> {
  // Each document gets its own isolated agent call.
  // In production this might be a separate process, Lambda, or worker thread.
  const response = await fetch("http://localhost:8000/process", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(job),
  });
  if (!response.ok) {
    throw new Error(`Agent failed for document ${job.id}: ${response.statusText}`);
  }
  return response.json() as Promise<DocumentResult>;
}
```

The Python worker that `processDocument` calls into — a FastAPI service where each request runs its own agent:
```python
# worker.py
from fastapi import FastAPI, HTTPException
from anthropic import Anthropic
from pydantic import BaseModel, ValidationError
from typing import Literal
import json

app = FastAPI()
client = Anthropic()


class DocumentJob(BaseModel):
    id: str
    content: str
    metadata: dict


class DocumentResult(BaseModel):
    id: str
    entities: list[str]
    sentiment: Literal["positive", "negative", "neutral"]
    summary: str


@app.post("/process", response_model=DocumentResult)
def process_document(job: DocumentJob) -> DocumentResult:
    """Each HTTP request is an isolated agent. No shared state.

    Declared `def` rather than `async def` so FastAPI runs the blocking
    Anthropic call in its threadpool instead of stalling the event loop.
    """
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        system="""You are a document analysis agent. Analyze the given document and
return a JSON object with exactly these fields:
- entities: list of named entities (people, orgs, places)
- sentiment: one of "positive", "negative", "neutral"
- summary: one sentence summary
Respond ONLY with valid JSON. No prose.""",
        messages=[
            {
                "role": "user",
                "content": f"Document ID: {job.id}\n\n{job.content}",
            }
        ],
    )
    raw = response.content[0].text.strip()
    try:
        data = json.loads(raw)
        return DocumentResult(id=job.id, **data)
    except (json.JSONDecodeError, ValidationError, TypeError) as e:
        # ValidationError covers missing/malformed fields; TypeError covers
        # a duplicate "id" key in the model's output
        raise HTTPException(
            status_code=500,
            detail=f"Agent returned invalid output for document {job.id}: {e}",
        )
```

Spin up the worker service directly, or run it in Docker:
```bash
# Start the document processing worker
uvicorn worker:app --host 0.0.0.0 --port 8000 --workers 4

# Or with Docker for isolation (recommended in production)
docker run -d \
  --name doc-processor \
  -p 8000:8000 \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  --restart unless-stopped \
  everscape/doc-processor:latest
```

The key architectural decision here: each worker instance is stateless. It receives a document, runs one agent call, returns a result. No shared memory, no message bus complexity. The orchestrator owns state; the workers are pure functions.
The Comparison in Numbers
We ran both approaches on a batch of 200 internal documents (~800 tokens each) in a client engagement:
| Metric | Single Agent (sequential) | Multi-Agent (5 workers) |
|---|---|---|
| Total time | 8m 42s | 1m 54s |
| Cost | $0.34 | $0.37 |
| Failure rate | 0% | 2.5% (retried successfully) |
| Debug time on failure | ~2 min | ~8 min |
The multi-agent system was 4.5x faster at the cost of a slightly higher failure surface. For a batch job that runs overnight, that tradeoff is worth it. For an interactive user-facing query, it probably isn't — the user is waiting on a single request, and the latency of one agent call is the same under either topology.
Cost difference was negligible. This is almost always true: the model inference cost is the same regardless of whether calls happen sequentially or in parallel. You're paying for tokens, not topology.
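The "retried successfully" in the failure-rate row above doesn't require heavy machinery — a backoff wrapper along these lines is usually enough (the attempt counts and delays here are illustrative, not what we shipped):

```typescript
// retry.ts — exponential backoff for transient agent failures
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Exponential backoff: 500ms, 1000ms, 2000ms, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}
```

Wrapping each per-document agent call in `withRetry` turns most transient API failures into a short delay instead of a failed batch.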
What We Got Wrong the First Time
The first version of our multi-agent batch processor used a message queue (Redis Streams) to coordinate between agents. It felt like the right architecture — decoupled, scalable, resilient. It was also completely unnecessary for our use case and added two weeks of infrastructure work.
The HTTP worker pattern above achieves the same parallelism with a fraction of the complexity. We only reach for a message queue now when we need guaranteed delivery, replay capability, or agents that need to communicate with each other (not just report back to an orchestrator).
The lesson: match your infrastructure to your actual failure modes, not imagined ones.
The Rule of Thumb
Start with one agent. Run it in production. When it breaks in a specific, diagnosable way — context overflow, unacceptable latency, a subtask that needs dedicated tools — extract that piece into a second agent. Repeat.
This is boring advice. It's also consistently right.
The best multi-agent systems we've built weren't designed that way from day one. They grew into it because the problem demanded it, and each new agent solved a concrete pain point with a measurable improvement.
Next in this series: Post 2 — Structured Outputs: Why JSON Schema Discipline Changes Everything — where we go deep on enforcing output schemas with Zod and Pydantic, and why this single practice eliminates an entire category of production bugs.
EVERSCAPE LABS builds reliable AI systems for companies that can't afford unreliable ones. If you're evaluating an AI workflow project, get in touch.