
Structured Outputs: Why JSON Schema Discipline Changes Everything
Series 2 — AI Architecture Patterns · Post 2 of 6
Here's a bug that shows up in almost every early-stage AI integration: the model returns something almost right. The field is there, but it's a string instead of a number. The array exists, but it has one extra nested object nobody expected. The enum value is "Positive" instead of "positive" and your switch statement silently falls through to the default case.
None of these are model failures in any deep sense — the model understood the task. They're interface failures, and they're entirely preventable. The fix isn't prompt engineering. It's schema enforcement.
This post covers how we handle structured outputs at EVERSCAPE LABS across our TypeScript and Python stack, and why getting this right early is one of the highest-leverage things you can do for a production AI system.
Why "Just Ask It to Return JSON" Isn't Enough
The naive approach — adding "respond only with valid JSON" to your system prompt — works maybe 95% of the time. In development, that feels fine. In production at scale, 5% failure means one in twenty requests breaks. If those requests are user-facing, that's not a bug rate, that's a product problem.
The failure modes are subtle and varied:
- The model returns valid JSON wrapped in a markdown code block (```json ... ```), because it was trained to be readable
- A field that should be null comes back as the string "null"
- Numeric confidence scores come back as "0.87" instead of 0.87
- An optional field is omitted entirely when the model isn't confident, crashing downstream code that doesn't handle undefined
- The model "explains itself" with a sentence before the JSON block, breaking the parser
Each of these is individually fixable with a one-off patch. But you'll be patching forever. The right answer is to enforce the schema at the boundary — before the output ever reaches your business logic.
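For a concrete sense of how those one-off patches accumulate, here's a stdlib-only Python sketch of the kind of parser that grows around unvalidated model output (the field names are illustrative):

```python
import json

def parse_model_json(raw: str) -> dict:
    """Naive boundary parser: each branch is one of the one-off patches
    described above. They work, but the list never stops growing."""
    text = raw.strip()
    # Patch 1: strip the markdown code fence the model sometimes adds
    if text.startswith("```"):
        text = text.split("\n", 1)[1]      # drop the ```json line
        text = text.rsplit("```", 1)[0]
    # Patch 2: skip explanatory prose before the JSON object
    start = text.find("{")
    if start > 0:
        text = text[start:]
    data = json.loads(text)
    # Patch 3: coerce stringified confidence scores back to numbers
    if isinstance(data.get("confidence"), str):
        data["confidence"] = float(data["confidence"])
    return data
```

Every patch here handles a real failure mode from the list above, and none of them prevents the next one from appearing.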
Two Levels of Defense
We use a two-layer approach:
Layer 1 — Model-native schema enforcement. The model API itself constrains the output format. This prevents the majority of structural issues at generation time.
Layer 2 — Runtime validation with Zod or Pydantic. Even with model-native enforcement, we validate every response before it touches application code. This catches semantic issues (a value in the right format but the wrong range), handles API changes gracefully, and gives us typed, safe objects to work with downstream.
Think of it like defense in depth: Layer 1 raises the floor, Layer 2 catches what slips through.
Layer 1: Model-Native Structured Output
Both the Anthropic API (via tool use) and OpenAI's API (via response_format) support schema-constrained generation. The model is steered to produce output that matches your schema — not just asked to.
With Anthropic, the cleanest way to enforce structure is to wrap your desired output in a tool definition. The model is instructed to call that tool with its result, which forces it to conform to your input schema:
// structured-call.ts
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
// Define the expected output as a tool the model must "call"
const outputSchema: Anthropic.Tool = {
  name: "submit_analysis",
  description: "Submit the final structured analysis result",
  input_schema: {
    type: "object",
    properties: {
      sentiment: {
        type: "string",
        enum: ["positive", "negative", "neutral"],
        description: "Overall sentiment of the text",
      },
      confidence: {
        type: "number",
        description: "Confidence score between 0 and 1",
      },
      key_topics: {
        type: "array",
        items: { type: "string" },
        description: "Main topics identified in the text",
      },
      requires_escalation: {
        type: "boolean",
        description: "Whether this item needs human review",
      },
    },
    required: ["sentiment", "confidence", "key_topics", "requires_escalation"],
  },
};

async function analyzeText(text: string) {
  const response = await client.messages.create({
    model: "claude-opus-4-6",
    max_tokens: 1024,
    tools: [outputSchema],
    // Force the model to use this specific tool
    tool_choice: { type: "tool", name: "submit_analysis" },
    system: "You are a text analysis assistant. Analyze the provided text and submit your findings using the submit_analysis tool.",
    messages: [{ role: "user", content: text }],
  });

  // Extract the tool call input — this is guaranteed to match the schema
  const toolUse = response.content.find((block) => block.type === "tool_use");
  if (!toolUse || toolUse.type !== "tool_use") {
    throw new Error("Model did not call the expected tool");
  }
  return toolUse.input; // Structurally valid, but not yet typed
}

The tool_choice: { type: "tool", name: "submit_analysis" } is the key line. It tells the model it must call this specific tool — no prose, no markdown, no explanatory preamble. Just the structured call.
Layer 2: Runtime Validation with Zod
The tool call input is structurally valid JSON, but it's typed as unknown in TypeScript. We don't trust it yet — we parse it through Zod to get a fully typed, validated object:
// schemas.ts
import { z } from "zod";
export const AnalysisResultSchema = z.object({
  sentiment: z.enum(["positive", "negative", "neutral"]),
  confidence: z.number().min(0).max(1),
  key_topics: z.array(z.string()).min(1),
  requires_escalation: z.boolean(),
});

export type AnalysisResult = z.infer<typeof AnalysisResultSchema>;

// structured-call.ts (continued)
import { AnalysisResultSchema, type AnalysisResult } from "./schemas";

async function analyzeTextSafe(text: string): Promise<AnalysisResult> {
  const rawOutput = await analyzeText(text);
  // Parse and validate — throws a descriptive ZodError if anything is wrong
  const result = AnalysisResultSchema.parse(rawOutput);
  return result; // Fully typed, fully validated
}

Now result is a proper AnalysisResult type. TypeScript knows it. Your IDE autocompletes it. Any downstream code that receives it can be written without defensive checks — the contract is enforced at the boundary.
If something is wrong, ZodError gives you a precise, field-level error message:
ZodError: [
  {
    "code": "invalid_enum_value",
    "options": ["positive", "negative", "neutral"],
    "path": ["sentiment"],
    "message": "Invalid enum value. Expected 'positive' | 'negative' | 'neutral', received 'Positive'"
  }
]
That's a debuggable error. Compare it to a silent type coercion or a null pointer exception two layers deep in your business logic.
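One practical consequence of a precise, machine-readable error is that you can feed it back to the model for a corrective retry. A minimal, library-agnostic sketch — call_model and validate here are hypothetical stand-ins for your API call and your Zod/Pydantic parse step, with validate raising ValueError on bad output:

```python
# Validate-and-retry loop at the boundary. `call_model` and `validate`
# are hypothetical stand-ins for your API call and your schema parse step.
def call_with_retry(call_model, validate, prompt: str, max_attempts: int = 2):
    feedback = ""
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(prompt + feedback)
        try:
            return validate(raw)
        except ValueError as e:
            last_error = e
            # The field-level error message makes the retry prompt precise
            feedback = f"\n\nYour previous output was invalid: {e}. Return corrected output."
    raise RuntimeError(f"Output still invalid after {max_attempts} attempts") from last_error
```

With Layer 1 in place, a single corrective round trip is usually enough; repeated failures point at the schema or the prompt, not the parser.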
The Python Side: Pydantic Does the Same Job
In Python, Pydantic is the equivalent — and when paired with the Anthropic SDK, it integrates cleanly:
# schemas.py
from pydantic import BaseModel, Field, field_validator
from enum import Enum

class Sentiment(str, Enum):
    positive = "positive"
    negative = "negative"
    neutral = "neutral"

class AnalysisResult(BaseModel):
    sentiment: Sentiment
    confidence: float = Field(ge=0.0, le=1.0)
    key_topics: list[str] = Field(min_length=1)
    requires_escalation: bool

    @field_validator("key_topics")
    @classmethod
    def topics_not_empty_strings(cls, v: list[str]) -> list[str]:
        if any(topic.strip() == "" for topic in v):
            raise ValueError("key_topics must not contain empty strings")
        return v

# analyzer.py
import json
from anthropic import Anthropic
from pydantic import ValidationError
from schemas import AnalysisResult
client = Anthropic()
OUTPUT_TOOL = {
    "name": "submit_analysis",
    "description": "Submit the final structured analysis result",
    "input_schema": {
        "type": "object",
        "properties": {
            "sentiment": {
                "type": "string",
                "enum": ["positive", "negative", "neutral"],
            },
            "confidence": {"type": "number"},
            "key_topics": {
                "type": "array",
                "items": {"type": "string"},
            },
            "requires_escalation": {"type": "boolean"},
        },
        "required": ["sentiment", "confidence", "key_topics", "requires_escalation"],
    },
}
def analyze_text(text: str) -> AnalysisResult:
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        tools=[OUTPUT_TOOL],
        tool_choice={"type": "tool", "name": "submit_analysis"},
        system="You are a text analysis assistant. Analyze the provided text and submit your findings using the submit_analysis tool.",
        messages=[{"role": "user", "content": text}],
    )
    tool_use = next(
        (block for block in response.content if block.type == "tool_use"), None
    )
    if not tool_use:
        raise RuntimeError("Model did not call the expected tool")
    try:
        # Pydantic validates and coerces in one step
        return AnalysisResult(**tool_use.input)
    except ValidationError as e:
        raise RuntimeError(f"Model output failed validation: {e}") from e

Notice the @field_validator on key_topics — this is where Pydantic earns its keep beyond basic type checking. You can encode business rules directly in the schema: no empty strings, arrays must have at least one item, confidence must be between 0 and 1. These rules live with the schema definition, not scattered across your codebase as defensive checks.
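As a small self-contained illustration of that boundary behavior — using a trimmed-down model, not the full AnalysisResult above — Pydantic coerces a stringified "0.87" into a real float, while an out-of-range value raises a ValidationError immediately:

```python
from pydantic import BaseModel, Field, ValidationError

# Trimmed-down model for illustration only (not the full AnalysisResult)
class MiniResult(BaseModel):
    sentiment: str
    confidence: float = Field(ge=0.0, le=1.0)

# Coercion: the stringified score becomes a real float
ok = MiniResult(sentiment="neutral", confidence="0.87")

# An out-of-range value fails loudly at the boundary
# instead of propagating into business logic
try:
    MiniResult(sentiment="neutral", confidence=1.5)
    rejected = False
except ValidationError:
    rejected = True
```

The same input that would have silently poisoned a dashboard now either arrives typed or stops at the door.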
Keeping the Schema in One Place
One trap to avoid: duplicating your schema between the tool definition JSON and your Zod/Pydantic model. They drift. Someone updates one and forgets the other, and you get subtle inconsistencies that are annoying to track down.
In TypeScript, you can generate the JSON schema from Zod directly:
// schema-sync.ts
import Anthropic from "@anthropic-ai/sdk";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

const AnalysisResultSchema = z.object({
  sentiment: z.enum(["positive", "negative", "neutral"]),
  confidence: z.number().min(0).max(1),
  key_topics: z.array(z.string()).min(1),
  requires_escalation: z.boolean(),
});

// Generate the JSON schema for the tool definition — single source of truth.
// Omitting the `name` option keeps the schema inline rather than behind a
// $ref, and $refStrategy: "none" flattens any remaining refs for Anthropic
// compatibility.
const jsonSchema = zodToJsonSchema(AnalysisResultSchema, {
  $refStrategy: "none",
});

const outputTool: Anthropic.Tool = {
  name: "submit_analysis",
  description: "Submit the final structured analysis result",
  input_schema: jsonSchema as Anthropic.Tool["input_schema"],
};

Now your Zod schema is the single source of truth. The tool definition derives from it, not the other way around.
In Python, do the same with Pydantic's model_json_schema():
# Single source of truth
tool_schema = AnalysisResult.model_json_schema()

OUTPUT_TOOL = {
    "name": "submit_analysis",
    "description": "Submit the final structured analysis result",
    "input_schema": tool_schema,
}

What Breaks Without This
In a real project — a customer feedback processing pipeline — we inherited a codebase that used raw prompt-based JSON requests with no schema enforcement. The symptoms were classic:
- A nightly batch job failed silently 3% of the time because confidence came back as a string
- An escalation workflow had a bug that only appeared at certain input text lengths, because the model occasionally omitted requires_escalation entirely, and the downstream code treated undefined as falsy
- A frontend dashboard showed "Neutral" (capital N) for some records and "neutral" for others, because the rendering code had been patched to handle both after a production incident
All three issues vanished after we introduced schema enforcement at the boundary. Not because the model started returning better output, but because bad output now threw an explicit, catchable error at the point of entry instead of propagating silently through the system.
The Pattern, Summarized
- Define your schema once in Zod or Pydantic — this is the source of truth
- Generate the tool definition from the schema programmatically
- Use tool_choice to force the model to call that tool — no prose, no fallbacks
- Parse the tool input through your schema library — throws immediately on invalid output
- Pass typed objects to the rest of your application — no defensive checks needed downstream
This adds maybe 30 lines to your integration. It eliminates an entire category of production incidents.
Next in this series: Post 3 — The Context Window Is Your Most Precious Resource — where we cover memory management strategies: sliding windows, summarization loops, and external retrieval, with code for each.
EVERSCAPE LABS builds reliable AI systems for companies that can't afford unreliable ones. If you're evaluating an AI workflow project, get in touch.