
Structured Outputs: Why JSON Schema Discipline Changes Everything
Series 2 — AI Architecture Patterns · Post 2 of 6
Here's a bug that shows up in almost every early-stage AI integration: the model returns something almost right. The field is there, but it's a string instead of a number. The array exists, but it has one extra nested object nobody expected. The enum value is "Positive" instead of "positive" and your switch statement silently falls through to the default case.
None of these are model failures in any deep sense — the model understood the task. They're interface failures, and they're entirely preventable. The fix isn't prompt engineering. It's schema enforcement.
This post covers how we handle structured outputs at EVERSCAPE LABS across our TypeScript and Python stack, and why getting this right early is one of the highest-leverage things you can do for a production AI system.
Why "Just Ask It to Return JSON" Isn't Enough
The naive approach — adding "respond only with valid JSON" to your system prompt — works maybe 95% of the time. In development, that feels fine. In production at scale, 5% failure means one in twenty requests breaks. If those requests are user-facing, that's not a bug rate, that's a product problem.
The failure modes are subtle and varied:
- The model returns valid JSON wrapped in a markdown code block (```json ... ```), because it was trained to be readable
- A field that should be null comes back as the string "null"
- Numeric confidence scores come back as "0.87" instead of 0.87
- An optional field is omitted entirely when the model isn't confident, crashing downstream code that doesn't handle undefined
- The model "explains itself" with a sentence before the JSON block, breaking the parser
Each of these is individually fixable with a one-off patch. But you'll be patching forever. The right answer is to enforce the schema at the boundary — before the output ever reaches your business logic.
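For a concrete sense of how those one-off patches accumulate, here's a stdlib-only Python sketch of the kind of parser that grows around unvalidated model output (the field names are illustrative):

```python
import json

def parse_model_json(raw: str) -> dict:
    """Naive boundary parser: each branch is one of the one-off patches
    described above. They work, but the list never stops growing."""
    text = raw.strip()
    # Patch 1: strip the markdown code fence the model sometimes adds
    if text.startswith("```"):
        text = text.split("\n", 1)[1]      # drop the ```json line
        text = text.rsplit("```", 1)[0]
    # Patch 2: skip explanatory prose before the JSON object
    start = text.find("{")
    if start > 0:
        text = text[start:]
    data = json.loads(text)
    # Patch 3: coerce stringified confidence scores back to numbers
    if isinstance(data.get("confidence"), str):
        data["confidence"] = float(data["confidence"])
    return data
```

Every patch here handles a real failure mode from the list above, and none of them prevents the next one from appearing.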
Two Levels of Defense
We use a two-layer approach:
Layer 1 — Model-native schema enforcement. The model API itself constrains the output format. This prevents the majority of structural issues at generation time.
Layer 2 — Runtime validation with Zod or Pydantic. Even with model-native enforcement, we validate every response before it touches application code. This catches semantic issues (a value in the right format but the wrong range), handles API changes gracefully, and gives us typed, safe objects to work with downstream.
Think of it like defense in depth: Layer 1 raises the floor, Layer 2 catches what slips through.
Layer 1: Model-Native Structured Output
Both the Anthropic API (via tool use) and OpenAI's API (via response_format) support schema-constrained generation. The model is steered to produce output that matches your schema — not just asked to.
With Anthropic, the cleanest way to enforce structure is to wrap your desired output in a tool definition. The model is instructed to call that tool with its result, which forces it to conform to your input schema:
// structured-call.ts
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
// Define the expected output as a tool the model must "call"
const outputSchema: Anthropic.Tool = {
  name: "submit_analysis",
  description: "Submit the final structured analysis result",
  input_schema: {
    type: "object",
    properties: {
      sentiment: {
        type: "string",
        enum: ["positive", "negative", "neutral"],
        description: "Overall sentiment of the text",
      },
      confidence: {
        type: "number",
        description: "Confidence score between 0 and 1",
      },
      key_topics: {
        type: "array",
        items: { type: "string" },
        description: "Main topics identified in the text",
      },
      requires_escalation: {
        type: "boolean",
        description: "Whether this item needs human review",
      },
    },
    required: ["sentiment", "confidence", "key_topics", "requires_escalation"],
  },
};

async function analyzeText(text: string) {
  const response = await client.messages.create({
    model: "claude-opus-4-6",
    max_tokens: 1024,
    tools: [outputSchema],
    // Force the model to use this specific tool
    tool_choice: { type: "tool", name: "submit_analysis" },
    system: "You are a text analysis assistant. Analyze the provided text and submit your findings using the submit_analysis tool.",
    messages: [{ role: "user", content: text }],
  });

  // Extract the tool call input — this is guaranteed to match the schema
  const toolUse = response.content.find((block) => block.type === "tool_use");
  if (!toolUse || toolUse.type !== "tool_use") {
    throw new Error("Model did not call the expected tool");
  }
  return toolUse.input; // Structurally valid, but not yet typed
}

The tool_choice: { type: "tool", name: "submit_analysis" } is the key line. It tells the model it must call this specific tool — no prose, no markdown, no explanatory preamble. Just the structured call.
Layer 2: Runtime Validation with Zod
The tool call input is structurally valid JSON, but it's typed as unknown in TypeScript. We don't trust it yet — we parse it through Zod to get a fully typed, validated object:
// schemas.ts
import { z } from "zod";
export const AnalysisResultSchema = z.object({
  sentiment: z.enum(["positive", "negative", "neutral"]),
  confidence: z.number().min(0).max(1),
  key_topics: z.array(z.string()).min(1),
  requires_escalation: z.boolean(),
});

export type AnalysisResult = z.infer<typeof AnalysisResultSchema>;

// structured-call.ts (continued)
import { AnalysisResultSchema, type AnalysisResult } from "./schemas";

async function analyzeTextSafe(text: string): Promise<AnalysisResult> {
  const rawOutput = await analyzeText(text);
  // Parse and validate — throws a descriptive ZodError if anything is wrong
  const result = AnalysisResultSchema.parse(rawOutput);
  return result; // Fully typed, fully validated
}

Now result is a proper AnalysisResult type. TypeScript knows it. Your IDE autocompletes it. Any downstream code that receives it can be written without defensive checks — the contract is enforced at the boundary.
If something is wrong, ZodError gives you a precise, field-level error message:
ZodError: [
  {
    "code": "invalid_enum_value",
    "options": ["positive", "negative", "neutral"],
    "path": ["sentiment"],
    "message": "Invalid enum value. Expected 'positive' | 'negative' | 'neutral', received 'Positive'"
  }
]
That's a debuggable error. Compare it to a silent type coercion or a null pointer exception two layers deep in your business logic.
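One practical consequence of a precise, machine-readable error is that you can feed it back to the model for a corrective retry. A minimal, library-agnostic sketch — call_model and validate here are hypothetical stand-ins for your API call and your Zod/Pydantic parse step, with validate raising ValueError on bad output:

```python
# Validate-and-retry loop at the boundary. `call_model` and `validate`
# are hypothetical stand-ins for your API call and your schema parse step.
def call_with_retry(call_model, validate, prompt: str, max_attempts: int = 2):
    feedback = ""
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(prompt + feedback)
        try:
            return validate(raw)
        except ValueError as e:
            last_error = e
            # The field-level error message makes the retry prompt precise
            feedback = f"\n\nYour previous output was invalid: {e}. Return corrected output."
    raise RuntimeError(f"Output still invalid after {max_attempts} attempts") from last_error
```

With Layer 1 in place, a single corrective round trip is usually enough; repeated failures point at the schema or the prompt, not the parser.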
The Python Side: Pydantic Does the Same Job
In Python, Pydantic is the equivalent — and when paired with the Anthropic SDK, it integrates cleanly:
# schemas.py
from pydantic import BaseModel, Field, field_validator
from enum import Enum

class Sentiment(str, Enum):
    positive = "positive"
    negative = "negative"
    neutral = "neutral"

class AnalysisResult(BaseModel):
    sentiment: Sentiment
    confidence: float = Field(ge=0.0, le=1.0)
    key_topics: list[str] = Field(min_length=1)
    requires_escalation: bool

    @field_validator("key_topics")
    @classmethod
    def topics_not_empty_strings(cls, v: list[str]) -> list[str]:
        if any(topic.strip() == "" for topic in v):
            raise ValueError("key_topics must not contain empty strings")
        return v

# analyzer.py
import json
from anthropic import Anthropic
from pydantic import ValidationError
from schemas import AnalysisResult
client = Anthropic()
OUTPUT_TOOL = {
    "name": "submit_analysis",
    "description": "Submit the final structured analysis result",
    "input_schema": {
        "type": "object",
        "properties": {
            "sentiment": {
                "type": "string",
                "enum": ["positive", "negative", "neutral"],
            },
            "confidence": {"type": "number"},
            "key_topics": {
                "type": "array",
                "items": {"type": "string"},
            },
            "requires_escalation": {"type": "boolean"},
        },
        "required": ["sentiment", "confidence", "key_topics", "requires_escalation"],
    },
}
def analyze_text(text: str) -> AnalysisResult:
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        tools=[OUTPUT_TOOL],
        tool_choice={"type": "tool", "name": "submit_analysis"},
        system="You are a text analysis assistant. Analyze the provided text and submit your findings using the submit_analysis tool.",
        messages=[{"role": "user", "content": text}],
    )
    tool_use = next(
        (block for block in response.content if block.type == "tool_use"), None
    )
    if not tool_use:
        raise RuntimeError("Model did not call the expected tool")
    try:
        # Pydantic validates and coerces in one step
        return AnalysisResult(**tool_use.input)
    except ValidationError as e:
        raise RuntimeError(f"Model output failed validation: {e}") from e

Notice the @field_validator on key_topics — this is where Pydantic earns its keep beyond basic type checking. You can encode business rules directly in the schema: no empty strings, arrays must have at least one item, confidence must be between 0 and 1. These rules live with the schema definition, not scattered across your codebase as defensive checks.
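As a small self-contained illustration of that boundary behavior — using a trimmed-down model, not the full AnalysisResult above — Pydantic coerces a stringified "0.87" into a real float, while an out-of-range value raises a ValidationError immediately:

```python
from pydantic import BaseModel, Field, ValidationError

# Trimmed-down model for illustration only (not the full AnalysisResult)
class MiniResult(BaseModel):
    sentiment: str
    confidence: float = Field(ge=0.0, le=1.0)

# Coercion: the stringified score becomes a real float
ok = MiniResult(sentiment="neutral", confidence="0.87")

# An out-of-range value fails loudly at the boundary
# instead of propagating into business logic
try:
    MiniResult(sentiment="neutral", confidence=1.5)
    rejected = False
except ValidationError:
    rejected = True
```

The same input that would have silently poisoned a dashboard now either arrives typed or stops at the door.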
Keeping the Schema in One Place
One trap to avoid: duplicating your schema between the tool definition JSON and your Zod/Pydantic model. They drift. Someone updates one and forgets the other, and you get subtle inconsistencies that are annoying to track down.
In TypeScript, you can generate the JSON schema from Zod directly:
// schema-sync.ts
import Anthropic from "@anthropic-ai/sdk";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

const AnalysisResultSchema = z.object({
  sentiment: z.enum(["positive", "negative", "neutral"]),
  confidence: z.number().min(0).max(1),
  key_topics: z.array(z.string()).min(1),
  requires_escalation: z.boolean(),
});

// Generate the JSON schema for the tool definition — single source of truth.
// Omitting the `name` option keeps the schema inline rather than behind a
// $ref, and $refStrategy: "none" flattens any remaining refs for Anthropic
// compatibility.
const jsonSchema = zodToJsonSchema(AnalysisResultSchema, {
  $refStrategy: "none",
});

const outputTool: Anthropic.Tool = {
  name: "submit_analysis",
  description: "Submit the final structured analysis result",
  input_schema: jsonSchema as Anthropic.Tool["input_schema"],
};

Now your Zod schema is the single source of truth. The tool definition derives from it, not the other way around.
In Python, do the same with Pydantic's model_json_schema():
# Single source of truth
tool_schema = AnalysisResult.model_json_schema()

OUTPUT_TOOL = {
    "name": "submit_analysis",
    "description": "Submit the final structured analysis result",
    "input_schema": tool_schema,
}

What Breaks Without This
In a real project — a customer feedback processing pipeline — we inherited a codebase that used raw prompt-based JSON requests with no schema enforcement. The symptoms were classic:
- A nightly batch job failed silently 3% of the time because confidence came back as a string
- An escalation workflow had a bug that only appeared at certain input text lengths, because the model occasionally omitted requires_escalation entirely, and the downstream code treated undefined as falsy
- A frontend dashboard showed "Neutral" (capital N) for some records and "neutral" for others, because the rendering code had been patched to handle both after a production incident
All three issues vanished after we introduced schema enforcement at the boundary. Not because the model started returning better output, but because bad output now threw an explicit, catchable error at the point of entry instead of propagating silently through the system.
The Pattern, Summarized
- Define your schema once in Zod or Pydantic — this is the source of truth
- Generate the tool definition from the schema programmatically
- Use tool_choice to force the model to call that tool — no prose, no fallbacks
- Parse the tool input through your schema library — throws immediately on invalid output
- Pass typed objects to the rest of your application — no defensive checks needed downstream
This adds maybe 30 lines to your integration. It eliminates an entire category of production incidents.
Next in this series: Post 3 — The Context Window Is Your Most Precious Resource — where we cover memory management strategies: sliding windows, summarization loops, and external retrieval, with code for each.
EVERSCAPE LABS builds reliable AI systems for companies that can't afford unreliable ones. If you're evaluating an AI workflow project, get in touch.