Designing Robust Error Response Contracts
Most API contracts specify success bodies in painstaking detail and leave failures to chance, so every service invents its own error shape and clients accrete brittle parsing for each one. A robust error contract treats failures as first-class, versioned schema: one envelope, machine-readable codes, field-level detail, and an explicit HTTP status taxonomy. This guide extends the broader Schema Design & Validation Patterns and shows how to design that envelope around RFC 7807 problem+json, document it in OpenAPI, and gate it in CI so no endpoint ships an ad-hoc error shape.
The payoff is concrete: a single client error handler that works against every endpoint, support tooling that can correlate a traceId to logs, and SDK generators that emit typed error models instead of an untyped any. We standardize on RFC 7807 problem details as the wire format and layer in extension members for codes and validation errors.
When to Use This Approach
Adopt a formal error response contract when any of the following hold:
- You operate more than one service and clients (web, mobile, partner) must parse errors uniformly across all of them.
- Frontend teams are writing per-endpoint error handling because no two endpoints fail the same way.
- You expose a public or partner API where error stability is part of your backward-compatibility promise.
- You generate client SDKs and want typed error models instead of untyped catch blocks.
- Support and on-call need to correlate a user-visible error to server logs via a stable identifier.
- You are introducing field-level validation and need a predictable place to surface per-field messages.
If you have a single internal service with one consumer you control, a lightweight { "code", "message" } shape may be enough — but standardizing early costs little and removes a class of future migrations.
Prerequisites
This guide uses the following tool versions. Pin them in CI to keep examples reproducible.
# Schema validation (JSON Schema draft 2020-12)
npm install -D ajv@8.17 ajv-cli@5.0 ajv-formats@3.0
# OpenAPI linting and validation
npm install -D @stoplight/spectral-cli@6.11 @apidevtools/swagger-cli@4.0
# Runtime validation (server-side error generation)
npm install zod@3.23
# Mock server for contract verification
docker pull stoplight/prism:5
You should already have an OpenAPI 3.1 document (or be ready to start one) and a CI runner such as GitHub Actions. Familiarity with runtime validation using Zod helps, since the server generates errors from validation failures.
The Error Envelope at a Glance
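Before the steps, here is a summary of the structure we are building. The base RFC 7807 members (type, title, status, detail, instance) carry the human- and machine-oriented summary. The extension members (code, traceId, errors) carry the stable code, the log correlation id, and the field-level breakdown.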
Step 1: Define the Canonical problem+json Schema
Start with a strict, versioned JSON Schema that every failure payload must satisfy. Base it on RFC 7807 (Problem Details for HTTP APIs), whose successor RFC 9457 keeps the identical wire format. The five base members — type, title, status, detail, instance — give you a self-describing error without inventing structure. Treat the schema as a real artifact: version it (error-response-v1.schema.json), check it into the repo, and reference it everywhere.
The decision that matters most here is what is required. Make type, title, and status mandatory; keep detail and instance optional because not every error has a meaningful per-occurrence message. Use additionalProperties: false on the field objects but allow extension members at the envelope top level (RFC 7807 requires that extensions be permitted). We model that by listing the extensions explicitly rather than slamming the whole document shut.
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/schemas/error-response-v1.schema.json",
  "title": "Problem Details Error Response",
  "type": "object",
  "required": ["type", "title", "status"],
  "properties": {
    "type": { "type": "string", "format": "uri", "description": "Doc URI for this problem class" },
    "title": { "type": "string", "description": "Stable, human-readable summary" },
    "status": { "type": "integer", "minimum": 400, "maximum": 599 },
    "detail": { "type": "string", "description": "Explanation of this occurrence" },
    "instance": { "type": "string", "format": "uri-reference", "description": "Which request or resource failed" },
    "code": { "type": "string", "pattern": "^[A-Z][A-Z0-9_]+$", "description": "Machine-readable, stable" },
    "traceId": { "type": "string", "description": "Correlates to server logs" },
    "errors": {
      "type": "array",
      "description": "Field-level breakdown (Step 2)",
      "items": { "$ref": "#/$defs/fieldError" }
    }
  },
  "$defs": {
    "fieldError": {
      "type": "object",
      "required": ["field", "code"],
      "additionalProperties": false,
      "properties": {
        "field": { "type": "string", "description": "Dot/bracket path, e.g. address.zip or items[0].sku" },
        "code": { "type": "string", "pattern": "^[A-Z][A-Z0-9_]+$" },
        "message": { "type": "string", "description": "Optional human hint; clients prefer code" }
      }
    }
  }
}
Two rules to enforce from day one. First, never auto-coerce status: the string "404" and the integer 404 are different contracts, and loose parsers hide drift. Second, keep title stable per problem class — it is effectively documentation, while code is the identifier clients branch on.
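Clients can apply the same rules when parsing. Below is a minimal, hand-written TypeScript type guard. It is not generated from the schema, and isProblemDetails and its helpers are hypothetical names. It mirrors error-response-v1.schema.json except for the URI format checks. It rejects a string status instead of coercing it, and it lets unknown top-level extension members through.

```typescript
// Hypothetical client-side guard mirroring error-response-v1.schema.json.
// Checks types and patterns but not URI formats; unknown top-level members are allowed.
interface FieldError { field: string; code: string; message?: string; }
interface ProblemDetails {
  type: string; title: string; status: number;
  detail?: string; instance?: string; code?: string;
  traceId?: string; errors?: FieldError[];
  [extension: string]: unknown; // RFC 7807 extension members
}

const CODE_PATTERN = /^[A-Z][A-Z0-9_]+$/;
const FIELD_ERROR_KEYS = new Set(["field", "code", "message"]);

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isOptionalString(v: unknown): boolean {
  return v === undefined || typeof v === "string";
}

function isFieldError(v: unknown): v is FieldError {
  if (!isRecord(v)) return false;
  // fieldError is closed (additionalProperties: false), so reject unknown keys
  if (Object.keys(v).some((k) => !FIELD_ERROR_KEYS.has(k))) return false;
  return typeof v.field === "string"
    && typeof v.code === "string" && CODE_PATTERN.test(v.code)
    && isOptionalString(v.message);
}

function isProblemDetails(v: unknown): v is ProblemDetails {
  if (!isRecord(v)) return false;
  if (typeof v.type !== "string" || typeof v.title !== "string") return false;
  // No coercion: the string "422" is rejected because the schema requires an integer
  if (typeof v.status !== "number" || !Number.isInteger(v.status)) return false;
  if (v.status < 400 || v.status > 599) return false;
  if (!isOptionalString(v.detail) || !isOptionalString(v.instance)) return false;
  if (!isOptionalString(v.traceId)) return false;
  if (v.code !== undefined && (typeof v.code !== "string" || !CODE_PATTERN.test(v.code))) return false;
  return v.errors === undefined || (Array.isArray(v.errors) && v.errors.every(isFieldError));
}
```

Because the guard ignores unknown top-level members, additive extensions stay non-breaking for any client that uses it.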
Step 2: Add Machine-Readable Codes and Field-Level Errors
A human-readable detail is for logs and toasts; clients need a value they can switch on without string matching. That is the code extension member: a stable, uppercase identifier like USER_EMAIL_TAKEN or VALIDATION_FAILED. The cardinal rule is that a code’s meaning never changes once shipped — renaming RATE_LIMITED to TOO_MANY_REQUESTS is a breaking change even though the status code is unchanged.
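One way to make that rule hard to break is to keep every code in a single catalog that also fixes its status, title, and type. The sketch below is hypothetical. ERROR_CATALOG and buildProblem are illustrative names, and the type URIs other than validation-failed are placeholders. With the catalog in place, a mistyped code fails to compile, and an envelope's title and status cannot drift away from its code.

```typescript
// Hypothetical catalog: one entry per problem class. Renaming a key here is a
// breaking change for every client that branches on `code`.
const ERROR_CATALOG = {
  VALIDATION_FAILED: {
    status: 422,
    type: "https://example.com/problems/validation-failed",
    title: "Request validation failed",
  },
  USER_EMAIL_TAKEN: {
    status: 409, // duplicate key, per the Step 3 taxonomy
    type: "https://example.com/problems/user-email-taken", // placeholder URI
    title: "Email address already registered",
  },
  RATE_LIMITED: {
    status: 429,
    type: "https://example.com/problems/rate-limited", // placeholder URI
    title: "Too many requests",
  },
} as const;

type ErrorCode = keyof typeof ERROR_CATALOG;

interface CatalogProblem {
  type: string; title: string; status: number; code: ErrorCode;
  detail?: string; instance?: string; traceId?: string;
}

// Builds an envelope whose type, title, and status always agree with the code.
function buildProblem(
  code: ErrorCode,
  extras: { detail?: string; instance?: string; traceId?: string } = {},
): CatalogProblem {
  const entry = ERROR_CATALOG[code];
  return { type: entry.type, title: entry.title, status: entry.status, code, ...extras };
}
```

For example, buildProblem("USER_EMAIL_TAKEN", { instance: "/users" }) yields a 409 envelope, and buildProblem("USER_EMAIL_TAKN") is a compile error.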
For validation failures, a single top-level message is not enough; the client needs to know which field failed and why so it can highlight the right input. Carry that in the errors array. Generating this array from a validator keeps it honest — here we map a Zod failure into the envelope. This mirrors the patterns in runtime validation with Zod, reusing the validation you already run.
// errorEnvelope.ts: build a problem+json body from a ZodError (Zod 3.23)
import type { ZodError, ZodIssue } from "zod";

interface FieldError { field: string; code: string; message?: string; }
interface ProblemDetails {
  type: string; title: string; status: number;
  detail?: string; instance?: string; code?: string;
  traceId?: string; errors?: FieldError[];
}

const ZOD_TO_CODE: Record<string, string> = {
  invalid_type: "TYPE",     // wrong JSON type
  too_small: "MIN",         // below min length/value
  too_big: "MAX",           // above max length/value
  invalid_string: "FORMAT", // email/uuid/regex mismatch
};

// A missing required property arrives as invalid_type with received "undefined".
function fieldCode(issue: ZodIssue): string {
  if (issue.code === "invalid_type" && issue.received === "undefined") return "REQUIRED";
  return ZOD_TO_CODE[issue.code] ?? "INVALID";
}

// Render Zod's path array in the contract's dot/bracket form:
// ["address", "zip"] -> "address.zip", ["items", 0, "sku"] -> "items[0].sku"
function fieldPath(path: (string | number)[]): string {
  return path
    .map((seg, idx) => (typeof seg === "number" ? `[${seg}]` : idx === 0 ? seg : `.${seg}`))
    .join("");
}

export function problemFromZod(err: ZodError, instance: string, traceId: string): ProblemDetails {
  return {
    type: "https://example.com/problems/validation-failed",
    title: "Request validation failed",
    status: 422,                  // see Step 3 for 400 vs 422
    detail: "One or more fields are invalid.",
    instance,                     // e.g. "/users" or the request id URI
    code: "VALIDATION_FAILED",    // stable, what clients branch on
    traceId,                      // map to logs, do NOT leak internals
    errors: err.issues.map((i) => ({
      field: fieldPath(i.path),   // "address.zip", "items[0].sku"
      code: fieldCode(i),         // per-field machine code
      message: i.message,         // optional human hint
    })),
  };
}
The resulting payload is self-explanatory and stable:
{
"type": "https://example.com/problems/validation-failed",
"title": "Request validation failed",
"status": 422,
"detail": "One or more fields are invalid.",
"instance": "/users",
"code": "VALIDATION_FAILED",
"traceId": "01H2XK9P3Q",
"errors": [
{ "field": "email", "code": "FORMAT", "message": "Must be a valid email" },
{ "field": "age", "code": "MIN", "message": "Must be at least 18" }
]
}
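On the client, a payload like this supports one generic handler for every endpoint. The sketch below is a hypothetical TypeScript helper: FormErrors, FIELD_MESSAGES, and toFormErrors are illustrative names. It assumes the body has already been parsed and checked against the envelope. It branches on code and never on title or detail text. It prefers client-side copy keyed by the per-field code and falls back to the server's message hint.

```typescript
interface FieldError { field: string; code: string; message?: string; }
interface ProblemDetails {
  type: string; title: string; status: number;
  detail?: string; code?: string; traceId?: string; errors?: FieldError[];
}

// Hypothetical: per-field messages keyed by the envelope's field path, e.g. "address.zip".
type FormErrors = Record<string, string>;

// Hypothetical client-owned copy per field code; the server's message is only a hint.
const FIELD_MESSAGES = new Map<string, string>([
  ["REQUIRED", "This field is required."],
  ["FORMAT", "This value has the wrong format."],
  ["MIN", "This value is too small or too short."],
  ["MAX", "This value is too large or too long."],
]);

function toFormErrors(problem: ProblemDetails): FormErrors {
  const out: FormErrors = {};
  // Branch on the stable code, never on title or detail text
  if (problem.code !== "VALIDATION_FAILED" || !problem.errors) return out;
  for (const e of problem.errors) {
    // Keep the first error per field so the UI shows one message per input
    if (Object.prototype.hasOwnProperty.call(out, e.field)) continue;
    out[e.field] = FIELD_MESSAGES.get(e.code) ?? e.message ?? "This value is invalid.";
  }
  return out;
}
```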
Step 3: Define the HTTP Status Taxonomy
Codes describe what failed; the HTTP status describes how the client should react. Picking statuses ad hoc is the most common source of inconsistent error contracts, so write the taxonomy down and apply it everywhere. The boundary questions that recur are 400 vs 422, 401 vs 403, and 404 vs 409. The deeper rationale for each status is covered in standardizing HTTP error codes in OpenAPI definitions; the working rules are:
- 400 Bad Request: the body is malformed or unparseable (broken JSON, wrong content type). The request cannot even be understood.
- 422 Unprocessable Content: the body parses fine but fails schema or business validation. This is where errors[] belongs.
- 401 Unauthorized: no valid credentials. The client should authenticate.
- 403 Forbidden: valid credentials, but not allowed. Authenticating again will not help.
- 404 Not Found: the target resource does not exist. It is also commonly returned in place of 403 when you do not want to reveal that a resource exists.
- 409 Conflict: the request conflicts with current state (duplicate key, version mismatch, already processed).
- 429 Too Many Requests: rate limited; pair with a Retry-After header.
- 500 / 503: server fault or temporary unavailability. Never put validation detail here.
Pick one convention for validation failures (we use 422) and apply it in every service; never let one service return 400 for validation while another returns 422.
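The decision flow can be sketched as a single TypeScript function so that every service picks statuses the same way. The Failure union and statusFor are hypothetical names. The mapping assumes the 422-for-validation convention chosen above and the hide-existence use of 404.

```typescript
// Hypothetical classification of the failures a handler can detect.
type Failure =
  | { kind: "malformed_body" }                          // broken JSON, wrong content type
  | { kind: "validation" }                              // parses, fails schema/business rules
  | { kind: "unauthenticated" }                         // no valid credentials
  | { kind: "forbidden"; hideExistence: boolean }       // authenticated but not allowed
  | { kind: "not_found" }
  | { kind: "conflict" }                                // duplicate key, version mismatch
  | { kind: "rate_limited"; retryAfterSeconds: number } // caller must also send Retry-After
  | { kind: "unavailable" }                             // temporary outage
  | { kind: "internal" };                               // unexpected server fault

function statusFor(failure: Failure): number {
  switch (failure.kind) {
    case "malformed_body": return 400;
    case "validation": return 422; // one convention everywhere, never 400
    case "unauthenticated": return 401;
    case "forbidden": return failure.hideExistence ? 404 : 403;
    case "not_found": return 404;
    case "conflict": return 409;
    case "rate_limited": return 429;
    case "unavailable": return 503;
    case "internal": return 500;
    default: {
      // Compile-time exhaustiveness check: adding a new kind without a case fails here
      const unhandled: never = failure;
      throw new Error(`Unhandled failure: ${JSON.stringify(unhandled)}`);
    }
  }
}
```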
Step 4: Document Errors in OpenAPI
With the envelope and taxonomy fixed, encode them once in your OpenAPI document and reference them everywhere. Define the Problem schema and a set of reusable responses under components, then attach those responses to operations by $ref. This eliminates copy-paste, makes generated SDKs emit typed error models, and lets a linter enforce coverage. This is the foundation that standardizing HTTP error codes in OpenAPI definitions builds on.
# openapi.yaml (OpenAPI 3.1)
openapi: 3.1.0
info: { title: Users API, version: 1.0.0 } # placeholder header so the file validates standalone
paths:
  /users:
    post:
      operationId: createUser
      responses:
        '201': { description: Created }
        '422': { $ref: '#/components/responses/ValidationFailed' }
        '409': { $ref: '#/components/responses/Conflict' }
        '500': { $ref: '#/components/responses/ServerError' }
components:
  responses:
    ValidationFailed:
      description: Request body failed validation
      content:
        application/problem+json: # signals a problem document, not a success body
          schema: { $ref: '#/components/schemas/Problem' }
          example:
            type: https://example.com/problems/validation-failed
            title: Request validation failed
            status: 422
            code: VALIDATION_FAILED
            errors:
              - { field: email, code: FORMAT }
    Conflict:
      description: Resource conflicts with current state
      content:
        application/problem+json:
          schema: { $ref: '#/components/schemas/Problem' }
    ServerError:
      description: Unexpected server error
      content:
        application/problem+json:
          schema: { $ref: '#/components/schemas/Problem' }
  schemas:
    Problem:
      type: object
      required: [type, title, status] # mirrors the JSON Schema in Step 1
      properties:
        type: { type: string, format: uri }
        title: { type: string }
        status: { type: integer, minimum: 400, maximum: 599 }
        detail: { type: string }
        instance: { type: string, format: uri-reference }
        code: { type: string, pattern: '^[A-Z][A-Z0-9_]+$' }
        traceId: { type: string }
        errors:
          type: array
          items: { $ref: '#/components/schemas/FieldError' }
    FieldError:
      type: object
      required: [field, code]
      additionalProperties: false
      properties:
        field: { type: string }
        code: { type: string, pattern: '^[A-Z][A-Z0-9_]+$' }
        message: { type: string }
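On the server, the same contract is easiest to keep when one function serializes every problem. Below is a minimal, framework-agnostic sketch. It assumes a hypothetical toHttpResponse helper that returns status, headers, and body for whatever HTTP layer you use. It always sets application/problem+json, keeps the status line in step with the body's status, and enforces the Step 3 rule that 429 carries Retry-After.

```typescript
interface ProblemDetails {
  type: string; title: string; status: number;
  detail?: string; instance?: string; code?: string; traceId?: string;
  errors?: { field: string; code: string; message?: string }[];
}

// Hypothetical framework-neutral response shape.
interface HttpResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: serializes a problem so the status line and body always agree.
function toHttpResponse(problem: ProblemDetails, retryAfterSeconds?: number): HttpResponse {
  if (!Number.isInteger(problem.status) || problem.status < 400 || problem.status > 599) {
    throw new RangeError(`Not an error status: ${problem.status}`);
  }
  if (problem.status === 429 && retryAfterSeconds === undefined) {
    throw new Error("429 responses must carry a Retry-After header");
  }
  const headers: Record<string, string> = {
    // Signals a problem document so clients and gateways parse it as an error
    "Content-Type": "application/problem+json",
  };
  if (retryAfterSeconds !== undefined) {
    headers["Retry-After"] = String(Math.max(0, Math.ceil(retryAfterSeconds)));
  }
  return { status: problem.status, headers, body: JSON.stringify(problem) };
}
```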
A custom Spectral rule turns “every operation documents its errors” into an enforceable invariant rather than a code-review hope:
# .spectral.yaml: require at least one 4xx response on every operation
rules:
  operation-has-4xx-response:
    description: Every operation must document at least one 4xx error.
    given: $.paths[*][get,post,put,patch,delete].responses
    then:
      function: schema
      functionOptions:
        schema:
          type: object
          # At least one key must match a 4xx code (e.g. '422') or the '4XX' range key.
          # patternProperties plus minProperties alone would accept a responses object with only '200'.
          not:
            propertyNames:
              not: { pattern: '^4([0-9]{2}|XX)$' }
    severity: error
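The rule above checks that a 4xx response exists, but not which media type it uses. A companion check can be sketched in TypeScript over an OpenAPI document that has already been parsed and $ref-resolved (for example, the output of a dereferencing parser). findNonProblemErrorResponses is a hypothetical name, and the minimal document types cover only the fields it reads. It walks all eight HTTP methods, so it does not have the verb gap described in Troubleshooting.

```typescript
// Minimal slice of an OpenAPI 3.1 document; assumes $refs are already resolved.
interface MediaTypeObject { schema?: unknown; }
interface ResponseObject { description?: string; content?: Record<string, MediaTypeObject>; }
interface OperationObject { responses?: Record<string, ResponseObject>; }
type PathItem = Partial<Record<string, OperationObject>>;
interface OpenApiDoc { paths?: Record<string, PathItem>; }

const HTTP_METHODS = ["get", "put", "post", "delete", "options", "head", "patch", "trace"];

// Hypothetical check: returns "METHOD path status" for every 4xx/5xx response
// (including range keys such as 4XX) that lacks an application/problem+json body.
function findNonProblemErrorResponses(doc: OpenApiDoc): string[] {
  const offenders: string[] = [];
  for (const [path, item] of Object.entries(doc.paths ?? {})) {
    for (const method of HTTP_METHODS) {
      const op = item[method];
      if (!op) continue;
      for (const [status, response] of Object.entries(op.responses ?? {})) {
        const isError = /^[45]/.test(status); // "422", "4XX", "500", ...
        if (isError && !response.content?.["application/problem+json"]) {
          offenders.push(`${method.toUpperCase()} ${path} ${status}`);
        }
      }
    }
  }
  return offenders;
}
```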
Step 5: Gate the Contract in CI
A contract that is not enforced decays. Validate three things in CI: that your example and fixture payloads satisfy the JSON Schema, that the OpenAPI document lints clean against the error rules, and that the document itself is structurally valid. Fail the build on any violation so non-compliant error shapes never merge.
# .github/workflows/error-contract-gate.yml
name: Error Contract Gate
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - name: Validate error fixtures against schema
        run: |
          npx ajv validate \
            -s schemas/error-response-v1.schema.json \
            -d 'tests/fixtures/errors/*.json' \
            -c ajv-formats --spec=draft2020 --strict=true
      - name: Lint OpenAPI error coverage
        run: npx spectral lint openapi.yaml --fail-severity=error
      - name: Validate spec structure
        run: npx swagger-cli validate openapi.yaml
Spec/Schema Reference
The error envelope fields, their types, and the contract each one carries:
| Field | Type | Required | Default | Effect |
|---|---|---|---|---|
| type | string (URI) | yes | about:blank (RFC default when omitted) | Identifies the problem class and should dereference to human docs. This contract requires it explicitly. It may change if docs move, so clients branch on code instead. |
| title | string | yes | none | Short, human-readable summary of the problem class. Should not vary per occurrence. |
| status | integer (400–599) | yes | none | HTTP status code, duplicated in the body for clients that lose the response line. |
| detail | string | no | omitted | Human-readable explanation specific to this occurrence. Safe for display; no internals. |
| instance | string (URI reference) | no | omitted | Identifies the specific request or resource that failed. |
| code | string matching ^[A-Z][A-Z0-9_]+$ | no | omitted | Stable machine-readable identifier clients branch on. Its meaning must never change. |
| traceId | string | no | omitted | Correlation id mapping the error to server logs for support and on-call. |
| errors | array of FieldError | no | omitted | Per-field validation failures; present on 422 validation responses. |
| errors[].field | string | yes (within each item) | none | Dot/bracket path to the offending input (address.zip, items[0].sku). |
| errors[].code | string matching ^[A-Z][A-Z0-9_]+$ | yes (within each item) | none | Per-field machine code (FORMAT, MIN, REQUIRED). |
| errors[].message | string | no | omitted | Optional human hint; clients should prefer the code. |
Verification
Confirm the contract end-to-end. First, the schema gate over fixtures should report clean:
$ npx ajv validate -s schemas/error-response-v1.schema.json \
-d 'tests/fixtures/errors/*.json' -c ajv-formats --spec=draft2020
tests/fixtures/errors/validation-422.json valid
tests/fixtures/errors/conflict-409.json valid
tests/fixtures/errors/server-500.json valid
Second, run a contract-aware mock from the spec and verify a forced error matches the envelope. Prism serves schema-compliant problem+json:
$ docker run --rm -p 4010:4010 -v "$PWD/openapi.yaml:/api.yaml" \
stoplight/prism:5 mock -h 0.0.0.0 /api.yaml
$ curl -s -H 'Prefer: code=422' -H 'Content-Type: application/json' http://localhost:4010/users -d '{}'
{ "type": "...", "title": "Request validation failed", "status": 422,
"code": "VALIDATION_FAILED", "errors": [ { "field": "email", "code": "FORMAT" } ] }
A green CI run plus a mock that returns the exact envelope you documented means the contract is real, not aspirational.
Troubleshooting
additional properties not allowed on fixtures. AJV reports "must NOT have additional properties" on an errors[] item. The fieldError definition is closed (additionalProperties: false), and the payload carries an unlisted member, often a legacy error_msg. The envelope itself is open, so unknown top-level members pass. Root cause: the schema and the actual payload have drifted. Fix: migrate the payload to message (and move any per-request id, such as a legacy trace_id, to the envelope's traceId). If the member is a deliberate part of the contract, add it to the fieldError properties instead. Do not blanket-add additionalProperties: true; that defeats the gate.
status arrives as a string. A fixture or live payload has "status": "422", and a permissive parser let it through, but the canonical schema requires an integer and AJV rejects it. Root cause: type coercion or string formatting somewhere in the producer. Fix: emit status as a number at the source. Do not enable AJV's type coercion (the coerceTypes option), which would silently convert the string and hide the drift. String statuses break clients that compare numerically.
Generated SDK types the error as any or object. The generator could not resolve the error schema. Root cause: the response uses an inline schema or application/json instead of $ref to #/components/schemas/Problem with application/problem+json. Fix: route every error through the reusable components/responses entries shown in Step 4 and re-generate.
Mock server returns 200 instead of the error. Prism prefers success responses unless steered. Root cause: no error scenario was selected. Fix: send Prefer: code=422 (Prism 5). Alternatively, if the operation declares a requestBody schema, post a body that violates it so that validation triggers the documented error path.
Spectral passes but an endpoint still ships no 4xx. The operation uses an HTTP method not covered by the rule’s given JSONPath. Root cause: the path expression omits a verb (e.g. head, options). Fix: broaden the given to include every method you expose, then re-lint.
Frequently Asked Questions
What is the difference between RFC 7807 and RFC 9457?
RFC 9457 (2023) obsoletes RFC 7807 but keeps the same wire format: type, title, status, detail, instance, plus extension members. Existing problem+json payloads remain valid. 9457 mainly adds an IANA registry of common problem types and clarifies guidance, for example on type URIs that do not dereference and on reporting multiple problems. Citing either RFC is fine, but new specs should cite 9457.
Should the machine-readable error code live in type or in a separate field?
Use a dedicated extension member such as code or error_code for the stable identifier clients branch on, and keep type as a documentation URI. type values can change as docs move; a code like USER_EMAIL_TAKEN is a contract clients depend on and should never change meaning.
Which HTTP status code should I use for validation failures?
Use 422 Unprocessable Content when the request is syntactically valid JSON but fails business or schema validation, and 400 Bad Request when the body is malformed or unparseable. Both are acceptable for validation; pick one convention and apply it consistently across every endpoint.
Can I add custom fields to a problem+json response?
Yes. RFC 7807/9457 explicitly allow extension members at the top level, such as errors, code, traceId, or balance. Clients must ignore members they do not recognize, so additive changes are non-breaking as long as you never repurpose an existing field name.
Should error responses use application/json or application/problem+json?
Use application/problem+json for the Content-Type so clients and gateways can distinguish errors from success bodies and apply problem-specific parsing. The body shape is identical; only the media type signals it is a problem document.
How do I avoid leaking internal details in error messages?
Keep detail human-readable but free of stack traces, SQL, and internal hostnames. Put diagnostic identifiers in a traceId or instance field that maps to server logs, so support can correlate without exposing internals to the caller.
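A minimal sketch of that separation for unexpected exceptions follows. It assumes a hypothetical problemFromUnexpected helper, a caller-supplied logger, and a trace id supplied by your tracing middleware. The INTERNAL_ERROR code is illustrative. The full error message and stack go only to the log; the caller sees a generic detail plus the traceId.

```typescript
interface ProblemDetails {
  type: string; title: string; status: number;
  detail?: string; instance?: string; code?: string; traceId?: string;
}

// Hypothetical structured logger supplied by the caller.
type LogFn = (entry: Record<string, unknown>) => void;

// Hypothetical helper: logs everything internally, returns only safe fields to the caller.
function problemFromUnexpected(err: unknown, instance: string, traceId: string, log: LogFn): ProblemDetails {
  log({
    level: "error",
    traceId,
    instance,
    message: err instanceof Error ? err.message : String(err),
    stack: err instanceof Error ? err.stack : undefined,
  });
  return {
    type: "about:blank",
    title: "Internal Server Error", // with about:blank, title matches the HTTP status phrase
    status: 500,
    detail: "An unexpected error occurred. Quote the traceId when contacting support.",
    instance,
    code: "INTERNAL_ERROR", // illustrative code, not a standard one
    traceId,
  };
}
```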