Pagination and Filtering Schema Patterns

Q: Why does my offset pagination skip or duplicate rows?

Offset pagination resolves LIMIT/OFFSET against the live table at query time, so when rows are inserted or deleted between page requests the window shifts. The fix is a stable, total ordering — order by a unique tiebreaker like the primary key alongside your sort column — or switch to cursor pagination, which anchors to a row value instead of a position.

Q: How do I encode a pagination cursor?

Encode the cursor as an opaque base64url string (RFC 4648 section 5, no padding) that wraps the last row's ordering key, for example its id and sort value. Treat it as opaque to clients: never document its internal shape, and version it so you can change the encoding without breaking existing tokens.

Q: Do I need a total count in every paginated response?

No. total is expensive on large tables because it forces a full count query, and it is meaningless for cursor pagination over a mutable set. Return has_more or a next cursor as the authoritative signal of more data, and expose total only as an optional, explicitly opted-in field.

List endpoints fail in slow, expensive, and subtly wrong ways: unbounded result sets, duplicated rows across pages, and free-form filter strings that turn into accidental query languages. This guide is a runnable blueprint for pagination and filtering schemas — cursor versus offset strategies, response envelopes and links, filter and sort query-parameter design, stable ordering, and how to document all of it in OpenAPI. It extends the broader Schema Design & Validation Patterns discipline, and pairs naturally with Designing Robust Error Response Contracts for the 400 responses your validation layer will return.

When to Use This Approach

Reach for these patterns when any of the following is true:

A collection endpoint can return more than a few hundred rows, so the response must be paged rather than returned whole.
Clients drive infinite scroll, real-time feeds, or sync loops where rows are inserted and deleted between page requests.
Multiple teams or external consumers depend on the list contract, so query parameters and response shape must be specified, not improvised.
You need to gate the contract in CI — rejecting unbounded limit, unvalidated filters, or unstable ordering before they ship.
You are designing filtering and sorting that must stay backward compatible as new filterable fields are added.

If your dataset is small, immutable, and consumed by a single front end, a plain bounded limit/offset pair may be all you need — but specify it explicitly anyway, because an undocumented list endpoint is a future breaking change waiting to happen.

Prerequisites

You will need the OpenAPI tooling that the rest of this site standardizes on:

# Spec linting and runtime mocking
npm install -D @stoplight/spectral-cli@6.11   # OpenAPI/AsyncAPI linter
npm install -D @stoplight/prism-cli@5          # mock server from the spec
npm install zod@3.23                            # runtime request validation

This guide targets OpenAPI 3.1 (which aligns with JSON Schema 2020-12, so minimum/maximum/pattern behave as expected). Examples assume a relational backing store, but the cursor pattern applies equally to any ordered data source. Familiarity with typed response objects helps — paginated envelopes are themselves nested objects, covered in Handling Complex Nested Objects in API Schemas.

The Two Pagination Models at a Glance

Offset pagination addresses rows by position (LIMIT 20 OFFSET 40). Cursor pagination addresses rows by value — “give me the 20 rows after the one whose key is X.” That single difference drives every trade-off below: offset is random-access but fragile under writes and slow at depth; cursor is sequential-only but stable and constant-time. The diagram makes the failure mode concrete.

The deeper performance and correctness analysis lives in Cursor vs Offset Pagination Schema Design; the steps below give you a contract that supports either.

Step 1: Choose a Pagination Strategy

Decide before you write a line of schema, because the choice shapes both the request and response contracts.

Cursor (keyset) pagination — default for large or mutable collections, feeds, and infinite scroll. The client passes back an opaque token; the server resolves it to WHERE (sort_key, id) > (?, ?). Stable under concurrent writes, constant-time at any depth, but no arbitrary page jumps.
Offset pagination — acceptable for small, stable datasets and admin tables where a user clicks “page 7.” Trivial to implement and supports total page counts, but degrades linearly with depth and skips or duplicates rows under writes (see the diagram above).

A pragmatic rule: if the collection can grow without bound or mutates while users browse it, choose cursor. The comparison table below summarizes the decision.

Step 2: Define the Query Parameter Contract

Specify every input parameter with explicit bounds. The single most important rule is that limit always has a maximum — an unbounded limit is a denial-of-service vector. Filters are discrete, named, enum- or pattern-bounded parameters rather than a free-form query string; the full grammar is covered in Designing Filter and Sort Query Parameter Schemas.

# openapi.yaml — OpenAPI 3.1
paths:
  /articles:
    get:
      summary: List articles (cursor-paginated)
      parameters:
        - name: limit                       # page size, ALWAYS bounded
          in: query
          required: false
          schema:
            type: integer
            minimum: 1
            maximum: 100                     # hard ceiling — prevents huge scans
            default: 20
        - name: cursor                       # opaque forward token
          in: query
          required: false
          schema:
            type: string
            pattern: '^[A-Za-z0-9_-]+$'      # base64url, no padding
          description: >
            Opaque pagination token from meta.next_cursor. Treat as opaque;
            do not construct or parse on the client. Mutually exclusive with offset.
        - name: status                       # filter: bounded enum, not free text
          in: query
          required: false
          schema:
            type: string
            enum: [draft, published, archived]
        - name: tag                          # repeatable filter, capped
          in: query
          required: false
          schema:
            type: array
            maxItems: 10                      # cap to bound the IN clause
            items:
              type: string
              maxLength: 40
          style: form
          explode: true                       # ?tag=a&tag=b
        - name: sort                          # single grammar-driven sort param
          in: query
          required: false
          schema:
            type: string
            default: '-createdAt'
            # leading '-' = descending; comma-separated for tiebreakers
            pattern: '^-?(createdAt|updatedAt|title)(,-?(createdAt|updatedAt|title))*$'
      responses:
        '200':
          description: Paginated article list
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ArticleListResponse'
        '400':
          description: Invalid pagination, filter, or sort parameter
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Problem'   # see error-contracts guide

The sort pattern is doing real work: it whitelists exactly which fields are sortable. An unrecognized field can never reach SQL, which closes off both a 400-storm and an injection surface. The Problem schema referenced for 400 is the RFC 7807 shape defined in Designing Robust Error Response Contracts.

Step 3: Design the Response Envelope

Never return a bare top-level array. Wrap results in a typed envelope so you have somewhere to attach pagination metadata and links, and so the contract is extensible without a breaking change. A bare array is also a security footgun in older JSON serializers and leaves no room for meta.

# openapi.yaml — components
components:
  schemas:
    ArticleListResponse:
      type: object
      required: [data, meta]
      additionalProperties: false
      properties:
        data:
          type: array
          items:
            $ref: '#/components/schemas/Article'
        meta:
          $ref: '#/components/schemas/PageMeta'
        links:
          $ref: '#/components/schemas/PageLinks'
    PageMeta:
      type: object
      required: [has_more]
      additionalProperties: false
      properties:
        has_more:
          type: boolean                  # authoritative "more data" signal
        next_cursor:
          type: string
          nullable: true                 # null on the last page
        total:
          type: integer                  # OPTIONAL — omit for big mutable sets
          minimum: 0
    PageLinks:                            # HATEOAS-style navigation
      type: object
      additionalProperties: false
      properties:
        self:  { type: string, format: uri-reference }
        next:  { type: string, format: uri-reference, nullable: true }
        prev:  { type: string, format: uri-reference, nullable: true }

A concrete, valid response — note that next_cursor and links.next are null together when the page is the last one:

{
  "data": [
    { "id": "art_104", "title": "Keyset pagination in practice", "status": "published" },
    { "id": "art_103", "title": "Designing sort grammars", "status": "published" }
  ],
  "meta": {
    "has_more": true,
    "next_cursor": "eyJpZCI6ImFydF8xMDMiLCJjIjoiMjAyNi0wNi0xMSJ9"
  },
  "links": {
    "self": "/articles?sort=-createdAt&limit=2",
    "next": "/articles?sort=-createdAt&limit=2&cursor=eyJpZCI6ImFydF8xMDMiLCJjIjoiMjAyNi0wNi0xMSJ9",
    "prev": null
  }
}

The next_cursor here base64url-decodes to {"id":"art_103","c":"2026-06-11"} — the last row’s ordering key. Clients never see or build that shape; they only echo the opaque token back. Keep an explicit version byte or field inside the cursor so you can change the encoding later without invalidating tokens in the wild.

Step 4: Guarantee Stable Ordering

Pagination correctness rests entirely on a total ordering. If your sort column has duplicate values — many rows share a createdAt timestamp — the database may return them in different relative orders across requests, so page boundaries drift and rows are skipped or repeated. Always append a unique tiebreaker (the primary key) to every sort, and build the cursor from the same compound key.

-- Cursor page: order by the sort column AND a unique tiebreaker.
-- The cursor encodes the last row's (created_at, id) pair.
SELECT id, title, status, created_at
FROM articles
WHERE status = $1                                  -- filter
  AND (created_at, id) < ($2, $3)                  -- keyset from cursor (DESC)
ORDER BY created_at DESC, id DESC                  -- total ordering, no ties
LIMIT $4;                                          -- limit + 1 to detect has_more

Two practitioner details: fetch limit + 1 rows and trim the extra to compute has_more without a second query, and ensure a composite index on (status, created_at DESC, id DESC) exists so the keyset predicate is index-only. For offset pagination the same tiebreaker rule applies — ORDER BY created_at DESC, id DESC — otherwise even LIMIT/OFFSET can shuffle rows that share a timestamp.

Step 5: Validate and Gate the Contract

Two layers defend the contract. Spectral lints the spec at merge time so structural rules (bounded limit, enum’d filters) can never regress, and a runtime validator built with Runtime Validation with Zod rejects malformed requests before they reach the database.

# .spectral.yaml — enforce pagination/filter hygiene in CI
extends: ["spectral:oas"]
rules:
  pagination-limit-bounded:
    description: Every list 'limit' must declare minimum and maximum.
    severity: error
    given: "$.paths[*].get.parameters[?(@.name=='limit')].schema"
    then:
      - field: maximum
        function: defined
      - field: minimum
        function: defined
  filter-must-constrain:
    description: Filter params must use enum, pattern, or maxItems.
    severity: warn
    given: "$.paths[*].get.parameters[?(@.in=='query')].schema"
    then:
      function: schema
      functionOptions:
        schema:
          anyOf:
            - required: [enum]
            - required: [pattern]
            - required: [maxItems]

Runtime validation coerces and bounds the query string, returning a clean 400 for anything out of contract:

// query.ts — Zod 3.23 request guard for the list endpoint
import { z } from "zod";

export const listQuery = z.object({
  limit: z.coerce.number().int().min(1).max(100).default(20),
  cursor: z.string().regex(/^[A-Za-z0-9_-]+$/).optional(),
  status: z.enum(["draft", "published", "archived"]).optional(),
  tag: z.array(z.string().max(40)).max(10).optional(),
  sort: z.string()
    .regex(/^-?(createdAt|updatedAt|title)(,-?(createdAt|updatedAt|title))*$/)
    .default("-createdAt"),
}).strict(); // reject unknown query params instead of ignoring them

.strict() is the runtime mirror of additionalProperties: false — an unknown query parameter is a contract violation, not something to silently drop.

Pagination Strategy Comparison

Dimension	Offset pagination	Cursor (keyset) pagination
Request param	`offset` + `limit`	opaque `cursor` + `limit`
Addresses rows by	position (`OFFSET n`)	value (`WHERE key > ?`)
Performance at depth	degrades linearly (deep scans)	constant-time per page
Stability under writes	skips/duplicates rows	stable boundaries
Arbitrary page jumps	yes (page 7 directly)	no (forward/back only)
Total page count	natural to provide	expensive / often omitted
Index requirement	sort + tiebreaker	composite keyset index
Best for	small, stable, admin tables	large, mutable, feeds, sync

Spec/Schema Reference

Parameter / field	Type	Default	Effect
`limit`	integer (1–100)	20	Page size; `maximum` caps scan cost
`cursor`	string (base64url)	—	Opaque forward token; mutually exclusive with `offset`
`offset`	integer (≥0)	0	Row skip count (offset mode only); cap to avoid deep scans
`status`	enum	—	Bounded equality filter
`tag`	string[] (≤10)	—	Repeatable filter; `maxItems` bounds the `IN` clause
`sort`	pattern-bounded string	`-createdAt`	Field grammar; leading `-` = descending
`meta.has_more`	boolean	—	Authoritative “more data” signal; always present
`meta.next_cursor`	string \| null	—	Token for the next page; `null` on last page
`meta.total`	integer	omitted	Optional total count; skip for large mutable sets

Verification

Confirm the contract holds before merging. First lint the spec:

$ npx spectral lint openapi.yaml --ruleset .spectral.yaml --fail-severity warn
OpenAPI 3.x detected
No results with a severity of 'error' or above found!

Then prove the runtime guard rejects out-of-contract input. Spin up the mock and probe it:

$ npx prism mock openapi.yaml --port 4010 &
$ curl -s -o /dev/null -w "%{http_code}\n" "http://localhost:4010/articles?limit=500"
400                       # over the maximum — rejected
$ curl -s "http://localhost:4010/articles?limit=2" | jq '.meta.has_more, .links.next != null'
true
true                      # envelope carries paging metadata

A passing CI run shows both the Spectral step green and the contract test asserting has_more is present and limit > 100 returns 400.

Troubleshooting

Rows skipped or duplicated across pages

Signature: users report missing or repeated items while scrolling; counts drift between page loads. Root cause: non-total ordering — the sort column has ties and no unique tiebreaker, or offset pagination over a table receiving concurrent inserts. Fix: add the primary key as a final ORDER BY term (ORDER BY created_at DESC, id DESC) and build the cursor from that same compound key. For mutable collections, switch to cursor pagination so boundaries anchor to a value, not a position.

`Cursor decode failed` / 400 on a valid-looking token

Signature: a token copied from meta.next_cursor is rejected. Root cause: standard base64 versus base64url mismatch, or padding (=) added by an intermediary. Fix: standardize on base64url with no padding (RFC 4648 §5 — - and _, never +//), strip padding on encode, and validate with pattern: '^[A-Za-z0-9_-]+$'. Version the cursor payload so old tokens fail loudly instead of decoding to garbage.

Deep pages time out

Signature: ?offset=200000 takes seconds or times out while early pages are instant. Root cause: OFFSET n still scans and discards the first n rows. Fix: cap offset (reject offset > 10000 with a 400) and migrate hot, deep-scrolling endpoints to cursor pagination, which is constant-time regardless of depth.

Filter explosion crashes the query planner

Signature: ?tag=a&tag=b&... with dozens of values produces a giant IN clause and slow queries. Root cause: array filter has no maxItems. Fix: set maxItems on every array parameter (10 is a sane default) and enforce it in both the OpenAPI schema and the Zod guard so the cap is contractual, not advisory.

Frontend infinite scroll never stops or stops early

Signature: the client loops forever or halts before the data is exhausted. Root cause: has_more omitted, or the client infers “done” from an empty data array that can legitimately occur mid-stream after filtering. Fix: always return has_more (boolean, never absent) and pair next_cursor: null with has_more: false. The client must trust has_more/next_cursor, never array length.

Frequently Asked Questions

Should I use cursor or offset pagination?

Use cursor pagination for large, mutable datasets, infinite scroll, and real-time feeds where rows shift between requests. Use offset pagination only for small, stable datasets where users need to jump to an arbitrary page number. Cursor pagination keeps page boundaries stable and avoids deep-scan performance cliffs.

Why does my offset pagination skip or duplicate rows?

Offset pagination resolves LIMIT/OFFSET against the live table at query time, so when rows are inserted or deleted between page requests the window shifts. The fix is a stable, total ordering — order by a unique tiebreaker like the primary key alongside your sort column — or switch to cursor pagination, which anchors to a row value instead of a position.

How do I encode a pagination cursor?

Encode the cursor as an opaque base64url string (RFC 4648 §5, no padding) that wraps the last row’s ordering key, for example its id and sort value. Treat it as opaque to clients: never document its internal shape, and version it so you can change the encoding without breaking existing tokens.

How should filter and sort parameters be structured in a query string?

Expose each filter as a discrete, named, enum- or pattern-bounded query parameter rather than a free-form query language. Use a single sort parameter with a documented grammar such as sort=-createdAt,name where a leading minus means descending. Always cap array filters with maxItems and reject unknown sort fields with a 400.

Do I need a total count in every paginated response?

No. total is expensive on large tables because it forces a full count query, and it is meaningless for cursor pagination over a mutable set. Return has_more or a next cursor as the authoritative signal of more data, and expose total only as an optional, explicitly opted-in field.