REST API · Schema-typed output · 100 free pages/mo

PDF to JSON API

Convert any PDF into structured JSON. You define the shape; the API fills it in. Strings stay strings, numbers stay numbers, and dates come back ISO-formatted. No regex, no template tuning, no per-document code.


Schema In, JSON Out

The schema is your contract. The API guarantees the response matches it: every type, every nested object, every array. Coercions are explicit and reported.

request.sh

curl -X POST \
  https://api-parse.conversiontools.io/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@statement.pdf" \
  -F 'schema={
    "account_holder": "string",
    "account_number": "string",
    "statement_period": {
      "start": "date",
      "end": "date"
    },
    "transactions": [{
      "date": "date",
      "description": "string",
      "amount": "number",
      "balance": "number"
    }],
    "ending_balance": "number"
  }'

response.json

{
  "status": "completed",
  "pages": 4,
  "data": {
    "account_holder": "Jane Doe",
    "account_number": "****6739",
    "statement_period": {
      "start": "2026-03-01",
      "end": "2026-03-31"
    },
    "transactions": [
      {
        "date": "2026-03-04",
        "description": "Whole Foods",
        "amount": -84.52,
        "balance": 4215.48
      }
    ],
    "ending_balance": 3127.04
  }
}

Why JSON-First Beats Text Extraction

Skip the parser

PDF-to-text gives you a string and a regex problem. PDF-to-JSON gives you data.total as a number you can insert directly into a database row.
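
As a minimal sketch of what skipping the parser can look like, the snippet below takes a payload shaped like the response.json example and writes its transactions straight into SQLite. The table name, column names, and the store_transactions helper are illustrative assumptions, not part of the API.

```python
import json
import sqlite3

# Sample payload copied from the response.json example; in practice this
# would be the body returned by the API.
RESPONSE_BODY = """
{
  "status": "completed",
  "pages": 4,
  "data": {
    "account_holder": "Jane Doe",
    "account_number": "****6739",
    "statement_period": {"start": "2026-03-01", "end": "2026-03-31"},
    "transactions": [
      {"date": "2026-03-04", "description": "Whole Foods",
       "amount": -84.52, "balance": 4215.48}
    ],
    "ending_balance": 3127.04
  }
}
"""


def store_transactions(conn: sqlite3.Connection, data: dict) -> int:
    """Insert each transaction as a row and return the number of rows written.

    Table and column names are hypothetical, chosen to mirror the schema.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions ("
        "account_number TEXT, date TEXT, description TEXT, "
        "amount REAL, balance REAL)"
    )
    rows = [
        (data["account_number"], t["date"], t["description"],
         t["amount"], t["balance"])
        for t in data["transactions"]
    ]
    # Values arrive already typed (str / float), so there is no parsing step.
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
written = store_transactions(conn, json.loads(RESPONSE_BODY)["data"])
```

The only transformation between the response and the database is picking fields by name; the amounts go into REAL columns without any string cleanup.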

Typed coercion

"1,234.50" comes back as the number 1234.5. "April 26, 2026" comes back as the ISO string "2026-04-26". Coercions are explicit and reported in the response.

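To make the two coercions concrete, here is a local illustration of the same conversions using only the Python standard library. It is not the API's implementation; the helper names to_number and to_iso_date, and the single date format they accept, are assumptions made for the example.

```python
from datetime import datetime


def to_number(text: str) -> float:
    """Turn a formatted amount such as "1,234.50" into a float.

    Assumes "," is a thousands separator and "." is the decimal point.
    """
    return float(text.replace(",", ""))


def to_iso_date(text: str) -> str:
    """Turn a date written like "April 26, 2026" into "2026-04-26".

    Only handles the "<month name> <day>, <year>" form with English month
    names (Python's default C locale).
    """
    return datetime.strptime(text, "%B %d, %Y").date().isoformat()


amount = to_number("1,234.50")
iso_date = to_iso_date("April 26, 2026")
print(amount, iso_date)  # prints: 1234.5 2026-04-26
```
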
Validation built-in

Every response includes a validation block: passed / partial / failed, plus warnings for missing required fields. You know the data quality before you persist it.
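
This page does not show the exact layout of the validation block, so the sketch below assumes a hypothetical shape: a validation object with a status of passed, partial, or failed, and a warnings list of strings. Under that assumption, gating persistence might look like this; the key names should be checked against the API docs before relying on them.

```python
def should_persist(response: dict, allow_partial: bool = False) -> bool:
    """Decide whether extracted data is good enough to store.

    ASSUMPTION: the response carries
    {"validation": {"status": ..., "warnings": [...]}}.
    These key names are hypothetical.
    """
    status = response.get("validation", {}).get("status")
    if status == "passed":
        return True
    if status == "partial" and allow_partial:
        return True
    # "failed", unknown, or missing status: do not persist.
    return False


def warning_summary(response: dict) -> str:
    """Join warnings into one line for logging (same assumed shape)."""
    warnings = response.get("validation", {}).get("warnings", [])
    return "; ".join(warnings) if warnings else "no warnings"


# Hand-written sample in the assumed shape, not real API output.
sample = {
    "status": "completed",
    "data": {"account_holder": "Jane Doe", "ending_balance": None},
    "validation": {
        "status": "partial",
        "warnings": ["missing required field: ending_balance"],
    },
}

if not should_persist(sample):
    print("skipped:", warning_summary(sample))
```

Defaulting to rejection when the status is missing or unrecognized keeps incomplete data out of storage unless the caller opts in with allow_partial.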

Frequently Asked Questions

What does it mean to convert PDF to JSON?

Most PDFs are visual documents: text positioned on a page, with no machine-readable structure. Converting PDF to JSON means reading the document, identifying which text belongs to which field (vendor name, total, line item description), and returning a typed JSON object you can insert into a database or pass to another API.

How does the API decide what fields to extract?

You define a schema — a list of fields with names and types. The API uses that schema to guide extraction. Same PDF + different schema = different JSON. You can also auto-generate a schema from a sample document in the dashboard, then reuse it via the API.
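
To illustrate "same PDF, different schema", the sketch below builds two requests for one file, each carrying a different schema. The endpoint and the file and schema form fields come from the curl example; the build_request helper, the two example schemas, and the placeholder PDF bytes are assumptions. The requests are built but not sent.

```python
import json
import urllib.request
import uuid

# Endpoint and form-field names ("file", "schema") follow the curl example.
API_URL = "https://api-parse.conversiontools.io/v1/extract"


def build_request(pdf_bytes: bytes, filename: str, schema: dict,
                  api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data extraction request."""
    boundary = uuid.uuid4().hex
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    schema_part = (
        f"\r\n--{boundary}\r\n"
        'Content-Disposition: form-data; name="schema"\r\n\r\n'
        f"{json.dumps(schema)}\r\n"
        f"--{boundary}--\r\n"
    ).encode()
    return urllib.request.Request(
        API_URL,
        data=file_head + pdf_bytes + schema_part,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Two hypothetical schemas for the same bank statement.
summary_schema = {"account_holder": "string", "ending_balance": "number"}
ledger_schema = {
    "transactions": [
        {"date": "date", "description": "string", "amount": "number"}
    ]
}

# Stand-in for open("statement.pdf", "rb").read()
pdf_bytes = b"%PDF-1.7 placeholder"

summary_req = build_request(pdf_bytes, "statement.pdf", summary_schema, "YOUR_API_KEY")
ledger_req = build_request(pdf_bytes, "statement.pdf", ledger_schema, "YOUR_API_KEY")
# To send one: urllib.request.urlopen(summary_req), then json.load() the body.
```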

Can I extract nested objects and arrays from PDFs?

Yes. Schemas support nested objects (a billing_address with street/city/zip sub-fields) and arrays of objects (line_items as an array where each item is an object with description/quantity/price). The API returns the same nested shape you declared.
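
A client can confirm that a response has the nested shape it declared. The checker below is a rough sketch that assumes the schema notation from the request example: "string", "number", "date", a dict for a nested object, and a one-element list for an array of items. The conforms function and the sample invoice data are illustrative, and the sketch treats null or missing fields as non-conforming, which may be stricter than the API itself.

```python
from datetime import date


def conforms(value, schema) -> bool:
    """Return True if value has the shape declared by schema."""
    if isinstance(schema, dict):
        # Nested object: every declared key must be present and conform.
        return isinstance(value, dict) and all(
            key in value and conforms(value[key], sub)
            for key, sub in schema.items()
        )
    if isinstance(schema, list):
        # Array: every element must conform to the single item schema.
        return isinstance(value, list) and all(
            conforms(item, schema[0]) for item in value
        )
    if schema == "string":
        return isinstance(value, str)
    if schema == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if schema == "date":
        if not isinstance(value, str):
            return False
        try:
            date.fromisoformat(value)
            return True
        except ValueError:
            return False
    return False  # unknown type name


invoice_schema = {
    "billing_address": {"street": "string", "city": "string", "zip": "string"},
    "line_items": [
        {"description": "string", "quantity": "number", "price": "number"}
    ],
}

# Hand-written sample data, not real API output.
invoice_data = {
    "billing_address": {"street": "1 Main St", "city": "Springfield", "zip": "12345"},
    "line_items": [{"description": "Widget", "quantity": 2, "price": 9.99}],
}

ok = conforms(invoice_data, invoice_schema)
```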

How does PDF to JSON differ from PDF to text?

PDF to text returns one big string in reading order, so you still have to write parsing logic to find specific values. PDF to JSON returns the values directly, typed and named. The schema is the contract; you skip the parser.

Does it work for scanned PDFs?

Yes. The API runs OCR automatically on scanned (image-based) PDFs. The schema-driven JSON output is the same whether the source is a digitally generated PDF or a scan.
