Convert any PDF into structured JSON. You define the shape, the API fills it in. Strings stay strings, numbers stay numbers, dates come back ISO-formatted. No regex, no template tuning, no per-document code.
The schema is your contract. The API guarantees the response matches it — every type, every nested object, every array. Coercions are explicit and reported.
curl -X POST \
  https://api-parse.conversiontools.io/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@statement.pdf" \
  -F 'schema={
    "account_holder": "string",
    "account_number": "string",
    "statement_period": {
      "start": "date",
      "end": "date"
    },
    "transactions": [{
      "date": "date",
      "description": "string",
      "amount": "number",
      "balance": "number"
    }],
    "ending_balance": "number"
  }'

{
  "status": "completed",
  "pages": 4,
  "data": {
    "account_holder": "Jane Doe",
    "account_number": "****6739",
    "statement_period": {
      "start": "2026-03-01",
      "end": "2026-03-31"
    },
    "transactions": [
      {
        "date": "2026-03-04",
        "description": "Whole Foods",
        "amount": -84.52,
        "balance": 4215.48
      }
    ],
    "ending_balance": 3127.04
  }
}

PDF-to-text gives you a string and a regex problem. PDF-to-JSON gives you data.total as a number you can insert directly into a database row.
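To make the "insert directly into a database row" point concrete, here is a minimal Python sketch that stores the sample bank-statement response in SQLite with no parsing step. The table names and the store_statement helper are hypothetical, chosen only for illustration; the response values are copied from the example response.

```python
import sqlite3

# Sample response from the example, trimmed to the fields used below.
response = {
    "status": "completed",
    "data": {
        "account_holder": "Jane Doe",
        "account_number": "****6739",
        "transactions": [
            {"date": "2026-03-04", "description": "Whole Foods",
             "amount": -84.52, "balance": 4215.48},
        ],
        "ending_balance": 3127.04,
    },
}


def store_statement(conn, data):
    """Insert one statement and its transactions (hypothetical tables)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS statements "
        "(account_number TEXT, holder TEXT, ending_balance REAL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions "
        "(account_number TEXT, date TEXT, description TEXT, amount REAL, balance REAL)"
    )
    # Values are already typed, so they go straight into the parameters.
    conn.execute(
        "INSERT INTO statements VALUES (?, ?, ?)",
        (data["account_number"], data["account_holder"], data["ending_balance"]),
    )
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
        [
            (data["account_number"], t["date"], t["description"],
             t["amount"], t["balance"])
            for t in data["transactions"]
        ],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
store_statement(conn, response["data"])
```

Because amounts arrive as numbers and dates as ISO strings, the only code between the response and the table is the column mapping.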
"1,234.50" comes back as the number 1234.5. "April 26, 2026" comes back as the ISO string "2026-04-26". Coercions are explicit and reported in the response.
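For a sense of what these two coercions amount to, the following Python sketch reproduces them locally. It is not the API's implementation: to_number and to_iso_date are hypothetical helpers that handle only the two formats quoted above (US-style thousands separators and "Month day, year" dates in an English locale).

```python
from datetime import datetime


def to_number(text):
    """Parse a number written like "1,234.50" into a float."""
    return float(text.replace(",", ""))


def to_iso_date(text):
    """Parse a date written like "April 26, 2026" into an ISO 8601 string."""
    return datetime.strptime(text, "%B %d, %Y").date().isoformat()


amount = to_number("1,234.50")
day = to_iso_date("April 26, 2026")
```

Hand-written helpers like these are exactly the per-document code the typed response is meant to replace; real statements mix many more formats than this sketch covers.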
Every response includes a validation block: passed / partial / failed, plus warnings for missing required fields. You know the data quality before you persist it.
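A gate on the validation block before writing to storage might look like the Python sketch below. The field names are assumptions: the text above only says the block reports passed / partial / failed plus warnings, so the "validation", "status", and "warnings" keys used here are hypothetical and should be checked against a real response.

```python
def ready_to_persist(response, allow_partial=False):
    """Decide whether a response is safe to store.

    Returns (ok, warnings). Key names are assumed, not confirmed.
    """
    validation = response.get("validation", {})
    status = validation.get("status")
    warnings = validation.get("warnings", [])
    if status == "passed":
        return True, warnings
    if status == "partial" and allow_partial:
        # Caller has opted in to storing records with missing fields.
        return True, warnings
    # "failed", unknown, or missing status: do not persist.
    return False, warnings


# Made-up response fragment for illustration.
sample = {
    "validation": {
        "status": "partial",
        "warnings": ["missing required field: account_number"],
    }
}
strict_ok, strict_warnings = ready_to_persist(sample)
lenient_ok, _ = ready_to_persist(sample, allow_partial=True)
```

Defaulting to rejection when the status is missing or unrecognized keeps a malformed response from slipping into the database.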
Most PDFs are visual documents — text positioned on a page, no machine-readable structure. Converting PDF to JSON means reading the document, identifying which text belongs to which field (vendor name, total, line item description), and returning a typed JSON object you can insert into a database or pass to another API.
You define a schema — a list of fields with names and types. The API uses that schema to guide extraction. Same PDF + different schema = different JSON. You can also auto-generate a schema from a sample document in the dashboard, then reuse it via the API.
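The "same PDF, different schema" idea can be sketched in Python as two requests that differ only in the schema field. The endpoint, the Bearer header, and the schema form field come from the curl example; build_extract_request is a hypothetical helper that assembles the request parts without sending anything, and the PDF would still need to be attached as the file form field by whichever HTTP client you use.

```python
import json

# Endpoint taken from the curl example; everything else is illustrative.
API_URL = "https://api-parse.conversiontools.io/v1/extract"


def build_extract_request(schema, api_key):
    """Return the URL, headers, and schema form field for one extraction call."""
    return {
        "url": API_URL,
        "headers": {"Authorization": f"Bearer {api_key}"},
        # The schema travels as a JSON string in a multipart form field.
        "form": {"schema": json.dumps(schema)},
    }


# Two schemas for the same bank statement (field names from the example).
balances_only = {"account_holder": "string", "ending_balance": "number"}
activity_only = {"transactions": [{"date": "date", "amount": "number"}]}

req_a = build_extract_request(balances_only, "YOUR_API_KEY")
req_b = build_extract_request(activity_only, "YOUR_API_KEY")
```

Keeping schemas as plain dictionaries means they can be versioned and reused alongside the code that consumes the resulting JSON.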
Schemas support nested objects (a billing_address with street/city/zip sub-fields) and arrays of objects (line_items as an array where each item is an object with description/quantity/price). The API returns the same nested shape you declared.
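A nested schema and a client-side shape check could be sketched in Python as follows. The type names ("string", "number", "date") and the schema notation follow the curl example; matches_schema is a hypothetical helper, the sample data is made up, and treating every declared field as required is an assumption of this sketch rather than documented API behavior.

```python
from datetime import date


def matches_schema(schema, value):
    """Return True if value has the shape and types that schema declares."""
    if isinstance(schema, dict):
        # Object: same keys, and each value matches its declared sub-schema.
        return (
            isinstance(value, dict)
            and value.keys() == schema.keys()
            and all(matches_schema(sub, value[key]) for key, sub in schema.items())
        )
    if isinstance(schema, list):
        # Array: a one-element list declares the shape of every item.
        return isinstance(value, list) and all(
            matches_schema(schema[0], item) for item in value
        )
    if schema == "string":
        return isinstance(value, str)
    if schema == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if schema == "date":
        if not isinstance(value, str):
            return False
        try:
            date.fromisoformat(value)
        except ValueError:
            return False
        return True
    return False  # Unknown type name.


schema = {
    "billing_address": {"street": "string", "city": "string", "zip": "string"},
    "line_items": [{"description": "string", "quantity": "number", "price": "number"}],
}

# Made-up response data, for illustration only.
data = {
    "billing_address": {"street": "1 Main St", "city": "Springfield", "zip": "12345"},
    "line_items": [{"description": "Widget", "quantity": 2, "price": 9.5}],
}

shape_ok = matches_schema(schema, data)
```

A check like this is optional given the API's guarantee, but it is a cheap assertion to run in tests that consume the response.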
PDF to text returns one big string in reading order — you still have to write parsing logic to find specific values. PDF to JSON returns the values directly, typed and named. The schema is the contract; you skip the parser.
Scanned (image-based) PDFs are supported: the API runs OCR on them automatically. The schema-driven JSON output is the same whether the source is a digitally generated PDF or a scan.
Sign up free and run your first conversion in under two minutes.