Extract structured data from PDFs, invoices, receipts, and forms with a single API call. Send any document, get clean JSON back. No templates, no training data, no configuration files.
Send a document to the extraction endpoint and receive structured JSON. Define a schema to control exactly which fields you need, or let the API auto-detect the document structure.
curl -X POST \
  https://api-parse.conversiontools.io/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@invoice.pdf" \
  -F 'schema={
    "vendor": "string",
    "date": "string",
    "total": "number",
    "line_items": [{
      "description": "string",
      "amount": "number"
    }]
  }'

{
  "status": "completed",
  "data": {
    "vendor": "Acme Corp",
    "date": "2026-03-01",
    "total": 1250.00,
    "line_items": [
      {
        "description": "Consulting services",
        "amount": 1000.00
      },
      {
        "description": "Expenses",
        "amount": 250.00
      }
    ]
  }
}

Three steps to go from raw document to structured data. No training, no templates: the API understands your documents automatically.
Send any PDF, image, or scanned document to the API endpoint. Supports invoices, receipts, forms, contracts, and more.
The API reads the document, understands its structure, and extracts the fields you requested in your schema definition.
Receive clean, typed JSON matching your schema. Ready to store in a database, feed into a pipeline, or display in your application.
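The three steps above can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the endpoint, the `file` and `schema` form fields, and the `status`/`data` response keys are taken from the curl example, while the helper names `build_extract_request` and `parse_result` are hypothetical. The request is built but not sent, so the sketch can be read and run without an API key.

```python
import json
import uuid
import urllib.request

# Endpoint copied from the curl example.
EXTRACT_URL = "https://api-parse.conversiontools.io/v1/extract"


def build_extract_request(file_bytes, filename, schema, api_key):
    """Build (but do not send) a multipart POST mirroring the curl example."""
    boundary = uuid.uuid4().hex
    parts = [
        # "file" part: the raw document bytes
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode() + file_bytes + b"\r\n",
        # "schema" part: the schema serialized as JSON text
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="schema"\r\n\r\n'
            f"{json.dumps(schema)}\r\n"
        ).encode(),
        # closing boundary
        f"--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        EXTRACT_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def parse_result(body):
    """Return the extracted fields from a response shaped like the sample."""
    payload = json.loads(body)
    if payload.get("status") != "completed":
        raise RuntimeError(f"extraction not completed: {payload.get('status')!r}")
    return payload["data"]
```

To actually send the request, pass the returned object to `urllib.request.urlopen` and feed the response body to `parse_result`.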
Built for developers who need reliable, accurate data extraction without the complexity of traditional OCR pipelines.
Uses large language models to understand document context and layout. Handles varied formats without custom rules or template configuration.
One endpoint, one API call. Send a file, get JSON back. No SDKs required — works with curl, Python, Node.js, or any HTTP client.
Define exactly which fields to extract with JSON schemas. Supports strings, numbers, dates, arrays, and nested objects. Reuse schemas across documents.
Process PDF, JPEG, PNG, WebP, and TIFF files, including scanned images. Works with both digital and scanned documents across all languages.
Files are processed and deleted automatically. No document data is stored after extraction. EU-hosted infrastructure with encrypted connections.
Most single-page documents are processed in seconds. Synchronous and asynchronous modes are available, depending on your use case and document size.
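Because the response is described as typed JSON matching your schema, a client may want to check that before storing the data. The sketch below is one possible client-side check, assuming the shorthand seen in the request example ("string", "number", a one-element list for arrays, a dict for nested objects). The function name `conforms` is hypothetical, and the API's own validation rules are not documented here.

```python
def conforms(value, schema):
    """Check a value against the shorthand schema style from the request example."""
    if schema == "string":
        return isinstance(value, str)
    if schema == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if isinstance(schema, list):
        if len(schema) != 1:
            raise ValueError("array schemas must contain exactly one item shape")
        # the single element describes the shape of every array item
        return isinstance(value, list) and all(conforms(v, schema[0]) for v in value)
    if isinstance(schema, dict):
        # every schema key must be present and match its own sub-schema
        return isinstance(value, dict) and all(
            key in value and conforms(value[key], sub) for key, sub in schema.items()
        )
    raise ValueError(f"unsupported schema fragment: {schema!r}")
```

Calling `conforms(response["data"], schema)` with the sample response and schema from the request example returns `True`. A total delivered as the string "1250" would return `False`.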
The same API works across all your document processing needs. Define a schema once, extract data from thousands of documents.
Extract vendor, amounts, line items, dates, and tax details from invoices automatically.
Parse store names, totals, items, and payment methods from receipts and POS printouts.
Extract fields from applications, surveys, tax forms, and government documents.
Convert any PDF into structured data. Works with digital and scanned PDFs across all languages.
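One way to define a schema once and reuse it per document type, as the use cases above suggest, is a small lookup table. In this sketch the invoice fields follow the request example, while the receipt field names and the helper `schema_field` are hypothetical choices made for illustration.

```python
import json

# Invoice fields follow the request example; the receipt fields are hypothetical.
SCHEMAS = {
    "invoice": {
        "vendor": "string",
        "date": "string",
        "total": "number",
        "line_items": [{"description": "string", "amount": "number"}],
    },
    "receipt": {
        "store_name": "string",
        "total": "number",
        "payment_method": "string",
        "items": [{"description": "string", "amount": "number"}],
    },
}


def schema_field(doc_type):
    """Return the JSON text to send as the `schema` form field for a document type."""
    try:
        return json.dumps(SCHEMAS[doc_type])
    except KeyError:
        raise ValueError(f"no schema registered for {doc_type!r}") from None
```

Keeping schemas in one place means every document of a given type is extracted with the same field names, which simplifies downstream storage.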
Common questions about the data extraction API.
Parse accepts PDF, JPEG, PNG, WebP, and TIFF files. You can extract structured data from scanned documents, digital PDFs, photos of receipts, and any image-based document.
You do not need to create templates or train a model. Parse uses AI to understand document structure automatically: you define a JSON schema describing the fields you want, and the API extracts them from any document, with no training data or configuration files required.
Parse uses large language models to understand document context, not just OCR text matching. This means it handles varied layouts, languages, and formatting. Accuracy depends on document quality, but most structured documents like invoices and receipts achieve high extraction rates.
A free plan is available. It includes 100 pages per month with full API access, custom schemas, and all supported file formats. No credit card is required to start.
For batch processing, use the asynchronous extraction endpoint. Submit documents and poll for results, or use webhooks to get notified when extraction is complete. The Pro plan supports 2,500 pages per month with priority processing.
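The submit-and-poll pattern mentioned for batch processing can be sketched as a small loop. The asynchronous endpoint's URL and status values are not given here, so this example takes the status lookup as a caller-supplied function. It assumes the response keeps the `status`/`data` shape of the synchronous sample, and the "failed" status is a hypothetical value. The name `wait_for_result` is also illustrative.

```python
import time


def wait_for_result(fetch_status, interval=2.0, timeout=120.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the job completes, fails, or times out.

    fetch_status is any callable returning the decoded JSON payload for a job.
    """
    deadline = clock() + timeout
    while True:
        payload = fetch_status()
        status = payload.get("status")
        if status == "completed":
            return payload["data"]
        if status == "failed":  # hypothetical status value
            raise RuntimeError(f"extraction failed: {payload!r}")
        if clock() >= deadline:
            raise TimeoutError("extraction did not complete in time")
        # wait before asking again to avoid hammering the API
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays. When webhooks are available, they avoid polling altogether.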
Get your API key and extract data from your first document in minutes. 100 pages per month free — no credit card required.