REST API · JSON output · 100 free pages/mo

Data Extraction API

Extract structured data from PDFs, invoices, receipts, and forms with a single API call. Send any document, get clean JSON back. No templates, no training data, no configuration files.


One API Call to Extract Data from Any Document

Send a document to the extraction endpoint and receive structured JSON. Define a schema to control exactly which fields you need, or let the API auto-detect the document structure.

request.sh

curl -X POST \
  https://api-parse.conversiontools.io/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@invoice.pdf" \
  -F 'schema={
    "vendor": "string",
    "date": "string",
    "total": "number",
    "line_items": [{
      "description": "string",
      "amount": "number"
    }]
  }'

response.json

{
  "status": "completed",
  "data": {
    "vendor": "Acme Corp",
    "date": "2026-03-01",
    "total": 1250.00,
    "line_items": [
      {
        "description": "Consulting services",
        "amount": 1000.00
      },
      {
        "description": "Expenses",
        "amount": 250.00
      }
    ]
  }
}
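The same request can be made from Python without an SDK. The sketch below uses only the standard library to build the multipart/form-data body that curl's `-F` options produce. Two things are assumptions rather than documented requirements: the helper name `build_extract_request` is hypothetical, and so is the generic `application/octet-stream` content type on the file part.

```python
import json
import uuid
import urllib.request

API_URL = "https://api-parse.conversiontools.io/v1/extract"


def build_extract_request(api_key, filename, file_bytes, schema):
    """Build, without sending, a multipart/form-data POST equivalent to request.sh."""
    boundary = uuid.uuid4().hex  # random delimiter between form parts
    # Part 1: the document as a file upload (curl: -F "file=@invoice.pdf").
    file_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8") + file_bytes + b"\r\n"
    # Part 2: the schema as a plain text field (curl: -F 'schema={...}').
    schema_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="schema"\r\n\r\n'
        f"{json.dumps(schema)}\r\n"
    ).encode("utf-8")
    # Closing delimiter ends the multipart body.
    body = file_part + schema_part + f"--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send the request, read the file in binary mode and pass its bytes in. Then hand the returned request to `urllib.request.urlopen`. Calling `json.load` on the response gives a dictionary shaped like response.json.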

How the Data Extraction API Works

Three steps to go from raw document to structured data. No training, no templates — the API understands your documents automatically.

Upload Your Document

Send any PDF, image, or scanned document to the API endpoint. Supports invoices, receipts, forms, contracts, and more.

AI Extracts the Data

The API reads the document, understands its structure, and extracts the fields you requested in your schema definition.

Get Structured JSON

Receive clean, typed JSON matching your schema. Ready to store in a database, feed into a pipeline, or display in your application.
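As an example of the last step, the sketch below loads a response shaped like response.json into SQLite using the standard-library `sqlite3` module. The table layout and helper names are hypothetical. Amounts are stored as `REAL` for brevity; a real ledger would more likely use integer cents or a decimal type. Only the `"completed"` status comes from the sample response; any other status is simply treated as not ready.

```python
import json
import sqlite3


def create_tables(conn):
    """Create a hypothetical two-table layout: invoices and their line items."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS invoices (
            id INTEGER PRIMARY KEY, vendor TEXT, date TEXT, total REAL);
        CREATE TABLE IF NOT EXISTS line_items (
            invoice_id INTEGER REFERENCES invoices(id),
            description TEXT, amount REAL);
        """
    )


def store_invoice(conn, response_text):
    """Insert one extraction result and return the new invoice's row id."""
    payload = json.loads(response_text)
    if payload.get("status") != "completed":
        raise ValueError(f"extraction not completed: {payload.get('status')!r}")
    data = payload["data"]
    cur = conn.execute(
        "INSERT INTO invoices (vendor, date, total) VALUES (?, ?, ?)",
        (data["vendor"], data["date"], data["total"]),
    )
    invoice_id = cur.lastrowid
    # One row per extracted line item, linked back to the invoice.
    conn.executemany(
        "INSERT INTO line_items (invoice_id, description, amount) VALUES (?, ?, ?)",
        [(invoice_id, it["description"], it["amount"]) for it in data["line_items"]],
    )
    conn.commit()
    return invoice_id
```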

Why Developers Choose Parse for Data Extraction

Built for developers who need reliable, accurate data extraction without the complexity of traditional OCR pipelines.

AI-Powered Extraction

Uses large language models to understand document context and layout. Handles varied formats without custom rules or template configuration.

Simple REST API

One endpoint, one API call. Send a file, get JSON back. No SDKs required — works with curl, Python, Node.js, or any HTTP client.

Custom Schemas

Define exactly which fields to extract with JSON schemas. Supports strings, numbers, dates, arrays, and nested objects. Reuse schemas across documents.
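Because a schema is plain JSON, it can live in code as a constant and be reused for every upload. The sketch below also checks an extracted `data` object against that same schema before the data is trusted downstream.

- It covers only the notation visible in request.sh: `"string"`, `"number"`, a one-element list for arrays, and nested objects.
- The API's own name for a date type is not shown, so dates are treated as strings, as in the example.
- `find_schema_errors` is a hypothetical helper.

```python
# Reusable schema in the notation used by request.sh.
INVOICE_SCHEMA = {
    "vendor": "string",
    "date": "string",
    "total": "number",
    "line_items": [{"description": "string", "amount": "number"}],
}


def find_schema_errors(value, spec, path="$"):
    """Return mismatch messages between an extracted value and a schema spec.

    Keys present in the value but absent from the schema are ignored.
    """
    errors = []
    if isinstance(spec, dict):  # nested object: check each declared field
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        for key, sub in spec.items():
            if key not in value:
                errors.append(f"{path}.{key}: missing")
            else:
                errors.extend(find_schema_errors(value[key], sub, f"{path}.{key}"))
    elif isinstance(spec, list):  # array: every element must match spec[0]
        if not isinstance(value, list):
            return [f"{path}: expected array"]
        for i, item in enumerate(value):
            errors.extend(find_schema_errors(item, spec[0], f"{path}[{i}]"))
    elif spec == "string":
        if not isinstance(value, str):
            errors.append(f"{path}: expected string")
    elif spec == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{path}: expected number")
    else:
        raise ValueError(f"{path}: unsupported type name {spec!r}")
    return errors
```

The upload form field is then just `json.dumps(INVOICE_SCHEMA)`, so the request and the check can never drift apart.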

Multiple Document Formats

Process PDFs, scanned images, JPEGs, PNGs, WebP, and TIFF files. Works with both digital and scanned documents across all languages.
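A cheap client-side check can catch unsupported files before an upload round-trip is spent on them. The sketch below identifies the five listed formats by their leading "magic" bytes. `detect_format` and `check_upload` are hypothetical helpers.

The check is strict: some real-world PDFs carry a few bytes before `%PDF-`. Treat a `None` result as "verify manually", not as proof that the API would refuse the file.

```python
# Leading "magic" bytes of the supported formats (WebP is handled separately).
_SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
]


def detect_format(header):
    """Return the format name for a supported file header, or None."""
    # WebP is a RIFF container: b"RIFF" + 4-byte size + b"WEBP".
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    for magic, name in _SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def check_upload(path):
    """Read the first 12 bytes of a file and raise if the format is unsupported."""
    with open(path, "rb") as f:
        header = f.read(12)
    fmt = detect_format(header)
    if fmt is None:
        raise ValueError(f"{path}: not a PDF, JPEG, PNG, WebP, or TIFF file")
    return fmt
```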

Privacy & Security

Files are processed and deleted automatically. No document data is stored after extraction. EU-hosted infrastructure with encrypted connections.

Fast Response Times

Most single-page documents are processed in seconds. Synchronous and asynchronous modes are available, depending on your use case and document size.

Data Extraction for Every Document Type

The same API works across all your document processing needs. Define a schema once, extract data from thousands of documents.
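One way to apply a single schema to many files is to serialize it once and fan the uploads out over a small thread pool.

- `extract_fn` stands in for whatever function performs the actual API call, for example the curl request above translated to your HTTP client. Both it and `extract_batch` are hypothetical.
- The worker count of 4 is a placeholder, since no concurrency limits are stated here.
- Failures are collected per file so that one bad document does not stop the batch.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def extract_batch(paths, schema, extract_fn, max_workers=4):
    """Run one extraction per file, reusing a single serialized schema.

    extract_fn(path, schema_json) is caller-supplied (hypothetical): it performs
    the API call and returns the parsed "data" object for one file.
    Returns (results, failures), both keyed by path.
    """
    schema_json = json.dumps(schema)  # serialize once, reuse for every file
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(extract_fn, p, schema_json): p for p in paths}
        for fut, path in futures.items():
            try:
                results[path] = fut.result()  # blocks until this file is done
            except Exception as exc:  # record the failure and keep going
                failures[path] = exc
    return results, failures
```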

Invoice Extraction

Extract vendor, amounts, line items, dates, and tax details from invoices automatically.

Receipt Parsing

Parse store names, totals, items, and payment methods from receipts and POS printouts.

Form Processing

Extract fields from applications, surveys, tax forms, and government documents.

PDF Parsing API

Convert any PDF into structured data. Works with digital and scanned PDFs across all languages.
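Each document type above can get its own reusable schema. The field names below are hypothetical, chosen to mirror the descriptions:

- **Invoices:** vendor, line items, dates, and tax details.
- **Receipts:** store name, totals, items, and payment method.
- **Forms:** only a stub, because form fields vary from one document to the next.

```python
import json

# Hypothetical per-type schemas in the notation used by request.sh.
SCHEMAS = {
    "invoice": {
        "vendor": "string",
        "date": "string",
        "total": "number",
        "tax": "number",
        "line_items": [{"description": "string", "amount": "number"}],
    },
    "receipt": {
        "store_name": "string",
        "total": "number",
        "payment_method": "string",
        "items": [{"name": "string", "price": "number"}],
    },
    "form": {  # stub: replace with the fields of your specific form
        "applicant_name": "string",
        "date": "string",
    },
}


def schema_field(doc_type):
    """Return the JSON string to send as the "schema" form field."""
    try:
        return json.dumps(SCHEMAS[doc_type])
    except KeyError:
        raise ValueError(f"no schema defined for {doc_type!r}") from None
```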

Frequently Asked Questions

Common questions about the data extraction API.

What file formats does the data extraction API support?

Parse accepts PDF, JPEG, PNG, WebP, and TIFF files. You can extract structured data from scanned documents, digital PDFs, photos of receipts, and any image-based document.

Do I need to create templates for each document type?

No. Parse uses AI to understand document structure automatically. You define a JSON schema describing the fields you want, and the API extracts them from any document — no templates, training data, or configuration files required.

How accurate is the data extraction?

Parse uses large language models to understand document context, not just OCR text matching. This means it handles varied layouts, languages, and formatting. Accuracy depends on document quality, but structured documents such as invoices and receipts generally extract with high accuracy.
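Because accuracy depends on document quality, it can pay to cross-check extracted numbers before accepting them. The sketch below is a hypothetical helper, not part of the API. It verifies that line-item amounts sum to the extracted total, using `Decimal` so that float rounding does not raise false alarms. For the sample response, 1000.00 + 250.00 = 1250.00, so the check passes.

```python
from decimal import Decimal


def totals_reconcile(data, tolerance="0.01"):
    """Return True if line-item amounts add up to the extracted total.

    Values go through str() before Decimal() so binary float noise
    (e.g. 0.1 + 0.2) does not cause a false mismatch.
    """
    items_sum = sum(Decimal(str(item["amount"])) for item in data["line_items"])
    total = Decimal(str(data["total"]))
    return abs(items_sum - total) <= Decimal(tolerance)
```

Invoices that list tax or shipping separately from their line items would need those fields added to the sum. A `False` result is a signal to route the document to human review, not proof of an extraction error.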

Is there a free tier for the data extraction API?

Yes. The free plan includes 100 pages per month with full API access, custom schemas, and all supported file formats. No credit card required to start.

How do I handle large volumes of documents?

For batch processing, use the asynchronous extraction endpoint. Submit documents and poll for results, or use webhooks to get notified when extraction is complete. The Pro plan supports 5,000 pages per month with priority processing.
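A polling loop for the asynchronous mode might look like the sketch below. The asynchronous endpoint's URL and job-ID format are not given on this page, so `get_status` is a caller-supplied, hypothetical function that fetches a job and returns its parsed JSON.

- `"completed"` matches the sample response above.
- `"failed"` is an assumed error status.
- The delay doubles after each poll, up to a cap.
- The clock and sleep functions are injectable, so the loop can be tested without waiting.

```python
import time


def wait_for_result(get_status, job_id, timeout=300.0, initial_delay=1.0,
                    max_delay=30.0, sleep=time.sleep, clock=time.monotonic):
    """Poll an asynchronous extraction job until it reaches a terminal status.

    get_status(job_id) is caller-supplied (hypothetical) and returns the job's
    parsed JSON body. Returns body["data"] once the status is "completed".
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        body = get_status(job_id)
        status = body.get("status")
        if status == "completed":
            return body["data"]
        if status == "failed":  # assumed error status
            raise RuntimeError(f"job {job_id} failed: {body}")
        # Give up rather than sleep past the deadline.
        if clock() + delay > deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

Where webhooks are available, they remove the need for this loop entirely.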

Start Extracting Data Today

Get your API key and extract data from your first document in minutes. 100 pages per month free — no credit card required.
