Pull box-level data from W-2s, 1099s, and other tax forms into typed JSON. Employer and recipient details, every numbered box, the form type and tax year - structured and ready for tax-prep or payroll software. No manual keying.
Declare the boxes you need and each comes back as a typed number under a name you chose - no OCR cleanup, no transcription.
curl -X POST \
  https://api-parse.conversiontools.io/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@w2.pdf" \
  -F 'schema={
    "form_type": "string",
    "tax_year": "number",
    "employer": {
      "name": "string",
      "ein": "string",
      "address": "string"
    },
    "employee": {
      "name": "string",
      "ssn_last4": "string"
    },
    "wages": "number",
    "federal_income_tax_withheld": "number",
    "social_security_wages": "number",
    "social_security_tax_withheld": "number",
    "medicare_wages": "number",
    "medicare_tax_withheld": "number",
    "state": "string",
    "state_wages": "number",
    "state_income_tax": "number"
  }'

{
  "status": "completed",
  "pages": 1,
  "data": {
    "form_type": "W-2",
    "tax_year": 2025,
    "employer": {
      "name": "ACME Corp",
      "ein": "12-3456789",
      "address": "100 Main St, Austin, TX"
    },
    "employee": {
      "name": "Jane Doe",
      "ssn_last4": "6789"
    },
    "wages": 84500.00,
    "federal_income_tax_withheld": 11230.00,
    "social_security_wages": 84500.00,
    "social_security_tax_withheld": 5239.00,
    "medicare_wages": 84500.00,
    "medicare_tax_withheld": 1225.25,
    "state": "TX",
    "state_wages": 84500.00,
    "state_income_tax": 0.00
  }
}

Tax forms are dense grids of numbered boxes. The schema turns them into named, typed fields.
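Because the boxes come back as numbers, downstream code can cross-check them directly. The sketch below is one possible sanity check, not part of the API: it parses a trimmed copy of the sample response above with exact decimals and compares the withheld amounts against the standard employee-share rates of 6.2% (Social Security) and 1.45% (Medicare). It assumes wages are below the Social Security wage base and the Additional Medicare Tax threshold; the helper and constant names are illustrative.

```python
import json
from decimal import Decimal

# Trimmed copy of the sample response shown above.
SAMPLE_RESPONSE = """
{
  "status": "completed",
  "data": {
    "social_security_wages": 84500.00,
    "social_security_tax_withheld": 5239.00,
    "medicare_wages": 84500.00,
    "medicare_tax_withheld": 1225.25
  }
}
"""

# Employee-share rates. Assumption: wages fall below the Social Security
# wage base and the Additional Medicare Tax threshold.
SOCIAL_SECURITY_RATE = Decimal("0.062")
MEDICARE_RATE = Decimal("0.0145")


def parse_response(raw):
    """Parse the JSON body, reading decimal amounts as Decimal instead of float."""
    return json.loads(raw, parse_float=Decimal)


def withholding_matches(wages, withheld, rate, tolerance=Decimal("0.01")):
    """Return True if withheld is within `tolerance` of wages * rate."""
    return abs(Decimal(wages) * rate - Decimal(withheld)) <= tolerance


data = parse_response(SAMPLE_RESPONSE)["data"]
ss_ok = withholding_matches(
    data["social_security_wages"],
    data["social_security_tax_withheld"],
    SOCIAL_SECURITY_RATE,
)
medicare_ok = withholding_matches(
    data["medicare_wages"],
    data["medicare_tax_withheld"],
    MEDICARE_RATE,
)
print(ss_ok, medicare_ok)
```

A mismatch does not necessarily mean the extraction is wrong (high earners and mid-year corrections break the simple ratio), so a check like this is better suited to flagging forms for review than to rejecting them.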
Each numbered box maps to a named field typed as a number, so amounts can be summed and compared directly, without an OCR cleanup pass.
One schema-driven endpoint across form types. Declare the fields you need per form - W-2, 1099-NEC, 1099-MISC, 1098, and more.
Capture only what you need, such as the last four digits of an SSN. Every uploaded file is deleted automatically within 24 hours.
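The features above describe declaring a separate set of fields per form type. As a rough illustration, here is what a schema for a 1099-NEC might look like, built in Python and serialized for the `schema` form field used in the curl example. The field names are hypothetical (the API lets you choose your own), and the validator only knows the two type names that appear in the W-2 sample, "string" and "number"; the API may accept others.

```python
import json

# Hypothetical field names for a 1099-NEC; pick whatever names suit your code.
FORM_1099_NEC_SCHEMA = {
    "form_type": "string",
    "tax_year": "number",
    "payer": {"name": "string", "tin": "string"},
    "recipient": {"name": "string", "tin_last4": "string"},
    "nonemployee_compensation": "number",     # Box 1
    "federal_income_tax_withheld": "number",  # Box 4
}

# Only the type names seen in the W-2 sample; an assumption, not an API limit.
ALLOWED_TYPES = {"string", "number"}


def validate_schema(schema, path=""):
    """Return dotted paths of fields whose type is not in ALLOWED_TYPES."""
    problems = []
    for key, value in schema.items():
        here = f"{path}.{key}" if path else key
        if isinstance(value, dict):
            # Nested objects (payer, recipient) are checked recursively.
            problems.extend(validate_schema(value, here))
        elif not isinstance(value, str) or value not in ALLOWED_TYPES:
            problems.append(here)
    return problems


def schema_form_field(schema):
    """Serialize the schema to the JSON string sent as the `schema` form field."""
    return json.dumps(schema)


print(validate_schema(FORM_1099_NEC_SCHEMA))
print(schema_form_field(FORM_1099_NEC_SCHEMA))
```

Keeping one schema dictionary per form type makes it easy to check a schema locally for typos in type names before any file is uploaded.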
W-2, 1099-NEC, 1099-MISC, 1099-INT, 1099-DIV, 1098, 1040, and others. The API is schema-driven, so you describe the fields for whatever form you send.
You name the fields in your schema (for example federal_income_tax_withheld), and the model maps the right box to each name. You never deal with raw box positions.
Yes. OCR is applied automatically. Printed forms are read most accurately; clear handwriting is generally recognized, though heavily stylized handwriting may be read less accurately.
Yes. Use the asynchronous endpoint to submit large batches during tax season and poll or use webhooks for results.
Uploaded files are deleted automatically within 24 hours. Extracted data is encrypted in transit and at rest and is never used to train models.
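The batch answer above mentions polling the asynchronous endpoint for results. This page does not give that endpoint's URL or response shape, so the sketch below leaves the HTTP call to a caller-supplied function and shows only the polling loop. It assumes the job status eventually becomes "completed", as in the sample response; `fetch_status`, the "processing" status used in the usage lines, and the parameter names are all hypothetical.

```python
import time


def poll_until_done(fetch_status, interval_seconds=2.0, max_attempts=30,
                    sleep=time.sleep):
    """Call fetch_status() until it reports status "completed".

    fetch_status is a hypothetical caller-supplied function that returns a
    dict shaped like the sample response: {"status": ..., "data": ...}.
    Returns the "data" payload, or raises TimeoutError after max_attempts.
    """
    for attempt in range(max_attempts):
        result = fetch_status()
        if result.get("status") == "completed":
            return result["data"]
        # Wait before the next attempt, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"job not completed after {max_attempts} attempts")


# Usage with a stand-in for the real status call ("processing" is made up).
fake_responses = iter([
    {"status": "processing"},
    {"status": "completed", "data": {"wages": 84500.0}},
])
print(poll_until_done(lambda: next(fake_responses), interval_seconds=0.0))
```

Any failure status the API reports would need its own branch here. Where webhooks are available, they avoid the polling loop entirely.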
The same schema-driven API works across every document type. Define a schema once, extract from thousands of files.
Extract structured JSON from any document with custom schemas.
Parse any PDF into structured JSON, scanned or digital.
Vendor, line items, totals, tax, and dates from invoices.
Store, items, totals, and payment method from receipts.
PO number, vendor, buyer, and SKU-level line items.
Transactions, running balances, and dates for reconciliation.
Parties, dates, governing law, and key clauses.
Carrier, parties, ports, containers, and cargo.
Free tier covers your first 100 pages a month. No credit card to start.