REST API · W-2 & 1099 to JSON · 100 free pages/mo

Tax Form Extraction API

Pull box-level data from W-2s, 1099s, and other tax forms into typed JSON. Employer and recipient details, every numbered box, the form type, and the tax year come back structured and ready for tax-prep or payroll software. No manual keying.

Get an API Key Free · Read the Docs

Box Numbers In, Named Fields Out

Declare the boxes you need and each comes back as a typed value under a name you chose: numbers for amounts, strings for names and identifiers. No OCR cleanup, no transcription.

request.sh

curl -X POST \
  https://api-parse.conversiontools.io/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@w2.pdf" \
  -F 'schema={
    "form_type": "string",
    "tax_year": "number",
    "employer": {
      "name": "string",
      "ein": "string",
      "address": "string"
    },
    "employee": {
      "name": "string",
      "ssn_last4": "string"
    },
    "wages": "number",
    "federal_income_tax_withheld": "number",
    "social_security_wages": "number",
    "social_security_tax_withheld": "number",
    "medicare_wages": "number",
    "medicare_tax_withheld": "number",
    "state": "string",
    "state_wages": "number",
    "state_income_tax": "number"
  }'

response.json

{
  "status": "completed",
  "pages": 1,
  "data": {
    "form_type": "W-2",
    "tax_year": 2025,
    "employer": {
      "name": "ACME Corp",
      "ein": "12-3456789",
      "address": "100 Main St, Austin, TX"
    },
    "employee": {
      "name": "Jane Doe",
      "ssn_last4": "6789"
    },
    "wages": 84500.00,
    "federal_income_tax_withheld": 11230.00,
    "social_security_wages": 84500.00,
    "social_security_tax_withheld": 5239.00,
    "medicare_wages": 84500.00,
    "medicare_tax_withheld": 1225.25,
    "state": "TX",
    "state_wages": 84500.00,
    "state_income_tax": 0.00
  }
}
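Before loading a parsed W-2 into a ledger, a client may want to sanity-check it. The sketch below is one way to do that and is not part of the API: it reuses the sample response above (trimmed), reads amounts as Decimal, and compares the withheld amounts against the statutory employee rates of 6.2% for Social Security and 1.45% for Medicare. The function name check_w2 and the one-cent tolerance are illustrative assumptions.

```python
import json
from decimal import Decimal

# Sample response body from above, trimmed to the fields used here.
RESPONSE = """
{
  "status": "completed",
  "pages": 1,
  "data": {
    "form_type": "W-2",
    "tax_year": 2025,
    "social_security_wages": 84500.00,
    "social_security_tax_withheld": 5239.00,
    "medicare_wages": 84500.00,
    "medicare_tax_withheld": 1225.25
  }
}
"""

SS_RATE = Decimal("0.062")        # employee share of Social Security tax
MEDICARE_RATE = Decimal("0.0145")  # employee share of Medicare tax
TOLERANCE = Decimal("0.01")       # assumed: allow one cent of rounding drift

CHECKS = (
    ("social_security_wages", "social_security_tax_withheld", SS_RATE),
    ("medicare_wages", "medicare_tax_withheld", MEDICARE_RATE),
)


def check_w2(body: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    # parse_float=Decimal avoids binary floating-point error on money amounts.
    doc = json.loads(body, parse_float=Decimal)
    if doc.get("status") != "completed":
        return [f"unexpected status: {doc.get('status')!r}"]

    data = doc["data"]
    problems = []
    for wages_key, tax_key, rate in CHECKS:
        expected = (Decimal(data[wages_key]) * rate).quantize(Decimal("0.01"))
        actual = Decimal(data[tax_key])
        if abs(expected - actual) > TOLERANCE:
            problems.append(f"{tax_key}: expected about {expected}, got {actual}")
    return problems


print(check_w2(RESPONSE))
```

A mismatch is a reason to review the form, not proof of an extraction error: Medicare withholding, for example, includes an additional 0.9% on wages above $200,000.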

Built for Tax & Payroll Software

Tax forms are dense grids of numbered boxes. The schema turns them into named, typed fields.

Box-level precision

Each numbered box maps to a named field typed as a number, so amounts can be summed and compared directly, without an OCR cleanup pass.

W-2, 1099, and beyond

One schema-driven endpoint covers every form type. Declare the fields you need per form: W-2, 1099-NEC, 1099-MISC, 1098, and more.
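Because the endpoint is the same for every form, a client can keep one schema per form type and pick the right one at upload time. The sketch below assumes that layout. The 1099-NEC field names (payer, recipient, tin, tin_last4, nonemployee_compensation) are hypothetical names chosen for illustration, and schema_field is a hypothetical helper that produces the JSON string sent as the schema form field in the curl example.

```python
import json

# One schema per form type. Field names are the caller's choice; the
# 1099-NEC names below are illustrative, not defined by the API.
SCHEMAS = {
    "W-2": {
        "form_type": "string",
        "tax_year": "number",
        "wages": "number",
        "federal_income_tax_withheld": "number",
    },
    "1099-NEC": {
        "form_type": "string",
        "tax_year": "number",
        "payer": {"name": "string", "tin": "string"},
        "recipient": {"name": "string", "tin_last4": "string"},
        "nonemployee_compensation": "number",     # Box 1 on Form 1099-NEC
        "federal_income_tax_withheld": "number",  # Box 4 on Form 1099-NEC
    },
}


def schema_field(form_type: str) -> str:
    """Serialize the schema for one form type as a JSON string."""
    try:
        return json.dumps(SCHEMAS[form_type])
    except KeyError:
        raise ValueError(f"no schema defined for {form_type!r}") from None


print(schema_field("1099-NEC"))
```

Keeping the schemas in one table means adding a new form type is a data change rather than a code change.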

Masked PII by design

Capture only what you need, such as the last four digits of an SSN. Every uploaded file is deleted automatically within 24 hours.
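A client that asks only for masked identifiers may still want to confirm that nothing longer slipped through before storing the result. The sketch below is a client-side check under stated assumptions, not an API feature: find_full_ssns is a hypothetical helper that walks the extracted data and reports any string containing a dashed nine-digit SSN pattern. It does not catch undashed numbers.

```python
import re

# Dashed SSN layout: 3 digits, 2 digits, 4 digits. An EIN such as
# "12-3456789" has a different layout and does not match.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_full_ssns(value, path=""):
    """Return the dotted paths of string values that contain a full SSN."""
    hits = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            hits.extend(find_full_ssns(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            hits.extend(find_full_ssns(child, f"{path}[{index}]"))
    elif isinstance(value, str) and SSN_PATTERN.search(value):
        hits.append(path)
    return hits


data = {
    "employer": {"name": "ACME Corp", "ein": "12-3456789"},
    "employee": {"name": "Jane Doe", "ssn_last4": "6789"},
}
print(find_full_ssns(data))
```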

Frequently Asked Questions

Which tax forms does the API support?

W-2, 1099-NEC, 1099-MISC, 1099-INT, 1099-DIV, 1098, 1040, and others. The API is schema-driven, so you describe the fields for whatever form you send.

How do form box numbers map to my output?

You name the fields in your schema (for example federal_income_tax_withheld), and the model maps the right box to each name. You never deal with raw box positions.
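If reviewers or audit logs still need to see box numbers, the mapping can live on the client side. The sketch below assumes the field names from the W-2 request above and pairs each with its box number on Form W-2. W2_BOX_BY_FIELD and label_boxes are illustrative names, not part of the API.

```python
# Box numbers on Form W-2 for the field names used in the sample schema.
W2_BOX_BY_FIELD = {
    "wages": "1",
    "federal_income_tax_withheld": "2",
    "social_security_wages": "3",
    "social_security_tax_withheld": "4",
    "medicare_wages": "5",
    "medicare_tax_withheld": "6",
    "state": "15",
    "state_wages": "16",
    "state_income_tax": "17",
}


def label_boxes(data: dict) -> dict:
    """Re-key extracted values by box label, skipping fields with no box."""
    return {
        f"Box {W2_BOX_BY_FIELD[name]}": value
        for name, value in data.items()
        if name in W2_BOX_BY_FIELD
    }


sample = {"form_type": "W-2", "wages": 84500.00, "state": "TX"}
print(label_boxes(sample))
```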

Does it read scanned or photographed forms?

Yes. OCR is applied automatically. Printed forms read most accurately; clear handwriting is generally recognized, though heavily stylized handwriting may be read less accurately.

Can it handle seasonal batches of forms?

Yes. Use the asynchronous endpoint to submit large batches during tax season and poll or use webhooks for results.
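For the polling option, a client typically retries with a growing delay rather than in a tight loop. The sketch below shows that pattern only: the asynchronous endpoint's URL and response shape are not described here, so the caller passes in a fetch function that returns the job's JSON as a dict. The sole assumption about the payload is that a finished job reports "status": "completed", as in the sample response; wait_for_result and its parameters are hypothetical.

```python
import time
from typing import Callable


def wait_for_result(
    fetch: Callable[[], dict],
    *,
    max_attempts: int = 8,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the job is completed, doubling the delay each time."""
    delay = base_delay
    for attempt in range(max_attempts):
        result = fetch()
        if result.get("status") == "completed":
            return result
        # Any other status is treated as "still running" in this sketch.
        if attempt < max_attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise TimeoutError(f"job not completed after {max_attempts} attempts")


# Usage with a stand-in fetch function; a real one would call the API.
responses = iter([{"status": "processing"}, {"status": "completed", "data": {}}])
print(wait_for_result(lambda: next(responses), sleep=lambda seconds: None))
```

Passing sleep as a parameter keeps the function testable without real waiting. For large batches, webhooks avoid polling altogether.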

How is sensitive tax data protected?

Uploaded files are deleted automatically within 24 hours. Extracted data is encrypted in transit and at rest and is never used to train models.

Stop Keying Tax Forms by Hand

Free tier covers your first 100 pages a month. No credit card to start.

Get Started Free · API Reference

More document extraction use cases

Data Extraction API

PDF Parsing API

Invoice extraction

Receipt parsing

Purchase order extraction

Bank statement to JSON

Contract data extraction

Bill of lading extraction
