Extractors

Extract structured data from documents using AI

Extractors use AI to pull structured data from unstructured documents such as PDFs, images, and scanned forms. You define the data you need, and the AI finds and extracts it.

How Extraction Works

  1. Input — A document (PDF, image, or scanned file)
  2. Schema — You define what fields to extract
  3. AI Processing — The model reads the document and identifies matching data
  4. Output — Structured data ready for your workflow

The AI handles variations in document layouts, handwriting, and formatting. You don't need to define exact positions or parsing rules.

Defining a Schema

A schema describes what data to extract. Each field has:

  • Name — The field identifier (e.g., vendor_name)
  • Type — String, number, boolean, list, or object
  • Description — Helps the AI understand what to look for

Example: Invoice Extraction

[Figure: Extractor schema builder]

The description is the most important part of a field because it guides the AI. A specific description such as "The total amount due" works better than just "total".
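
As a rough sketch of what such a schema could look like in JSON form, the Python snippet below builds an invoice schema from name, type, and description entries. The key names (`name`, `type`, `description`, `items`), the lowercase type strings, and the `invoice_date`, `total_amount`, and `amount` fields are assumptions for illustration; the exact JSON format the schema builder accepts is not shown on this page.

```python
import json

# Field types listed on this page; the lowercase spelling is an assumption.
ALLOWED_TYPES = {"string", "number", "boolean", "list", "object"}


def field(name, type_, description, **extra):
    """Build one schema field, rejecting types this page does not list."""
    if type_ not in ALLOWED_TYPES:
        raise ValueError(f"unsupported field type: {type_!r}")
    return {"name": name, "type": type_, "description": description, **extra}


# Hypothetical invoice schema. vendor_name and line_items[].description appear
# on this page; the remaining field names are made up for the example.
invoice_schema = [
    field("vendor_name", "string", "Legal name of the company that issued the invoice"),
    field("invoice_date", "string", "The date the invoice was issued"),
    field("total_amount", "number", "The total amount due"),
    field(
        "line_items",
        "list",
        "Each billed product or service, one entry per row",
        items=[
            field("description", "string", "What was billed on this line"),
            field("amount", "number", "The amount charged for this line"),
        ],
    ),
]

schema_json = json.dumps(invoice_schema, indent=2)
print(schema_json)
```

Keeping every description a full phrase rather than a single word follows the guidance above about giving the AI enough context to pick the right value.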

Using the Extract Document Action

  1. Add Extract Document to your workflow
  2. Configure the Document input (from trigger, previous step, or URL)
  3. Define your Schema using the schema builder or JSON
  4. The output is a structured object matching your schema

Access extracted data in subsequent steps:

{{steps.extract.vendor_name}}
{{steps.extract.line_items[0].description}}
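
To make the path syntax concrete, here is a small Python sketch that resolves a dotted path with list indexes against a nested result. The `resolve` helper and the sample values are hypothetical; the sketch only mirrors how a path maps onto the extracted object, not how the workflow engine evaluates templates.

```python
import re

# Sample output shaped like the paths above; the values are invented.
extract_output = {
    "vendor_name": "Acme Supplies",
    "line_items": [
        {"description": "Printer paper", "amount": 42.5},
        {"description": "Toner", "amount": 80.0},
    ],
}

# Matches either a key name or a bracketed list index such as [0].
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve(data, path):
    """Walk a path like 'line_items[0].description' through dicts and lists."""
    current = data
    for key, index in _TOKEN.findall(path):
        current = current[int(index)] if index else current[key]
    return current


print(resolve(extract_output, "vendor_name"))
print(resolve(extract_output, "line_items[0].description"))
```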

Document Types

Extractors handle many document types:

Type                 Examples
PDFs                 Invoices, contracts, forms, reports
Images               Scanned documents, photos of receipts, screenshots
Scanned documents    Paper forms that were digitized
Spreadsheets         CSV, Excel, Google Sheets

The AI adapts to different layouts automatically. An invoice from Vendor A can look completely different from one from Vendor B, and the same schema still extracts both.

Best Practices

Example: Processing Incoming Invoices

A typical invoice processing workflow:

  1. Email Trigger — Invoice arrives as email attachment
  2. Extract Document — Pull vendor, amount, line items, dates
  3. Intervention — Reviewer verifies extracted data
  4. HTTP Call — Send approved data to accounting system
  5. Fill Document — Generate a payment authorization form

The extraction step turns an unstructured PDF into clean, structured data that flows through your entire process.

Accuracy and Confidence

Extraction accuracy depends on:

  • Document quality — Clear scans extract better than blurry photos
  • Schema clarity — Descriptive field names and descriptions help
  • Document complexity — Simple forms are easier than dense contracts

For critical data, pair extraction with intervention review to catch and correct any errors before they propagate.
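
One way to decide which extractions deserve a closer look is a consistency check that runs before the review step. The sketch below is an assumption, not a built-in feature: the `needs_review` helper, the `total_amount` and `amount` field names, and the 0.01 tolerance are all hypothetical.

```python
def needs_review(extracted, tolerance=0.01):
    """Return reasons a reviewer should double-check this extraction.

    An empty list means the basic checks passed; it does not prove the
    extraction is correct.
    """
    reasons = []

    # Required fields that came back empty or missing.
    for name in ("vendor_name", "total_amount"):
        if extracted.get(name) in (None, ""):
            reasons.append(f"missing {name}")

    # Line items should add up to the stated total, within a small tolerance.
    items = extracted.get("line_items") or []
    total = extracted.get("total_amount")
    if items and isinstance(total, (int, float)):
        line_sum = sum(item.get("amount") or 0 for item in items)
        if abs(line_sum - total) > tolerance:
            reasons.append(f"line items sum to {line_sum:.2f}, total is {total:.2f}")

    return reasons


sample = {
    "vendor_name": "Acme Supplies",
    "total_amount": 130.0,
    "line_items": [{"amount": 42.5}, {"amount": 80.0}],
}
print(needs_review(sample))
```

A check like this narrows the reviewer's attention to records with missing values or totals that do not add up; it complements human review rather than replacing it.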
