Extractors

Extractors use AI to pull structured data from unstructured documents—PDFs, images, scanned forms, and more. Define what data you need, and the AI finds and extracts it.

How Extraction Works

Input — A document (PDF, image, or scanned file)
Schema — You define what fields to extract
AI Processing — The model reads the document and identifies matching data
Output — Structured data ready for your workflow

The AI handles variations in document layouts, handwriting, and formatting. You don't need to define exact positions or parsing rules.

Defining a Schema

A schema describes what data to extract. Each field has:

Name — The field identifier (e.g., vendor_name)
Type — String, number, boolean, list, or object
Description — Helps the AI understand what to look for

Example: Invoice Extraction
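Here is a minimal sketch of what an invoice schema could look like, written as Python data. The field names, the `required` and `items` keys, and the `check_output` helper are hypothetical illustrations, not the schema builder's or JSON editor's actual format. The helper only shows how a schema's names and types line up with the structured output it produces.

```python
from typing import Any

# Hypothetical invoice schema: each field has a name, a type, and a description,
# mirroring the properties listed above. "required" and "items" are assumed
# extras for optional fields and list entries, not a documented format.
INVOICE_SCHEMA: list[dict[str, Any]] = [
    {"name": "vendor_name", "type": "string",
     "description": "The name of the company that issued the invoice"},
    {"name": "invoice_date", "type": "string",
     "description": "The date the invoice was issued"},
    {"name": "total_amount", "type": "number",
     "description": "The total amount due"},
    {"name": "po_number", "type": "string", "required": False,
     "description": "The purchase order number, if one appears"},
    {"name": "line_items", "type": "list",
     "description": "Each billed item, one entry per row of the item table",
     "items": [
         {"name": "description", "type": "string", "description": "What was billed"},
         {"name": "amount", "type": "number", "description": "The total for this line"},
     ]},
]

# Schema type names mapped to the Python types expected in the extracted output.
PY_TYPES = {"string": str, "number": (int, float), "boolean": bool,
            "list": list, "object": dict}


def check_output(data: dict[str, Any], schema: list[dict[str, Any]]) -> list[str]:
    """List mismatches between extracted data and the schema (empty list = all good)."""
    problems: list[str] = []
    for field in schema:
        name, kind = field["name"], field["type"]
        value = data.get(name)
        if value is None:
            if field.get("required", True):  # fields are required unless marked otherwise
                problems.append(f"missing: {name}")
            continue
        # bool is a subclass of int, so reject it explicitly for number fields.
        if not isinstance(value, PY_TYPES[kind]) or (kind == "number" and isinstance(value, bool)):
            problems.append(f"wrong type: {name}")
            continue
        if kind == "list" and "items" in field:
            for i, item in enumerate(value):
                if not isinstance(item, dict):
                    problems.append(f"wrong type: {name}[{i}]")
                    continue
                problems.extend(f"{name}[{i}]: {p}" for p in check_output(item, field["items"]))
    return problems


# Made-up extraction result in which the second line item has no amount.
extracted = {
    "vendor_name": "Acme Supplies",
    "invoice_date": "2024-03-01",
    "total_amount": 150.0,
    "line_items": [
        {"description": "Paper", "amount": 100.0},
        {"description": "Toner"},
    ],
}
print(check_output(extracted, INVOICE_SCHEMA))  # ['line_items[1]: missing: amount']
```

The optional `po_number` field is skipped when absent, while the repeating line items are modeled as a list so each row becomes its own entry.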

The description is key—it guides the AI. "The total amount due" is better than just "total".

Using the Extract Document Action

Add Extract Document to your workflow
Configure the Document input (from trigger, previous step, or URL)
Define your Schema using the schema builder or JSON
The output is a structured object matching your schema

Access extracted data in subsequent steps:

{{steps.extract.vendor_name}}
{{steps.extract.line_items[0].description}}
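To make the reference syntax concrete, here is a rough Python model of how a path like `steps.extract.line_items[0].description` walks through step outputs. Dots select object keys and `[n]` selects a list element, assuming zero-based indexing as the `[0]` example suggests. The `resolve` function and the `context` layout are assumptions for illustration; the platform's own template engine may treat missing keys or other syntax differently.

```python
import re
from typing import Any

# One path segment: a key name optionally followed by one or more [index] parts.
_SEGMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)((?:\[\d+\])*)$")


def resolve(template: str, context: dict[str, Any]) -> Any:
    """Resolve a {{dotted.path[0].key}} reference against nested dicts and lists."""
    path = template.strip()
    if path.startswith("{{") and path.endswith("}}"):
        path = path[2:-2].strip()
    value: Any = context
    for segment in path.split("."):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"bad path segment: {segment!r}")
        key, indexes = match.groups()
        value = value[key]  # dot: look up an object key
        for index in re.findall(r"\[(\d+)\]", indexes):
            value = value[int(index)]  # [n]: pick the n-th list element
    return value


# Hypothetical outputs of a step named "extract".
context = {
    "steps": {
        "extract": {
            "vendor_name": "Acme Supplies",
            "line_items": [{"description": "Paper", "amount": 100.0}],
        }
    }
}
print(resolve("{{steps.extract.vendor_name}}", context))                # Acme Supplies
print(resolve("{{steps.extract.line_items[0].description}}", context))  # Paper
```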

Document Types

Extractors handle many document types:

Type	Examples
PDFs	Invoices, contracts, forms, reports
Images	Scanned documents, photos of receipts, screenshots
Scanned documents	Paper forms that were digitized
Spreadsheets	CSV, Excel, Google Sheets

The AI adapts to different layouts automatically. An invoice from Vendor A can look completely different from an invoice from Vendor B, and extraction still works.

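Best Practices

Write Clear Descriptions

Describe each field in plain language so the AI knows exactly what to look for, not just what the field is called.

Handle Optional Fields

Not every document contains every field, so make sure later steps in your workflow can cope when an optional field has no value.

Use Arrays for Repeating Data

When data repeats, such as the line items on an invoice, use a list field so each occurrence becomes its own entry.

Verify with Interventions

Add an Intervention step wherever a reviewer should confirm extracted values before they reach other systems.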

Example: Processing Incoming Invoices

A typical invoice processing workflow:

Email Trigger — Invoice arrives as email attachment
Extract Document — Pull vendor, amount, line items, dates
Intervention — Reviewer verifies extracted data
HTTP Call — Send approved data to accounting system
Fill Document — Generate a payment authorization form
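The workflow above can be sketched as plain Python functions to show how the extracted data moves from step to step. Everything here is a hypothetical stand-in: `extract_document` returns fixed sample data instead of calling an AI model, the reviewer is a callback rather than a person, and the payload and form field names are made up.

```python
import json
from typing import Any, Callable, Optional


def extract_document(attachment: bytes) -> dict[str, Any]:
    """Stand-in for Extract Document: returns fixed sample data, no AI involved."""
    return {
        "vendor_name": "Acme Supplies",
        "invoice_date": "2024-03-01",
        "total_amount": 150.0,
        "line_items": [{"description": "Paper", "amount": 150.0}],
    }


def intervention(data: dict[str, Any],
                 reviewer: Callable[[dict[str, Any]], bool]) -> Optional[dict[str, Any]]:
    """Stand-in for Intervention: the reviewer approves (data passes) or rejects (None)."""
    return data if reviewer(data) else None


def http_call_body(data: dict[str, Any]) -> str:
    """JSON body an HTTP Call step might send; the accounting field names are hypothetical."""
    return json.dumps({"vendor": data["vendor_name"],
                       "amount": data["total_amount"],
                       "date": data["invoice_date"]})


def fill_document_values(data: dict[str, Any]) -> dict[str, str]:
    """Values a Fill Document step might place on a payment authorization form."""
    return {"Payee": data["vendor_name"], "Amount": f"{data['total_amount']:.2f}"}


def process_invoice(attachment: bytes,
                    reviewer: Callable[[dict[str, Any]], bool]) -> Optional[dict[str, Any]]:
    # attachment: what the Email Trigger would hand to the workflow.
    extracted = extract_document(attachment)      # Extract Document
    approved = intervention(extracted, reviewer)  # Intervention
    if approved is None:
        return None                               # rejected: later steps never run
    return {"http_body": http_call_body(approved),     # HTTP Call
            "form": fill_document_values(approved)}    # Fill Document


result = process_invoice(b"raw PDF bytes", reviewer=lambda d: d["total_amount"] > 0)
print(result["form"])  # {'Payee': 'Acme Supplies', 'Amount': '150.00'}
```

The point of the sketch is the ordering: nothing reaches the accounting system or the authorization form until the review step has passed the data along.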

The extraction step turns an unstructured PDF into clean, structured data that flows through your entire process.

Accuracy and Confidence

Extraction accuracy depends on:

Document quality — Clear scans extract better than blurry photos
Schema clarity — Descriptive field names and descriptions help
Document complexity — Simple forms are easier than dense contracts

For critical data, pair extraction with intervention review to catch and correct any errors before they propagate.
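One way to decide when review is needed is a simple rule that flags an extraction for an Intervention step. This is a hypothetical sketch: `CRITICAL_FIELDS`, `review_reasons`, and the rule that line items must add up to the total are illustrative choices, not built-in product behavior. Invoices with separate tax or shipping lines would need a different consistency rule.

```python
import math
from typing import Any

# Hypothetical: fields this workflow cannot afford to get wrong.
CRITICAL_FIELDS = ("vendor_name", "total_amount")


def review_reasons(data: dict[str, Any], tolerance: float = 0.01) -> list[str]:
    """Return reasons to route an extraction to human review (empty list = none found)."""
    reasons = [f"missing {name}" for name in CRITICAL_FIELDS if data.get(name) in (None, "")]
    total = data.get("total_amount")
    amounts = [item.get("amount") for item in data.get("line_items") or []]
    numeric = all(isinstance(a, (int, float)) for a in amounts)
    # Example consistency rule: line-item amounts should add up to the total.
    if isinstance(total, (int, float)) and amounts and numeric:
        subtotal = sum(amounts)
        if not math.isclose(subtotal, total, abs_tol=tolerance):
            reasons.append(f"line items sum to {subtotal:.2f}, total is {total:.2f}")
    return reasons


print(review_reasons({"vendor_name": "Acme Supplies", "total_amount": 150.0,
                      "line_items": [{"amount": 100.0}, {"amount": 50.0}]}))  # []
print(review_reasons({"vendor_name": "", "total_amount": 150.0,
                      "line_items": [{"amount": 100.0}]}))
# ['missing vendor_name', 'line items sum to 100.00, total is 150.00']
```

Extractions with no reasons could continue automatically, while anything flagged goes to a reviewer.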
