Extractors
Extract structured data from documents using AI
Extractors use AI to pull structured data from unstructured documents—PDFs, images, scanned forms, and more. Define what data you need, and the AI finds and extracts it.
How Extraction Works
- Input — A document (PDF, image, scanned file, or spreadsheet)
- Schema — You define what fields to extract
- AI Processing — The model reads the document and identifies matching data
- Output — Structured data ready for your workflow
The AI handles variations in document layouts, handwriting, and formatting. You don't need to define exact positions or parsing rules.
Defining a Schema
A schema describes what data to extract. Each field has:
- Name — The field identifier (e.g., vendor_name)
- Type — String, number, boolean, list, or object
- Description — Helps the AI understand what to look for
Example: Invoice Extraction
The description is key—it guides the AI. "The total amount due" is better than just "total".
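A minimal sketch of what an invoice schema could look like, assuming the JSON form mirrors the name, type, and description of each field as described above. The key names (`name`, `type`, `description`, `fields`) and every field other than `vendor_name` and `line_items` are illustrative assumptions, not a documented format.

```python
import json

# Hypothetical invoice schema. The keys "name", "type", "description", and
# "fields" are assumptions for illustration, not a documented format.
invoice_schema = [
    {"name": "vendor_name", "type": "string",
     "description": "Legal name of the company that issued the invoice"},
    {"name": "invoice_date", "type": "string",
     "description": "Date the invoice was issued, as printed on the document"},
    {"name": "total_amount", "type": "number",
     "description": "The total amount due, including tax"},
    {"name": "line_items", "type": "list",
     "description": "One entry per billed product or service",
     "fields": [
         {"name": "description", "type": "string",
          "description": "What was billed on this line"},
         {"name": "amount", "type": "number",
          "description": "Amount charged for this line"},
     ]},
]

# Serialize to JSON, the form you would paste into a JSON schema editor.
schema_json = json.dumps(invoice_schema, indent=2)
print(schema_json)
```

Each description says what the value means on the page rather than restating the field name, which is the kind of guidance the AI can act on.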
Using the Extract Document Action
- Add Extract Document to your workflow
- Configure the Document input (from trigger, previous step, or URL)
- Define your Schema using the schema builder or JSON
- The output is a structured object matching your schema
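For illustration, the output object for an invoice schema with `vendor_name` and `line_items` fields might have the shape below. The values and the `total_amount` field are made up, and the `resolve` helper is a hypothetical stand-in that mimics how a path such as `line_items[0].description` walks the object. It is not the product's template engine.

```python
import re

# Made-up extraction result; values are illustrative only.
extracted = {
    "vendor_name": "Acme Supplies",
    "total_amount": 1250.0,
    "line_items": [
        {"description": "Office chairs", "amount": 1000.0},
        {"description": "Delivery", "amount": 250.0},
    ],
}

def resolve(data, path):
    """Walk a dotted path with optional [index] parts, e.g. 'line_items[0].description'."""
    current = data
    for part in path.split("."):
        match = re.fullmatch(r"(\w+)((?:\[\d+\])*)", part)
        if match is None:
            raise ValueError(f"Unsupported path segment: {part!r}")
        # Look up the key, then apply any list indexes that follow it.
        current = current[match.group(1)]
        for index in re.findall(r"\[(\d+)\]", match.group(2)):
            current = current[int(index)]
    return current

print(resolve(extracted, "vendor_name"))
print(resolve(extracted, "line_items[0].description"))
```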
Access extracted data in subsequent steps:
{{steps.extract.vendor_name}}
{{steps.extract.line_items[0].description}}
Document Types
Extractors handle many document types:
| Type | Examples |
|---|---|
| PDFs | Invoices, contracts, forms, reports |
| Images | Scanned documents, photos of receipts, screenshots |
| Scanned documents | Paper forms that were digitized |
| Spreadsheets | CSV, Excel, Google Sheets |
The AI adapts to different layouts automatically. An invoice from Vendor A can look completely different from one from Vendor B, and extraction still works.
Best Practices
Example: Processing Incoming Invoices
A typical invoice processing workflow:
- Email Trigger — Invoice arrives as email attachment
- Extract Document — Pull vendor, amount, line items, dates
- Intervention — Reviewer verifies extracted data
- HTTP Call — Send approved data to accounting system
- Fill Document — Generate a payment authorization form
The extraction step turns an unstructured PDF into clean, structured data that flows through your entire process.
Accuracy and Confidence
Extraction accuracy depends on:
- Document quality — Clear scans extract better than blurry photos
- Schema clarity — Descriptive field names and descriptions help
- Document complexity — Simple forms are easier than dense contracts
For critical data, pair extraction with intervention review to catch and correct any errors before they propagate.
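One way to decide which extractions most need a reviewer's attention is to run a few cheap consistency checks first. The sketch below assumes an invoice output with `vendor_name`, `total_amount`, and `line_items` fields (the `total_amount` and `amount` names are hypothetical) and flags a result when a required field is empty or the line items do not add up to the total. The `needs_review` function and its tolerance are illustrative choices, not a built-in feature.

```python
def needs_review(extracted,
                 required=("vendor_name", "total_amount", "line_items"),
                 tolerance=0.01):
    """Return a list of reasons this extraction should go to a human reviewer."""
    reasons = []

    # Flag required fields that are absent or empty.
    for field in required:
        if extracted.get(field) in (None, "", []):
            reasons.append(f"missing field: {field}")

    # Assumption: the total should equal the sum of line item amounts.
    # Invoices with separate tax or shipping lines would need a looser check.
    total = extracted.get("total_amount")
    items = extracted.get("line_items") or []
    if isinstance(total, (int, float)) and items:
        line_sum = sum(item.get("amount", 0) for item in items)
        if abs(line_sum - total) > tolerance:
            reasons.append(f"line items sum to {line_sum}, but total is {total}")

    return reasons


clean = {
    "vendor_name": "Acme Supplies",
    "total_amount": 1250.0,
    "line_items": [{"amount": 1000.0}, {"amount": 250.0}],
}
suspect = {
    "vendor_name": "",
    "total_amount": 1250.0,
    "line_items": [{"amount": 1000.0}],
}

print(needs_review(clean))
print(needs_review(suspect))
```

An empty list means the checks found nothing suspicious. It does not prove the extraction is correct, so critical data may still warrant a reviewer.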