Extractors
Extract structured data from documents using AI
Extractors use AI to pull structured data from unstructured documents—PDFs, images, scanned forms, and more. Define what data you need, and the AI finds and extracts it.
How Extraction Works
- Input — A document (PDF, image, or scanned file)
- Schema — You define what fields to extract
- AI Processing — The model reads the document and identifies matching data
- Output — Structured data ready for your workflow
The AI handles variations in document layouts, handwriting, and formatting. You don't need to define exact positions or parsing rules.
Defining a Schema
A schema describes what data to extract. Each field has:
- Name — The field identifier (e.g., vendor_name)
- Type — String, number, boolean, list, or object
- Description — Helps the AI understand what to look for
Example: Invoice Extraction
The description is key—it guides the AI. "The total amount due" is better than just "total".
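As a rough illustration of the Name / Type / Description triple, the Python sketch below assembles an invoice schema in the JSON Schema shape that the API section of this page uses. The `field` and `build_schema` helpers, and the mapping of "list" to JSON Schema's `array`, are assumptions made for this example and are not part of Flint.

```python
import json

# Assumed mapping from the type names listed above to JSON Schema type keywords.
TYPE_MAP = {
    "string": "string",
    "number": "number",
    "boolean": "boolean",
    "list": "array",
    "object": "object",
}


def field(type_name, description):
    """Hypothetical helper: build one schema property from a type and a description."""
    return {"type": TYPE_MAP[type_name], "description": description}


def build_schema(fields, required=()):
    """Combine named fields into a JSON Schema object definition."""
    return {
        "type": "object",
        "properties": dict(fields),
        "required": list(required),
    }


invoice_schema = build_schema(
    {
        "vendor_name": field("string", "The name of the vendor or supplier"),
        "invoice_number": field("string", "The unique invoice identifier"),
        "total_amount": field("number", "The total amount due including tax"),
    },
    required=["vendor_name", "total_amount"],
)

print(json.dumps(invoice_schema, indent=2))
```

Keeping the description next to each field name makes it easy to spot fields whose wording is too terse to guide the model.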
Using the Extract Document Action
- Add Extract Document to your workflow
- Configure the Document input (from trigger, previous step, or URL)
- Define your Schema using the schema builder or JSON
- The output is a structured object matching your schema
Access extracted data in subsequent steps:
{{ steps.extract.vendor_name }}
{{ steps.extract.line_items[0].description }}
Document Types
Extractors handle many document types:
| Type | Examples |
|---|---|
| PDFs | Invoices, contracts, forms, reports |
| Images | Scanned documents, photos of receipts, screenshots |
| Scanned documents | Paper forms that were digitized |
| Spreadsheets | CSV, Excel, Google Sheets |
The AI adapts to different layouts automatically. An invoice from Vendor A can look completely different from Vendor B, and extraction still works.
Best Practices
Using Extractors via the API
You can run extractions programmatically using the Flint API with an API key. This is useful for integrating extraction into your own applications, scripts, or external systems without building a workflow.
Authentication
Include your API key in the Authorization header as a Bearer token. See API Keys for how to create one with the extractions:create permission.
Running an Extraction
Send a POST request to /extractions with your document file IDs and a JSON schema defining the fields to extract:
curl -X POST "https://steel.flint.com/extractions" \
  -H "Authorization: Bearer your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "fileIds": ["file_abc123"],
    "schema": {
      "type": "object",
      "properties": {
        "vendor_name": {
          "type": "string",
          "description": "The name of the vendor or supplier"
        },
        "invoice_number": {
          "type": "string",
          "description": "The unique invoice identifier"
        },
        "total_amount": {
          "type": "number",
          "description": "The total amount due including tax"
        },
        "line_items": {
          "type": "array",
          "description": "Individual items on the invoice",
          "items": {
            "type": "object",
            "properties": {
              "description": { "type": "string" },
              "quantity": { "type": "number" },
              "price": { "type": "number" }
            }
          }
        }
      },
      "required": ["vendor_name", "total_amount"]
    }
  }'
The response includes the extracted data and a confidence score:
{
  "id": "ext_123",
  "status": "success",
  "result": {
    "vendor_name": "Acme Corp",
    "invoice_number": "INV-2025-0042",
    "total_amount": 1250.00,
    "line_items": [
      { "description": "Widget A", "quantity": 5, "price": 250.00 }
    ]
  },
  "confidence": 3
}
Retrieving Extraction Results
Fetch a specific extraction by ID:
curl "https://steel.flint.com/extractions/ext_123" \
  -H "Authorization: Bearer your-api-key-here"
Or list all extractions:
curl "https://steel.flint.com/extractions" \
  -H "Authorization: Bearer your-api-key-here"
A JSON Schema is a standard format (json-schema.org) for describing the structure of JSON data. It defines what fields to expect, their types (string, number, boolean, array, object), which fields are required, and descriptions of each field. Flint uses JSON Schema to tell the AI exactly what data to extract from your documents. You define the shape of the output you want, and the AI fills it in.
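The curl calls in this section can be mirrored from Python. The sketch below uses only the standard library and prepares the same requests without sending them; the endpoint paths, headers, and body fields are taken from the curl examples, while the function names (`build_extraction_request`, `build_get_request`, `send`) are hypothetical and error handling is left out.

```python
import json
import urllib.request

BASE_URL = "https://steel.flint.com"  # host taken from the curl examples above


def build_extraction_request(api_key, file_ids, schema):
    """Prepare (but do not send) the POST /extractions request."""
    body = json.dumps({"fileIds": file_ids, "schema": schema}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/extractions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def build_get_request(api_key, extraction_id=None):
    """Prepare a GET for one extraction, or for the full list when no ID is given."""
    path = f"/extractions/{extraction_id}" if extraction_id else "/extractions"
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {api_key}"},
    )


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


schema = {
    "type": "object",
    "properties": {
        "vendor_name": {"type": "string", "description": "The name of the vendor or supplier"},
        "total_amount": {"type": "number", "description": "The total amount due including tax"},
    },
    "required": ["vendor_name", "total_amount"],
}

request = build_extraction_request("your-api-key-here", ["file_abc123"], schema)
print(request.get_method(), request.full_url)
# Calling send(request) would perform the network call; it is left out here
# so the sketch runs without credentials.
```

Separating request construction from sending keeps the payload easy to inspect or log before any document is processed.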
Example: Processing Incoming Invoices
A typical invoice processing workflow:
- Email Trigger — Invoice arrives as email attachment
- Extract Document — Pull vendor, amount, line items, dates
- Intervention — Reviewer verifies extracted data
- HTTP Call — Send approved data to accounting system
- Fill Document — Generate a payment authorization form
The extraction step turns an unstructured PDF into clean, structured data that flows through your entire process.
Accuracy and Confidence
Extraction accuracy depends on:
- Document quality — Clear scans extract better than blurry photos
- Schema clarity — Descriptive field names and descriptions help
- Document complexity — Simple forms are easier than dense contracts
For critical data, pair extraction with intervention review to catch and correct any errors before they propagate.
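One way to decide which extractions go to a reviewer is a small routing check like the Python sketch below. It assumes a response shaped like the example on this page (`status`, `result`, `confidence`). The `needs_review` function and its `min_confidence` threshold are hypothetical, and because the scale of the confidence score is not described here, the threshold value is a placeholder to tune against your own documents.

```python
def needs_review(extraction, required_fields, min_confidence):
    """Return a list of reasons to send this extraction to a human reviewer.

    An empty list means no check failed. `min_confidence` is a hypothetical
    threshold; the scale of the confidence score is an assumption here.
    """
    reasons = []
    if extraction.get("status") != "success":
        reasons.append(f"status is {extraction.get('status')!r}")

    result = extraction.get("result") or {}
    for name in required_fields:
        # Treat absent, empty-string, and empty-list values as missing.
        if result.get(name) in (None, "", []):
            reasons.append(f"missing required field: {name}")

    confidence = extraction.get("confidence")
    if confidence is None or confidence < min_confidence:
        reasons.append(f"confidence {confidence} is below {min_confidence}")
    return reasons


# Response copied from the example on this page.
extraction = {
    "id": "ext_123",
    "status": "success",
    "result": {
        "vendor_name": "Acme Corp",
        "invoice_number": "INV-2025-0042",
        "total_amount": 1250.00,
        "line_items": [{"description": "Widget A", "quantity": 5, "price": 250.00}],
    },
    "confidence": 3,
}

reasons = needs_review(extraction, ["vendor_name", "total_amount"], min_confidence=4)
print(reasons or "no review needed")
```

Returning the reasons, rather than a bare yes or no, gives the reviewer a starting point for what to verify.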