extract
Main extraction command for Struktur CLI.
Synopsis
struktur [extract] [options]
extract is the default command: struktur --input file.pdf ... and struktur extract --input file.pdf ... are equivalent.
Input options (exactly one required)
Schema options (exactly one required)
--fields is the quickest way to define a schema without writing JSON. See --fields reference for the full syntax.
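For comparison, a schema file passed with --schema can be written by hand. The sketch below creates one for the three invoice fields used in the examples on this page, following the object shape shown in the --schema-json example. The file name is arbitrary. Treating untyped fields as strings and marking every field as required are assumptions made for illustration, not documented --fields behavior.

```shell
# Write a JSON Schema file for three invoice fields.
# Assumption: fields without an explicit type (invoice_number, vendor) are strings.
# Assumption: all three fields are required.
cat > invoice-schema.json <<'EOF'
{
  "type": "object",
  "properties": {
    "invoice_number": { "type": "string" },
    "vendor": { "type": "string" },
    "total": { "type": "number" }
  },
  "required": ["invoice_number", "vendor", "total"],
  "additionalProperties": false
}
EOF

# The file can then be passed with --schema, for example:
#   struktur --input invoice.pdf --schema invoice-schema.json --model openai/gpt-4o
```

A schema file is more verbose than --fields, but it lets you state constraints such as required and additionalProperties explicitly.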
Model
Supported providers: openai, anthropic, google, opencode, openrouter.
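The model identifiers on this page follow a provider/model pattern, and OpenRouter identifiers may carry a # suffix. As a rough illustration of how such a string breaks apart, the POSIX shell sketch below splits one with parameter expansion. The variable names are hypothetical, and this is not a description of how Struktur itself parses the value.

```shell
# Illustrative only: split a model identifier with POSIX parameter expansion.
# Variable names are hypothetical; Struktur's own parsing may differ.
model_id="openrouter/anthropic/claude-3.5-sonnet#cerebras"

provider="${model_id%%/*}"   # text before the first "/"
rest="${model_id#*/}"        # text after the first "/"

case "$rest" in
  *"#"*)
    model_name="${rest%%"#"*}"         # part before the "#"
    inference_provider="${rest#*"#"}"  # part after the "#"
    ;;
  *)
    model_name="$rest"
    inference_provider=""
    ;;
esac

printf 'provider=%s\nmodel=%s\ninference_provider=%s\n' \
  "$provider" "$model_name" "$inference_provider"
```

For the identifier above this prints provider=openrouter, model=anthropic/claude-3.5-sonnet, and inference_provider=cerebras.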
For OpenRouter, you can specify a preferred inference provider by appending # and the provider name to the model ID:
--model "openrouter/anthropic/claude-3.5-sonnet#cerebras"
Parsing options
These flags control how --input files are parsed before extraction.
Image options (PDF inputs)
For custom screenshot dimensions, use struktur parse --screenshots --screenshot-scale <num> and pipe the artifact to struktur extract --artifact-file -.
Strategy
Strategy names: simple, parallel, sequential, parallelAutoMerge, sequentialAutoMerge, doublePass, doublePassAutoMerge.
With any --strategy other than simple, model and mergeModel/dedupeModel are all set to the model given by --model. To use a different model for each role, use the TypeScript SDK.
Output
Progress
When stderr is a TTY, a progress bar is shown:
◈ ▰▰▰▰▰▱▱▱▱▱ 50% | batch 2/5
The bar is suppressed in non-interactive mode (piped stderr).
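A wrapper script can apply the same kind of check before deciding how to handle stderr. The sketch below uses the POSIX test -t 2 to see whether file descriptor 2 (stderr) is attached to a terminal. It only illustrates the condition described above and is not taken from Struktur's source; stderr_mode is a hypothetical helper name.

```shell
# Illustrative check: is stderr (file descriptor 2) a terminal?
# stderr_mode is a hypothetical helper name, not part of Struktur.
stderr_mode() {
  if [ -t 2 ]; then
    echo "interactive"      # a progress bar would be shown
  else
    echo "non-interactive"  # stderr is piped or redirected; bar suppressed
  fi
}

# Redirecting stderr away from the terminal forces the non-interactive branch.
mode_when_redirected="$(stderr_mode 2>/dev/null)"
echo "$mode_when_redirected"
```

With stderr redirected, the helper reports non-interactive. Redirecting the stderr of a struktur command in the same way should likewise keep the progress bar out of logs and pipelines.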
Examples
echo 'Invoice #1042 from Acme Corp. Total: $2,400.00.' | \
  struktur --stdin -f "invoice_number, vendor, total:number" \
  --model openai/gpt-4o-mini

struktur --input invoice.pdf \
  --fields "invoice_number, vendor, total:number" \
  --model openai/gpt-4o-mini

struktur --input invoice.pdf --images \
  --schema invoice-schema.json \
  --model openai/gpt-4o

# Use parse for custom screenshot settings, then pipe to extract
struktur parse --input slides.pdf --screenshots --screenshot-scale 2 | \
  struktur --artifact-file - \
  --fields "title, slide_count:integer" \
  --model openai/gpt-4o

struktur --input report.txt \
  --schema-json '{"type":"object","properties":{"summary":{"type":"string"}},"required":["summary"],"additionalProperties":false}' \
  --model openai/gpt-4o-mini

cat document.md | struktur --stdin --schema schema.json --model anthropic/claude-3-5-haiku-20241022

struktur --input large.md --schema schema.json --model openai/gpt-4o \
  --strategy parallel --output result.json

struktur --input data.bin --mime application/pdf \
  --fields "title, author" --model openai/gpt-4o-mini

struktur --input report.docx --parser @myorg/docx-parser \
  --fields "title, summary" --model openai/gpt-4o-mini

struktur --input data.txt --schema https://myserver.com/schemas/invoice.json --model openai/gpt-4o-mini

struktur --input doc.pdf --fields "title" --model openai/gpt-4o-mini --debug

See also
- --fields reference — fields shorthand syntax and examples
- parse — convert files to artifact JSON for inspection
- config — provider and model management
- Document Parsing — how file parsing works
- Strategies — strategy reference