Struktur

extract

The main extraction command of the Struktur CLI.

Synopsis

struktur [extract] [options]

extract is the default command, so struktur --input file.pdf ... and struktur extract --input file.pdf ... are equivalent.

Input options (exactly one required)

Schema options (exactly one required)

--fields is the quickest way to define a schema without writing JSON. See --fields reference for the full syntax.

Model

Supported providers: openai, anthropic, google, opencode, openrouter.

For OpenRouter, you can specify a preferred inference provider using # syntax:

--model "openrouter/anthropic/claude-3.5-sonnet#cerebras"
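To make the parts of such a model string explicit, here is a minimal sketch that splits it with plain shell parameter expansion. It only illustrates the layout of the string (provider prefix, model ID, preferred inference provider after the #). The variable names are made up for this example, and this is not how Struktur itself parses the value.

```shell
# Illustrative only: split a --model value into its three parts.
model_spec="openrouter/anthropic/claude-3.5-sonnet#cerebras"

# Everything before the first "/" is the Struktur provider.
provider="${model_spec%%/*}"

# Everything after the first "/" is the model ID, optionally followed by "#<provider>".
rest="${model_spec#*/}"
model_id="${rest%%\#*}"

# The preferred inference provider is only present when the string contains "#".
case "$rest" in
  *'#'*) preferred="${rest#*\#}" ;;
  *)     preferred="" ;;
esac

printf 'provider:  %s\n' "$provider"
printf 'model id:  %s\n' "$model_id"
printf 'preferred: %s\n' "$preferred"
```

For the string above, the three values are openrouter, anthropic/claude-3.5-sonnet, and cerebras.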

Parsing options

These flags control how --input files are parsed before extraction.

Image options (PDF inputs)

For custom screenshot dimensions, use struktur parse --screenshots --screenshot-scale <num> and pipe the artifact to struktur extract --artifact-file -.

Strategy

Strategy names: simple, parallel, sequential, parallelAutoMerge, sequentialAutoMerge, doublePass, doublePassAutoMerge.

When --strategy is set to anything other than simple, the CLI uses the same model for model and for mergeModel/dedupeModel. To use a different model per role, use the TypeScript SDK.
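Because strategy names are case-sensitive identifiers such as parallelAutoMerge, a wrapper script may want to check a name before passing it on. The following is a small sketch of such a check. The helper name is_known_strategy and the variable STRUKTUR_STRATEGIES are hypothetical and not part of the CLI; the list is copied from the strategy names above.

```shell
# Strategy names copied from the documentation above.
STRUKTUR_STRATEGIES="simple parallel sequential parallelAutoMerge sequentialAutoMerge doublePass doublePassAutoMerge"

# Hypothetical helper: succeeds only if $1 exactly matches a documented name.
is_known_strategy() {
  for s in $STRUKTUR_STRATEGIES; do
    [ "$s" = "$1" ] && return 0
  done
  return 1
}
```

A script could then guard the call, for example: is_known_strategy "$strategy" && struktur --input large.md --schema schema.json --model openai/gpt-4o --strategy "$strategy".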

Output

Progress

When stderr is a TTY, a progress bar is shown:

◈ ▰▰▰▰▰▱▱▱▱▱ 50% | batch 2/5

The bar is suppressed in non-interactive mode (piped stderr).
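The TTY condition can be reproduced in a shell with the standard [ -t 2 ] test, which checks whether file descriptor 2 (stderr) is attached to a terminal. The sketch below only illustrates that condition; the function name stderr_mode is made up, and how Struktur itself detects a TTY is not specified on this page.

```shell
# Report whether stderr (file descriptor 2) is attached to a terminal.
stderr_mode() {
  if [ -t 2 ]; then
    echo "tty"
  else
    echo "not a tty"
  fi
}
```

Run directly in a terminal, stderr_mode prints tty. With stderr redirected, as in stderr_mode 2>/dev/null, it prints not a tty, which corresponds to the case where the progress bar is suppressed.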

Examples

# Extract fields from text piped on stdin
# (single quotes keep the shell from expanding $2 in the sample text)
echo 'Invoice #1042 from Acme Corp. Total: $2,400.00.' | \
  struktur --stdin -f "invoice_number, vendor, total:number" \
  --model openai/gpt-4o-mini

# Extract fields from a PDF file
struktur --input invoice.pdf \
  --fields "invoice_number, vendor, total:number" \
  --model openai/gpt-4o-mini

# Extract from a PDF with --images, using a JSON Schema file
struktur --input invoice.pdf --images \
  --schema invoice-schema.json \
  --model openai/gpt-4o

# Use parse for custom screenshot settings, then pipe to extract
struktur parse --input slides.pdf --screenshots --screenshot-scale 2 | \
  struktur --artifact-file - \
  --fields "title, slide_count:integer" \
  --model openai/gpt-4o

# Pass the schema inline with --schema-json
struktur --input report.txt \
  --schema-json '{"type":"object","properties":{"summary":{"type":"string"}},"required":["summary"],"additionalProperties":false}' \
  --model openai/gpt-4o-mini

# Pipe a Markdown document on stdin
cat document.md | struktur --stdin --schema schema.json --model anthropic/claude-3-5-haiku-20241022

# Use the parallel strategy and write the result to a file
struktur --input large.md --schema schema.json --model openai/gpt-4o \
  --strategy parallel --output result.json

# Specify the MIME type explicitly with --mime
struktur --input data.bin --mime application/pdf \
  --fields "title, author" --model openai/gpt-4o-mini

# Use a custom parser package with --parser
struktur --input report.docx --parser @myorg/docx-parser \
  --fields "title, summary" --model openai/gpt-4o-mini

# Load the schema from a URL
struktur --input data.txt --schema https://myserver.com/schemas/invoice.json --model openai/gpt-4o-mini

# Run with the --debug flag
struktur --input doc.pdf --fields "title" --model openai/gpt-4o-mini --debug
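
The single-file commands above can be combined into a batch run. The following is a rough sketch that applies the same --fields extraction to every PDF in a directory and writes one JSON file per input. The function name extract_all and the directory layout are assumptions made for this example; only flags already shown on this page are used, and the sketch assumes struktur exits with a non-zero status on failure.

```shell
# Hypothetical helper: extract the same fields from every PDF in a directory.
#   $1 = input directory containing *.pdf
#   $2 = output directory for the per-file JSON results
extract_all() {
  in_dir="$1"
  out_dir="$2"
  mkdir -p "$out_dir" || return 1

  for pdf in "$in_dir"/*.pdf; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$pdf" ] || continue

    name="$(basename "$pdf" .pdf)"
    struktur --input "$pdf" \
      --fields "invoice_number, vendor, total:number" \
      --model openai/gpt-4o-mini \
      --output "$out_dir/$name.json" || echo "failed: $pdf" >&2
  done
}
```

A call such as extract_all ./invoices ./results would then leave one .json file per PDF in ./results and report any failing inputs on stderr.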
