Extraction Lifecycle
How data flows through Struktur from input to output.
flowchart LR
A[Input] --> B[Parse]
B --> C[Artifacts]
C --> D[Strategy]
D --> E[Output]
subgraph StrategyInternals [Strategy]
direction TB
D1[Chunking] --> D2[LLM Calls]
D2 --> D3[Validation + Retry]
D3 --> D4[Merge/Dedupe]
end
D --> StrategyInternals --> E
Inputs and Artifacts
Struktur converts every input into Artifacts before extraction. For plain text or stdin, the conversion is trivial. For structured files (PDFs, Office documents), Struktur runs a parser, either built-in or custom, that extracts text and images page by page.
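To make the conversion step concrete, here is a minimal sketch of the two paths described above. Every name in it (`SketchArtifact`, `SketchPage`, `SketchParser`, `artifactFromText`, `artifactFromFile`) is hypothetical, and the field layout is an assumption chosen for illustration; the Artifact Format page describes the structure Struktur actually uses.

```typescript
// Hypothetical shapes for illustration only; not Struktur's real artifact format.
interface SketchPage {
  page: number;     // 1-based page index
  text: string;     // extracted text for this page
  images: string[]; // assumed here to be image references (e.g. file paths)
}

interface SketchArtifact {
  id: string;
  source: "text" | "parsed-file";
  pages: SketchPage[];
}

// Plain text or stdin: the whole input becomes a single page.
function artifactFromText(id: string, text: string): SketchArtifact {
  return { id, source: "text", pages: [{ page: 1, text, images: [] }] };
}

// Structured files: a parser (built-in or custom) yields one entry per page.
type SketchParser = (bytes: Uint8Array) => Array<{ text: string; images: string[] }>;

function artifactFromFile(id: string, bytes: Uint8Array, parse: SketchParser): SketchArtifact {
  const pages = parse(bytes).map((p, i) => ({
    page: i + 1,
    text: p.text,
    images: p.images,
  }));
  return { id, source: "parsed-file", pages };
}
```

The point of the sketch is only that both paths end in the same shape, so everything downstream of parsing can treat text input and parsed documents uniformly.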
Document Parsing: Learn how files are converted to artifacts
Artifact Format: Understand the artifact data structure
The Strategy layer
A strategy is the orchestration engine. It decides how to split the input, how many LLM calls to make, whether to run them concurrently or sequentially, and how to combine results.
Built-in strategies cover the common patterns. You can also write your own.
See Strategies for the complete strategy reference.
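The responsibilities listed above (split, call, combine) can be sketched as a tiny sequential strategy. This is an illustrative outline under stated assumptions, not Struktur's strategy API: `chunkText`, `mergeDedupe`, `runSequential`, and the `extract` callback that stands in for an LLM call are all hypothetical names, and character counts stand in for a real token budget.

```typescript
// Hypothetical helpers for illustration; not Struktur's actual strategy interface.

// Split text on blank lines and pack paragraphs into chunks of up to `maxChars`
// characters. A single paragraph longer than `maxChars` is kept whole.
function chunkText(text: string, maxChars: number): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const para of text.split(/\n{2,}/)) {
    const candidate = current ? `${current}\n\n${para}` : para;
    if (candidate.length > maxChars && current) {
      chunks.push(current);
      current = para;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Flatten per-chunk results, keeping the first item seen for each key.
function mergeDedupe<T>(batches: T[][], key: (item: T) => string): T[] {
  const seen = new Set<string>();
  const merged: T[] = [];
  for (const batch of batches) {
    for (const item of batch) {
      const k = key(item);
      if (!seen.has(k)) {
        seen.add(k);
        merged.push(item);
      }
    }
  }
  return merged;
}

// Sequential orchestration: chunk, run `extract` once per chunk, then merge.
// `extract` is a stand-in for a model call and is synchronous only for brevity.
function runSequential<T>(
  text: string,
  maxChars: number,
  extract: (chunk: string) => T[],
  key: (item: T) => string,
): T[] {
  const batches = chunkText(text, maxChars).map(extract);
  return mergeDedupe(batches, key);
}
```

A concurrent variant would differ only in how the per-chunk calls are scheduled; the chunking and merge steps stay the same, which is why it helps to keep them as separate functions.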
Validation inside the loop
The validation loop is a key differentiator. Every LLM response is validated against the schema before the strategy considers it done. If validation fails, the errors are serialized and sent back to the model as a follow-up message.
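The loop described above can be sketched roughly as follows. All names here (`ChatMessage`, `ModelCall`, `Validator`, `extractWithRetry`) are hypothetical, the wording of the follow-up message is an assumption, and the model call is synchronous only to keep the sketch short; a real call would be awaited.

```typescript
// Hypothetical names throughout; a sketch of the pattern, not Struktur's internals.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

type ValidationOutcome<T> =
  | { ok: true; value: T }
  | { ok: false; errors: string[] };

type ModelCall = (messages: ChatMessage[]) => string;
type Validator<T> = (value: unknown) => ValidationOutcome<T>;

// Parse the raw reply, reporting malformed JSON as a validation error.
function parseJson(text: string): ValidationOutcome<unknown> {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, errors: [`invalid JSON: ${String(err)}`] };
  }
}

function extractWithRetry<T>(
  prompt: string,
  callModel: ModelCall,
  validate: Validator<T>,
  maxAttempts = 3,
): T {
  const messages: ChatMessage[] = [{ role: "user", content: prompt }];
  let lastErrors: string[] = [];

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const reply = callModel(messages);
    const parsed = parseJson(reply);
    const outcome: ValidationOutcome<T> = parsed.ok ? validate(parsed.value) : parsed;
    if (outcome.ok) return outcome.value;

    // Serialize the errors and send them back as a follow-up message.
    lastErrors = outcome.errors;
    messages.push({ role: "assistant", content: reply });
    messages.push({
      role: "user",
      content: `Validation failed:\n${JSON.stringify(outcome.errors)}\nReturn corrected JSON.`,
    });
  }
  throw new Error(`validation failed after ${maxAttempts} attempts: ${lastErrors.join("; ")}`);
}
```

Keeping the failed reply in the conversation alongside the error list lets the model see exactly what it produced and what was wrong with it, rather than starting over from the original prompt.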
Smart validation: For multi-step strategies (parallel, sequential, double-pass), Struktur validates leniently during intermediate steps, so required-field violations are allowed until the final step. This prevents false failures when data is split across chunks. Use the strict option to disable this behavior.
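One way to picture lenient validation is a validator that still checks the types of fields that are present but only reports missing required fields on the final step, or on every step when strict is set. The mini-schema format and the names `FieldRule`, `MiniSchema`, `validateRecord`, and `validateStep` are hypothetical and chosen for illustration; Struktur's own schema handling may differ.

```typescript
// Hypothetical mini-schema: field name -> expected typeof result plus required flag.
interface FieldRule {
  type: "string" | "number" | "boolean";
  required: boolean;
}
type MiniSchema = Record<string, FieldRule>;

// Returns a list of error strings; an empty list means the record passed.
function validateRecord(
  record: Record<string, unknown>,
  schema: MiniSchema,
  enforceRequired: boolean,
): string[] {
  const errors: string[] = [];
  for (const field of Object.keys(schema)) {
    const rule = schema[field];
    const value = record[field];
    if (value === undefined || value === null) {
      // Missing fields are only an error when required fields are enforced.
      if (rule.required && enforceRequired) errors.push(`${field}: required`);
      continue;
    }
    // Type mismatches are reported in both lenient and strict modes.
    if (typeof value !== rule.type) {
      errors.push(`${field}: expected ${rule.type}, got ${typeof value}`);
    }
  }
  return errors;
}

// Intermediate steps are lenient unless `strict` is set; the final step
// always enforces required fields.
function validateStep(
  record: Record<string, unknown>,
  schema: MiniSchema,
  isFinalStep: boolean,
  strict = false,
): string[] {
  return validateRecord(record, schema, strict || isFinalStep);
}
```

With a schema requiring both `title` and `year`, a chunk that yields only `{ title: "A" }` passes as an intermediate step but fails as the final one, which matches the case where the year appears in a later chunk.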
Validation and retry happen inside the strategy, not as a post-processing step. Most extractions converge within two attempts.
Default: maxAttempts = 3.
See Validation & Retries for the validation concept.
The result
Prop
Type
See also
- Document Parsing — how files are converted to artifacts
- Artifacts — the input format
- Strategies — orchestration patterns
- Chunking & Token Budgets — how large documents are split
- Validation & Retries — the retry loop