Quickstart
Get started with Struktur: extract structured data from any file in 3 commands, in about 5 minutes.
Using an AI assistant? Point it at https://struktur.sh/llms.txt for LLM-optimized docs, or install the Agent Skill for built-in Struktur knowledge.
Prerequisites
- Node.js 18+ or Bun installed
- An API key from OpenAI, Anthropic, Google, OpenCode, or OpenRouter
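If you are not sure whether your machine meets the runtime prerequisite, a small shell check can confirm it before you install. This is only a sketch and not part of Struktur: the helper names `node_major` and `runtime_ok` are made up for illustration, and it assumes `node --version` prints a string of the form `v18.17.0`.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Struktur) for checking the runtime prerequisite.

# Print the major version from a string like "v18.17.0".
node_major() {
  v=${1#v}                    # drop the leading "v"
  printf '%s\n' "${v%%.*}"    # keep everything before the first dot
}

# Succeed if Bun is installed, or if Node.js is version 18 or newer.
runtime_ok() {
  if command -v bun >/dev/null 2>&1; then
    return 0
  fi
  if command -v node >/dev/null 2>&1; then
    [ "$(node_major "$(node --version)")" -ge 18 ]
    return
  fi
  return 1
}

# Usage:
#   runtime_ok && echo "ready to install" || echo "install Node.js 18+ or Bun first"
```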
Install
With npm:
npm install -g @struktur/cli
With Bun:
bun install -g @struktur/cli
Verify:
struktur --help
You should see the usage output.
Configure your API key
Store your API key securely with the CLI:
echo "sk-..." | struktur config providers add openai --token-stdin --default
The --default flag queries the provider API and sets the cheapest available model as the default, so --model becomes optional in all future commands.
Output: { "provider": "openai", "stored": "keychain" } (or "file" on Linux).
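To keep the key out of your shell history, you can pipe it from an environment variable instead of typing it inline. The following is a sketch under assumptions: the variable name `OPENAI_API_KEY` is chosen here for illustration (nothing on this page says the CLI reads it directly), and `store_openai_key` is a hypothetical wrapper. The `struktur config providers add` invocation itself is the one shown above.

```shell
# Hypothetical wrapper: store the key held in $OPENAI_API_KEY
# without writing the literal key on the command line.
store_openai_key() {
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set" >&2
    return 1
  fi
  # printf behaves more predictably than echo for arbitrary strings
  printf '%s\n' "$OPENAI_API_KEY" |
    struktur config providers add openai --token-stdin --default
}

# Usage:
#   export OPENAI_API_KEY=...   # set it however you normally manage secrets
#   store_openai_key
```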
Set a default model (if not using --default)
struktur config models use openai/gpt-4o-mini
Once set, --model is optional in all extract commands.
Extract your first data
From text on stdin:
echo 'Invoice #1042 from Acme Corp. Total: $2,400.00. Due: April 1, 2026.' | \
struktur --stdin \
--fields "invoice_number, vendor, total:number, due_date" \
--model openai/gpt-4o-mini
From a PDF file:
struktur --input invoice.pdf \
--fields "invoice_number, vendor, total:number, due_date" \
--model openai/gpt-4o-mini
The --fields flag builds a JSON Schema on the fly. Each field defaults to string; append :number, :integer, :bool, etc. to set the type. See Fields Shorthand for the full syntax.
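To make the shorthand concrete, here is a rough illustration of how a --fields string could expand into a JSON Schema. It is not Struktur's implementation. It assumes the result has the same shape as the full schema example that follows (every field required, no additional properties), and it treats `bool` as an alias for the JSON Schema type `boolean`, which is also an assumption. The function name `fields_to_schema` is hypothetical.

```shell
# Illustration only: expand a --fields style string into a JSON Schema.
# The body runs in a subshell ( ... ) so IFS and globbing changes stay local.
fields_to_schema() (
  props=""
  required=""
  IFS=','
  set -f                                  # no filename globbing while splitting
  for field in $1; do
    # trim surrounding whitespace
    field=$(printf '%s' "$field" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    [ -n "$field" ] || continue           # skip empty entries
    name=${field%%:*}
    case $field in
      *:*) type=${field#*:} ;;            # explicit type after the colon
      *)   type=string ;;                 # fields default to string
    esac
    case $type in
      bool) type=boolean ;;               # assumed alias
    esac
    props="${props:+$props,}\"$name\":{\"type\":\"$type\"}"
    required="${required:+$required,}\"$name\""
  done
  printf '{"type":"object","properties":{%s},"required":[%s],"additionalProperties":false}\n' \
    "$props" "$required"
)

# Usage:
#   fields_to_schema "invoice_number, vendor, total:number, due_date"
```

Under these assumptions, the usage line prints the same schema that the full-schema example passes to --schema-json.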
If you need more control (optional fields, nested objects), pass a full schema instead:
struktur --input invoice.pdf \
--schema-json '{"type":"object","properties":{"invoice_number":{"type":"string"},"vendor":{"type":"string"},"total":{"type":"number"},"due_date":{"type":"string"}},"required":["invoice_number","vendor","total","due_date"],"additionalProperties":false}' \
--model openai/gpt-4o-mini
Notice that total is a number, not a string: Struktur enforced the schema.
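Long inline schemas are awkward to quote. One option is to keep the schema in a file and substitute its contents into --schema-json with ordinary shell command substitution. This is a sketch: the wrapper name `extract_with_schema` and the file name `invoice.schema.json` are made up, and only flags already shown on this page are used.

```shell
# Hypothetical wrapper: run an extraction with a schema read from a file.
# Usage: extract_with_schema <input-file> <schema-file>
extract_with_schema() {
  input=$1
  schema_file=$2
  if [ ! -r "$schema_file" ]; then
    echo "cannot read schema file: $schema_file" >&2
    return 1
  fi
  # "$(cat ...)" passes the file contents to the CLI as a single argument
  struktur --input "$input" \
    --schema-json "$(cat "$schema_file")" \
    --model openai/gpt-4o-mini
}

# Usage:
#   extract_with_schema invoice.pdf invoice.schema.json
```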
What happened?
For the text example, stdin input was loaded as an artifact, the LLM generated output, and it was validated against your schema. The simple strategy handled this in a single LLM call.
For the PDF example, an extra step happened first: the built-in PDF parser extracted text (and optionally images) from the file and converted it into an artifact, then extraction proceeded as normal. Add --images to include embedded images, or --screenshots to render page screenshots.
To understand what happened inside, read The Extraction Pipeline.
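Since every file goes through the same parse-then-extract steps described above, processing a folder of PDFs is a plain shell loop around the same command. This is a sketch under one loud assumption: that struktur writes the extracted JSON to stdout, which this page does not state explicitly. The helper name `extract_dir` is hypothetical.

```shell
# Hypothetical helper: extract every PDF in a directory, one JSON file per input.
# Assumes struktur prints the extracted JSON to stdout (not confirmed on this page).
extract_dir() {
  dir=$1
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue             # no PDFs: skip the unexpanded pattern
    out="${pdf%.pdf}.json"
    if struktur --input "$pdf" \
        --fields "invoice_number, vendor, total:number, due_date" \
        --model openai/gpt-4o-mini > "$out"; then
      echo "wrote $out"
    else
      echo "failed: $pdf" >&2
      rm -f "$out"                        # do not leave a partial result behind
    fi
  done
}

# Usage:
#   extract_dir ./invoices
```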
Where to go next
| Goal | Link |
|---|---|
| Keep learning | The Extraction Pipeline |
| Solve a real problem | Extract Invoice Data |
| Look up all CLI flags | CLI Reference |
| Use it in TypeScript | TypeScript SDK |
| Understand file parsing | Document Parsing |
What is Struktur?
Struktur is an all-in-one tool for structured data extraction. It uses an autonomous agent to turn documents into validated JSON.
What is Structured Data Extraction?
Structured data extraction is the process of converting unstructured documents into validated, typed data using AI and schema validation.