Get started with Struktur in about 5 minutes: extract structured data from any file in 3 commands.

Using an AI assistant? Point it at https://struktur.sh/llms.txt for LLM-optimized docs, or install the Agent Skill for built-in Struktur knowledge.

Prerequisites

Node.js 18+ or Bun installed
An API key from OpenAI, Anthropic, Google, OpenCode, or OpenRouter

Configure your API key

Store your API key securely with the CLI:

echo "sk-..." | struktur config providers add openai --token-stdin --default

The --default flag automatically queries the provider API and sets the cheapest available model as default, so --model becomes optional in all future commands.

Output: { "provider": "openai", "stored": "keychain" } (or "file" on Linux).

Set a default model (if not using --default)

struktur config models use openai/gpt-4o-mini

Once set, --model is optional in all extract commands.

Extract your first data

echo 'Invoice #1042 from Acme Corp. Total: $2,400.00. Due: April 1, 2026.' | \
  struktur --stdin \
  --fields "invoice_number, vendor, total:number, due_date" \
  --model openai/gpt-4o-mini

Or extract from a PDF file:

struktur --input invoice.pdf \
  --fields "invoice_number, vendor, total:number, due_date" \
  --model openai/gpt-4o-mini

The --fields flag builds a JSON Schema on the fly. Each field defaults to string; append :number, :integer, :bool, etc. to set the type. See Fields Shorthand for the full syntax.
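To make the shorthand concrete, here is a rough shell sketch of how a fields string could expand into a JSON Schema object. It is an illustration only, not Struktur's actual implementation: the function name fields_to_schema is hypothetical, and the sketch assumes that every field is required, that additionalProperties is false, and that :bool maps to the JSON Schema type boolean.

```shell
# Hypothetical helper for illustration; Struktur does this internally
# and its real rules may differ (see Fields Shorthand).
fields_to_schema() (
  # The body runs in a subshell, so the IFS and globbing changes stay local.
  IFS=','
  set -f                      # disable glob expansion while splitting
  props='' required=''
  for entry in $1; do
    # Trim surrounding spaces from each comma-separated entry.
    entry=${entry#"${entry%%[! ]*}"}
    entry=${entry%"${entry##*[! ]}"}
    [ -n "$entry" ] || continue
    case $entry in
      *:*) name=${entry%%:*}; type=${entry#*:} ;;   # "total:number"
      *)   name=$entry; type=string ;;              # default type
    esac
    # Assumption: the :bool shorthand means JSON Schema "boolean".
    [ "$type" = bool ] && type=boolean
    props="${props:+$props,}\"$name\":{\"type\":\"$type\"}"
    required="${required:+$required,}\"$name\""
  done
  printf '{"type":"object","properties":{%s},"required":[%s],"additionalProperties":false}\n' \
    "$props" "$required"
)

fields_to_schema "invoice_number, vendor, total:number, due_date"
```

The call at the end prints a schema with four required properties: three strings and one number.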

If you need more control (optional fields, nested objects), pass a full schema instead:

struktur --input invoice.pdf \
  --schema-json '{"type":"object","properties":{"invoice_number":{"type":"string"},"vendor":{"type":"string"},"total":{"type":"number"},"due_date":{"type":"string"}},"required":["invoice_number","vendor","total","due_date"],"additionalProperties":false}' \
  --model openai/gpt-4o-mini

In the output, total is a JSON number, not a string, because Struktur enforced the schema.
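If you want to confirm the type yourself, a rough shell check like the one below can help. It is a sketch under stated assumptions: the helper name field_is_number is hypothetical, the two sample objects are made up for illustration, and the pattern match only handles a flat JSON object (it is not a JSON parser).

```shell
# Hypothetical helper: succeeds when the named field holds a bare numeric
# literal (2400) rather than a quoted string ("2,400.00").
# $1 = field name; the JSON text is read from stdin.
field_is_number() {
  grep -Eq "\"$1\"[[:space:]]*:[[:space:]]*-?[0-9]"
}

# Made-up sample objects, not real Struktur output.
printf '%s\n' '{"vendor":"Acme Corp","total":2400}' \
  | field_is_number total && echo "total is a number"
printf '%s\n' '{"vendor":"Acme Corp","total":"2,400.00"}' \
  | field_is_number total || echo "total is a string"
```

The first sample passes the check and the second does not, which is the difference a schema with total:number is meant to enforce. Assuming the extracted object is printed as JSON on stdout, the same helper could be placed at the end of an extract pipeline.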

What happened?

For the text example, stdin input was loaded as an artifact, the LLM generated output, and it was validated against your schema. The simple strategy handled this in a single LLM call.

For the PDF example, an extra step happened first: the built-in PDF parser extracted text (and optionally images) from the file and converted it into an artifact, then extraction proceeded as normal. Add --images to include embedded images, or --screenshots to render page screenshots.

To understand what happened inside, read The Extraction Pipeline.

Where to go next

Goal	Link
Keep learning	The Extraction Pipeline
Solve a real problem	Extract Invoice Data
Look up all CLI flags	CLI Reference
Use it in TypeScript	TypeScript SDK
Understand file parsing	Document Parsing
