Struktur

Quickstart

Extract structured data from any file in 3 commands. About 5 minutes.

Using an AI assistant? Point it at https://struktur.sh/llms.txt for LLM-optimized docs, or install the Agent Skill for built-in Struktur knowledge.

Prerequisites

  • Node.js 18+ or Bun installed
  • An API key from OpenAI, Anthropic, Google, OpenCode, or OpenRouter

Install

npm install -g @struktur/cli

Or, with Bun:

bun install -g @struktur/cli

Verify:

struktur --help

You should see the usage output.

Configure your API key

Store your API key securely with the CLI:

echo "sk-..." | struktur config providers add openai --token-stdin --default

The --default flag queries the provider's API and sets the cheapest available model as your default, so --model becomes optional in all future commands.

The command prints { "provider": "openai", "stored": "keychain" }. On Linux, "stored" is "file" instead.

Set a default model (if not using --default)

struktur config models use openai/gpt-4o-mini

Once set, --model is optional in all extract commands.
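The default-model behavior described above can be pictured as two small steps. The following is a hypothetical TypeScript sketch, not Struktur's code: `ModelInfo`, `pricePerMTok`, `pickCheapest`, and `resolveModel` are invented names, "cheapest" is assumed to mean a single comparable price per model, and an explicit --model is assumed to override the stored default.

```typescript
// Hypothetical sketch of default-model selection and resolution.
// Names and the pricing field are illustrative assumptions, not Struktur's API.

interface ModelInfo {
  id: string; // "provider/model", as accepted by --model
  pricePerMTok: number; // assumed single comparable price per million tokens
}

// --default: choose the lowest-priced model from the provider's list.
function pickCheapest(models: ModelInfo[]): string {
  if (models.length === 0) throw new Error("provider reported no models");
  let cheapest = models[0];
  for (const m of models) {
    if (m.pricePerMTok < cheapest.pricePerMTok) cheapest = m;
  }
  return cheapest.id;
}

// Per command: use --model if given (assumed), else the configured default, else fail.
function resolveModel(flag: string | undefined, configuredDefault: string | undefined): string {
  const model = flag ?? configuredDefault;
  if (model === undefined) throw new Error("no model: pass --model or set a default");
  return model;
}

// Placeholder model ids and prices, for illustration only.
const stored = pickCheapest([
  { id: "example/large", pricePerMTok: 10 },
  { id: "example/small", pricePerMTok: 0.5 },
]);
console.log(resolveModel(undefined, stored)); // example/small
console.log(resolveModel("openai/gpt-4o-mini", stored)); // openai/gpt-4o-mini
```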

Extract your first data

Pipe text through stdin:

echo 'Invoice #1042 from Acme Corp. Total: $2,400.00. Due: April 1, 2026.' | \
  struktur --stdin \
  --fields "invoice_number, vendor, total:number, due_date" \
  --model openai/gpt-4o-mini

Or extract from a file:

struktur --input invoice.pdf \
  --fields "invoice_number, vendor, total:number, due_date" \
  --model openai/gpt-4o-mini

The --fields flag builds a JSON Schema on the fly. Each field defaults to string; append :number, :integer, :bool, etc. to set the type. See Fields Shorthand for the full syntax.
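To make the shorthand concrete, here is a hypothetical TypeScript sketch of how a --fields string could be turned into a JSON Schema. It is not Struktur's parser: `fieldsToSchema` and `SUFFIX_TYPES` are invented names, only the four suffixes named above are handled (with :bool assumed to map to "boolean"; other suffixes are rejected), and every field is assumed to be required with extra properties disallowed, as in the full-schema example that follows.

```typescript
// Hypothetical sketch: map a --fields shorthand string to a JSON Schema object.
// Not Struktur's actual implementation.

type JsonType = "string" | "number" | "integer" | "boolean";

interface ObjectSchema {
  type: "object";
  properties: Record<string, { type: JsonType }>;
  required: string[];
  additionalProperties: false;
}

// Shorthand suffix -> JSON Schema type (":bool" assumed to mean "boolean").
const SUFFIX_TYPES: Record<string, JsonType> = {
  string: "string",
  number: "number",
  integer: "integer",
  bool: "boolean",
};

function fieldsToSchema(fields: string): ObjectSchema {
  const schema: ObjectSchema = {
    type: "object",
    properties: {},
    required: [],
    additionalProperties: false,
  };
  for (const raw of fields.split(",")) {
    const entry = raw.trim();
    if (entry === "") continue; // tolerate stray commas
    // "total:number" -> name "total", suffix "number"; no suffix means string.
    const [name, suffix = "string"] = entry.split(":").map((s) => s.trim());
    if (!Object.prototype.hasOwnProperty.call(SUFFIX_TYPES, suffix)) {
      throw new Error(`unsupported type suffix: ${suffix}`);
    }
    schema.properties[name] = { type: SUFFIX_TYPES[suffix] };
    schema.required.push(name); // assumed: every shorthand field is required
  }
  return schema;
}

console.log(JSON.stringify(fieldsToSchema("invoice_number, vendor, total:number, due_date")));
```

Run on the invoice fields, it prints exactly the JSON passed to --schema-json in the next example.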

If you need more control (optional fields, nested objects), pass a full schema instead:

struktur --input invoice.pdf \
  --schema-json '{"type":"object","properties":{"invoice_number":{"type":"string"},"vendor":{"type":"string"},"total":{"type":"number"},"due_date":{"type":"string"}},"required":["invoice_number","vendor","total","due_date"],"additionalProperties":false}' \
  --model openai/gpt-4o-mini

In the output, total is a number, not a string: Struktur enforced the schema.

What happened?

For the text example, Struktur loaded stdin as an artifact, sent it to the LLM, and validated the response against your schema. The simple strategy handled all of this in a single LLM call.
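The single-call flow can be sketched in TypeScript as follows. This is a hypothetical illustration, not Struktur's implementation: `Artifact`, `ModelCall`, `validate`, and `extractSimple` are invented names, the model call is a synchronous stub returning canned JSON (a real call would be asynchronous), and the validator only checks flat objects with primitive types. It also shows why a string total such as "2,400.00" is rejected.

```typescript
// Hypothetical sketch of a single-call extraction: artifact -> one model call -> validate.
// All names are illustrative assumptions, not Struktur's API.

type JsonType = "string" | "number" | "integer" | "boolean";

interface ObjectSchema {
  type: "object";
  properties: Record<string, { type: JsonType }>;
  required: string[];
  additionalProperties: boolean;
}

interface Artifact {
  text: string;
  images: string[];
}

// Stand-in for the single LLM call: receives artifact and schema, returns raw JSON text.
type ModelCall = (artifact: Artifact, schema: ObjectSchema) => string;

const hasOwn = (o: object, k: string): boolean => Object.prototype.hasOwnProperty.call(o, k);

const TYPE_CHECKS: Record<JsonType, (v: unknown) => boolean> = {
  string: (v) => typeof v === "string",
  number: (v) => typeof v === "number",
  integer: (v) => typeof v === "number" && isFinite(v) && Math.floor(v) === v,
  boolean: (v) => typeof v === "boolean",
};

// Returns a list of problems; an empty list means the value conforms.
function validate(value: unknown, schema: ObjectSchema): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["expected a JSON object"];
  }
  const obj = value as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of schema.required) {
    if (!hasOwn(obj, key)) errors.push(`missing required field: ${key}`);
  }
  for (const key of Object.keys(obj)) {
    if (!hasOwn(schema.properties, key)) {
      if (!schema.additionalProperties) errors.push(`unexpected field: ${key}`);
      continue;
    }
    const expected = schema.properties[key].type;
    if (!TYPE_CHECKS[expected](obj[key])) errors.push(`${key} should be ${expected}`);
  }
  return errors;
}

function extractSimple(artifact: Artifact, schema: ObjectSchema, callModel: ModelCall): Record<string, unknown> {
  const reply = callModel(artifact, schema); // exactly one model call
  const data: unknown = JSON.parse(reply);
  const errors = validate(data, schema);
  if (errors.length > 0) throw new Error(`schema validation failed: ${errors.join("; ")}`);
  return data as Record<string, unknown>;
}

const invoiceSchema: ObjectSchema = {
  type: "object",
  properties: {
    invoice_number: { type: "string" },
    vendor: { type: "string" },
    total: { type: "number" },
    due_date: { type: "string" },
  },
  required: ["invoice_number", "vendor", "total", "due_date"],
  additionalProperties: false,
};
const input: Artifact = { text: "Invoice #1042 from Acme Corp. Total: $2,400.00. Due: April 1, 2026.", images: [] };

// Canned replies standing in for the model (illustrative only, not real output).
const good = () => '{"invoice_number":"1042","vendor":"Acme Corp","total":2400,"due_date":"2026-04-01"}';
const bad = () => '{"invoice_number":"1042","vendor":"Acme Corp","total":"2,400.00","due_date":"2026-04-01"}';

console.log(extractSimple(input, invoiceSchema, good).total); // 2400
try {
  extractSimple(input, invoiceSchema, bad);
} catch (e) {
  console.log((e as Error).message); // schema validation failed: total should be number
}
```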

For the PDF example, an extra step happened first: the built-in PDF parser extracted text (and optionally images) from the file and converted it into an artifact, then extraction proceeded as normal. Add --images to include embedded images, or --screenshots to render page screenshots.
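A hypothetical TypeScript sketch of the loading step: stdin text becomes an artifact directly, while PDF pages are flattened into the same artifact shape before extraction. `ParsedPage`, `Artifact`, `LoadOptions`, and both functions are invented for illustration, the PDF parser itself is not shown, and treating screenshots as extra images alongside embedded ones is an assumption.

```typescript
// Hypothetical sketch: text and parsed PDF pages converge on one Artifact shape.
// Not Struktur's internal types; the PDF parser is out of scope here.

interface ParsedPage {
  text: string;
  embeddedImages: string[]; // e.g. paths or ids of images found on the page
  screenshot: string; // a rendered image of the whole page
}

interface Artifact {
  text: string;
  images: string[];
}

interface LoadOptions {
  images?: boolean; // mirrors --images
  screenshots?: boolean; // mirrors --screenshots
}

// stdin text needs no parsing.
function artifactFromText(text: string): Artifact {
  return { text, images: [] };
}

// Parser output is flattened into the same Artifact shape, so the
// extraction step that follows does not care where the input came from.
function artifactFromPages(pages: ParsedPage[], opts: LoadOptions = {}): Artifact {
  const images: string[] = [];
  for (const page of pages) {
    if (opts.images) images.push(...page.embeddedImages);
    if (opts.screenshots) images.push(page.screenshot);
  }
  return { text: pages.map((p) => p.text).join("\n\n"), images };
}

// Placeholder pages, standing in for real parser output.
const pages: ParsedPage[] = [
  { text: "Invoice #1042", embeddedImages: ["logo.png"], screenshot: "page-1.png" },
  { text: "Total: $2,400.00", embeddedImages: [], screenshot: "page-2.png" },
];
console.log(JSON.stringify(artifactFromPages(pages).images)); // []
console.log(JSON.stringify(artifactFromPages(pages, { screenshots: true }).images)); // ["page-1.png","page-2.png"]
```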

To understand what happened inside, read The Extraction Pipeline.

Where to go next

Goal                       Link
Keep learning              The Extraction Pipeline
Solve a real problem       Extract Invoice Data
Look up all CLI flags      CLI Reference
Use it in TypeScript       TypeScript SDK
Understand file parsing    Document Parsing
