Shell Pipelines & Patterns

Practical patterns for using Struktur in shell workflows.

Extract from a PDF (via markitdown)

markitdown document.pdf | struktur --stdin --schema schema.json --model openai/gpt-4o-mini

Process a directory of files

find ./invoices -name "*.pdf" -print0 | while IFS= read -r -d '' f; do
  markitdown "$f" | struktur --stdin --schema invoice.json --model openai/gpt-4o-mini
done | jq -s '.'
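Several examples here pass `--schema invoice.json` without showing the file. Below is a minimal sketch of one possible `invoice.json`, written with a heredoc. Every field name is an assumption. The strict style, where all properties are required and `additionalProperties` is false, mirrors the inline `--schema-json` example further down this page.

```shell
# Hypothetical invoice schema: the field names are illustrative, not prescribed by Struktur.
cat > invoice.json <<'EOF'
{
  "type": "object",
  "properties": {
    "invoice_number": { "type": "string" },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "quantity": { "type": "number" },
          "amount": { "type": "number" }
        },
        "required": ["description", "quantity", "amount"],
        "additionalProperties": false
      }
    }
  },
  "required": ["invoice_number", "line_items"],
  "additionalProperties": false
}
EOF
```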

Pipe output to Postgres

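# Extract each PDF separately, then turn every line item into a CSV row for COPY.
# The jq field names are placeholders: match them to invoice.json and the table columns.
find ./invoices -name "*.pdf" -print0 | while IFS= read -r -d '' f; do
  markitdown "$f" | struktur --stdin --schema invoice.json --model openai/gpt-4o-mini
done | \
  jq -r '.line_items[] | [.description, .quantity, .amount] | @csv' | \
  psql mydb -c "COPY line_items (description, quantity, amount) FROM STDIN (FORMAT csv)"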
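`COPY` only loads into an existing table whose columns line up with the CSV fields. The sketch below writes a hypothetical `line_items` definition to a file. The three columns are assumptions that stand in for whatever line-item fields your schema actually extracts.

```shell
# Hypothetical table definition: adjust the columns to match your invoice schema.
cat > line_items.sql <<'EOF'
CREATE TABLE IF NOT EXISTS line_items (
  description text,
  quantity    numeric,
  amount      numeric
);
EOF
```

Apply it once with `psql mydb -f line_items.sql` before running the pipeline.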

Watch a folder for new files (Linux)

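# close_write fires when the writing process closes the file; create fires before any data is written.
inotifywait -m ./incoming -e close_write -e moved_to |
  while read -r path action file; do
    [[ "$file" == *.pdf ]] || continue
    markitdown "$path/$file" | \
      struktur --stdin --schema invoice.json --model openai/gpt-4o-mini | \
      jq -c '.' >> processed.jsonl  # compact to one JSON object per line
  done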

Watch a folder for new files (macOS)

mkdir -p ./processed
fswatch -o ./incoming | while read -r _; do
  for file in ./incoming/*; do
    [ -f "$file" ] || continue
    # Move the source only if extraction succeeded, so a failed file stays for the next pass.
    markitdown "$file" | struktur --stdin \
      --schema invoice.json \
      --model openai/gpt-4o-mini \
      --output "processed/$(basename "$file").json" &&
      mv "$file" ./processed/
  done
done
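The `fswatch -o` loop reacts to any change in the folder, so a large PDF may still be copying when the loop reaches it. One common heuristic is to poll the file size until it stops changing. The sketch below assumes a hypothetical `wait_for_stable_size` helper; it is not part of Struktur or fswatch.

```shell
# Hypothetical helper: wait_for_stable_size FILE [INTERVAL_SECONDS]
# Polls the file's byte count and returns 0 once two consecutive reads match.
# Heuristic only: a writer that pauses longer than INTERVAL can still fool it.
wait_for_stable_size() {
  local file=$1 interval=${2:-1} prev=-1 size
  size=$(wc -c < "$file") || return 1
  while [ "$size" != "$prev" ]; do
    prev=$size
    sleep "$interval"
    size=$(wc -c < "$file") || return 1   # fails if the file disappeared
  done
}
```

Inside the loop, `wait_for_stable_size "$file" || continue` placed before `markitdown` would wait for each file to settle and skip files that vanish mid-check.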

Enrich records from URLs

# -f makes curl fail instead of passing HTTP error pages to struktur; -L follows redirects.
jq -c '.[]' contracts.json | while IFS= read -r row; do
  url=$(printf '%s\n' "$row" | jq -r '.contract_url')
  curl -fsSL "$url" | struktur --stdin \
    --schema-json '{"type":"object","properties":{"start_date":{"type":"string"},"value":{"type":"number"}},"required":["start_date","value"],"additionalProperties":false}' \
    --model openai/gpt-4o-mini | \
  jq --argjson orig "$row" '$orig + .'
done | jq -s '.'
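The inline `--schema-json` string is hard to read and edit. One option is to load the same schema from a quoted heredoc once, before the loop. The variable name `contract_schema` below is hypothetical.

```shell
# Same schema as the inline example, kept readable; the quoted 'EOF' disables expansion.
contract_schema=$(cat <<'EOF'
{
  "type": "object",
  "properties": {
    "start_date": { "type": "string" },
    "value": { "type": "number" }
  },
  "required": ["start_date", "value"],
  "additionalProperties": false
}
EOF
)
```

The loop would then pass `--schema-json "$contract_schema"` instead of the long literal.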

Test a schema against samples

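for f in samples/*.pdf; do
  echo "Testing: $f"
  # Keep struktur's stderr on the terminal; jq -e exits non-zero on empty, invalid, or null output.
  if markitdown "$f" | struktur --stdin --schema v2.json --model openai/gpt-4o-mini | \
    jq -e '.' > /dev/null; then
    echo "OK: $f"
  else
    echo "FAILED: $f"
  fi
done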
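In CI it can help to count failures and return a non-zero status when any sample fails. The sketch below wraps the loop in a hypothetical `check_samples` function; the name and arguments are illustrative. It assumes `markitdown`, `struktur`, and `jq` are on `PATH`.

```shell
# Hypothetical helper: check_samples SCHEMA DIR
# Runs every PDF in DIR through markitdown and struktur, prints OK/FAILED per file,
# and returns non-zero if any sample failed.
check_samples() {
  local schema=$1 dir=$2 failed=0 f
  for f in "$dir"/*.pdf; do
    [ -e "$f" ] || continue   # the glob matched nothing
    if markitdown "$f" |
      struktur --stdin --schema "$schema" --model openai/gpt-4o-mini |
      jq -e '.' > /dev/null; then
      echo "OK: $f"
    else
      echo "FAILED: $f"
      failed=$((failed + 1))
    fi
  done
  echo "$failed failure(s)"
  [ "$failed" -eq 0 ]
}
```

A CI step could then run `check_samples v2.json samples || exit 1`.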

Save to file instead of stdout

struktur --input report.pdf --schema schema.json --model openai/gpt-4o-mini --output result.json
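With many inputs, writing one output file per input keeps results separate. The sketch below uses a hypothetical `out_path` helper that replaces the input's directory and extension with a results directory and `.json`. The `reports/` and `results/` directories are assumptions, and it assumes `--input` and `--output` work per file as in the command above.

```shell
# Hypothetical helper: out_path INPUT OUTDIR -> OUTDIR/<input name without extension>.json
out_path() {
  local name=${1##*/}                       # strip the directory part
  printf '%s/%s.json\n' "$2" "${name%.*}"   # strip the last extension
}

# Batch variant of the command above; the loop body is skipped if reports/ has no PDFs.
mkdir -p results
for f in reports/*.pdf; do
  [ -e "$f" ] || continue
  struktur --input "$f" --schema schema.json --model openai/gpt-4o-mini \
    --output "$(out_path "$f" results)"
done
```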
