
Extract Invoice Data

Extract structured invoice data from PDF or text files.

Schema

Invoice Schema (object, 9 properties, 3 required)

Schema for extracting structured data from invoice documents.

invoice_number (required): Unique identifier for the invoice
total (required): Final amount due (subtotal + tax)
vendor (required): Company or individual issuing the invoice
currency: Currency code for all monetary values. One of "USD", "EUR", "GBP", "JPY"
due_date: Payment due date
invoice_date: Date the invoice was issued
line_items: List of items or services billed
subtotal: Sum of all line item totals before tax
tax: Tax amount applied to the subtotal

Example
{
  "invoice_number": "string",
  "vendor": "string",
  "total": 0
}
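
The schema summary above lists descriptions but not the underlying JSON. The following is a minimal sketch of what invoice-schema.json (or the invoiceSchema object used in the SDK examples) might contain, assuming the tool accepts standard JSON Schema. The property names are taken from the expected output on this page; the field types, the line item shape, and the choice of required fields are assumptions, not the published schema.

```typescript
// Hypothetical schema object; types and line item shape are assumptions.
const invoiceSchema = {
  type: "object",
  description: "Schema for extracting structured data from invoice documents",
  properties: {
    invoice_number: { type: "string", description: "Unique identifier for the invoice" },
    total: { type: "number", description: "Final amount due (subtotal + tax)" },
    vendor: { type: "string", description: "Company or individual issuing the invoice" },
    currency: {
      type: "string",
      enum: ["USD", "EUR", "GBP", "JPY"],
      description: "Currency code for all monetary values",
    },
    due_date: { type: "string", description: "Payment due date" },
    invoice_date: { type: "string", description: "Date the invoice was issued" },
    line_items: {
      type: "array",
      description: "List of items or services billed",
      // Assumed item shape, mirroring the expected output on this page.
      items: {
        type: "object",
        properties: {
          description: { type: "string" },
          quantity: { type: "number" },
          unit_price: { type: "number" },
          total: { type: "number" },
        },
      },
    },
    subtotal: { type: "number", description: "Sum of all line item totals before tax" },
    tax: { type: "number", description: "Tax amount applied to the subtotal" },
  },
  // Assumed to be the three fields shown in the example above.
  required: ["invoice_number", "vendor", "total"],
};

// Serialize to produce a file usable with --schema, if the CLI accepts this form.
const schemaJson = JSON.stringify(invoiceSchema, null, 2);
```

Keeping the schema in one object lets the CLI file and the SDK examples share a single definition.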

CLI approach

Single invoice:

struktur --input invoice.pdf \
  --schema invoice-schema.json \
  --model openai/gpt-4o-mini

With embedded images (for invoices with stamps, logos, or handwritten amounts):

struktur --input invoice.pdf --images \
  --schema invoice-schema.json \
  --model openai/gpt-4o

Multiple invoices:

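# Make sure the output directory exists before writing to it.
mkdir -p outputs

for file in invoices/*.pdf; do
  # Quote "$file" so paths containing spaces are handled correctly.
  struktur --input "$file" \
    --schema invoice-schema.json \
    --model openai/gpt-4o-mini \
    --output "outputs/$(basename "$file" .pdf).json"
done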

SDK

Small invoices (1-3 pages):

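import { readFile } from "node:fs/promises";
import { extract, simple, parse } from "@struktur/sdk";
import { openai } from "@ai-sdk/openai";

// Load the same schema file used by the CLI examples.
// This assumes extract accepts a plain JSON Schema object.
const invoiceSchema = JSON.parse(await readFile("invoice-schema.json", "utf8"));

// Parse the PDF, keeping embedded images (stamps, logos, handwritten amounts).
const artifacts = await parse(
  { kind: "file", path: "invoice.pdf" },
  { includeImages: true }
);

// Extract in a single pass.
const result = await extract({
  artifacts,
  schema: invoiceSchema,
  strategy: simple({ model: openai("gpt-4o-mini") }),
});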

Multi-page invoices with many line items:

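import { extract, sequentialAutoMerge, parse } from "@struktur/sdk";
import { openai } from "@ai-sdk/openai";

// invoiceSchema is the same schema object used in the previous example.
const artifacts = await parse({ kind: "file", path: "invoice.pdf" });

// Process the document in chunks and merge the results,
// using dedupeModel to handle line items that repeat across chunks.
const result = await extract({
  artifacts,
  schema: invoiceSchema,
  strategy: sequentialAutoMerge({
    model: openai("gpt-4o-mini"),
    dedupeModel: openai("gpt-4o-mini"),
    chunkSize: 8000,
  }),
});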

Strategy choice

Invoice type                            Strategy
1-3 pages                               simple
Multi-page, line items may duplicate    sequentialAutoMerge
Many invoices in parallel               parallelAutoMerge
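
The table above can be read as a small decision rule. The sketch below encodes it as a helper that returns a strategy name; chooseStrategy, InvoiceJob, and the parallel flag are hypothetical names introduced for illustration and are not part of the SDK. The 3-page threshold comes from the table; how a caller determines the page count is left open.

```typescript
// Strategy names as they appear in the table above.
type StrategyName = "simple" | "sequentialAutoMerge" | "parallelAutoMerge";

// Hypothetical description of an extraction job.
interface InvoiceJob {
  pageCount: number;   // number of pages in the invoice
  parallel?: boolean;  // true when many invoices are processed in parallel
}

// Map a job to a strategy name following the table's three rows.
function chooseStrategy(job: InvoiceJob): StrategyName {
  if (job.parallel) return "parallelAutoMerge";
  if (job.pageCount <= 3) return "simple";
  return "sequentialAutoMerge";
}

console.log(chooseStrategy({ pageCount: 2 }));  // prints "simple"
console.log(chooseStrategy({ pageCount: 12 })); // prints "sequentialAutoMerge"
```

The returned name would still need to be mapped to the corresponding SDK strategy constructor with a model configured.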

Expected output

{
  "invoice_number": "1042",
  "vendor": "Acme Corp",
  "invoice_date": "2024-03-01",
  "due_date": "2024-04-01",
  "currency": "USD",
  "line_items": [
    { "description": "Widget A", "quantity": 10, "unit_price": 50, "total": 500 },
    { "description": "Widget B", "quantity": 5, "unit_price": 200, "total": 1000 }
  ],
  "subtotal": 1500,
  "tax": 150,
  "total": 1650
}
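
Because the schema defines total as subtotal + tax and subtotal as the sum of line item totals, extracted output can be sanity-checked arithmetically. The following is a minimal sketch of such a check; checkInvoiceTotals and its tolerance parameter are hypothetical, and the per-line rule (quantity * unit_price equals the line total) is an assumption that holds for the sample above but is not stated by the schema. It also assumes subtotal and tax are present.

```typescript
interface LineItem {
  description: string;
  quantity: number;
  unit_price: number;
  total: number;
}

interface Invoice {
  line_items: LineItem[];
  subtotal: number;
  tax: number;
  total: number;
}

// Return a list of human-readable problems; an empty list means the totals agree.
function checkInvoiceTotals(inv: Invoice, tolerance = 0.005): string[] {
  const problems: string[] = [];
  const close = (a: number, b: number) => Math.abs(a - b) <= tolerance;

  // Assumed rule: each line total equals quantity * unit_price.
  inv.line_items.forEach((item, i) => {
    if (!close(item.quantity * item.unit_price, item.total)) {
      problems.push(`line_items[${i}]: quantity * unit_price does not match total`);
    }
  });

  // From the schema: subtotal is the sum of all line item totals before tax.
  const sum = inv.line_items.reduce((acc, item) => acc + item.total, 0);
  if (!close(sum, inv.subtotal)) {
    problems.push("subtotal does not match the sum of line item totals");
  }

  // From the schema: total is subtotal + tax.
  if (!close(inv.subtotal + inv.tax, inv.total)) {
    problems.push("total does not match subtotal + tax");
  }
  return problems;
}

// The sample output from this page passes all three checks.
const sample: Invoice = {
  line_items: [
    { description: "Widget A", quantity: 10, unit_price: 50, total: 500 },
    { description: "Widget B", quantity: 5, unit_price: 200, total: 1000 },
  ],
  subtotal: 1500,
  tax: 150,
  total: 1650,
};
console.log(checkInvoiceTotals(sample)); // prints []
```

A check like this is useful for flagging extractions that need manual review, since a mismatch may come from either the model or the invoice itself.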
