Struktur

Validation & Retries

How the schema validation loop works and why it matters.

Why validation inside the loop?

Without in-loop validation, you get JSON that may or may not match your schema. You then have to write error handling, decide whether to retry, and figure out how to feed errors back.

Struktur does all of this for you.

How the retry loop works

  1. Send the extraction prompt to the LLM.
  2. Validate the response against the schema.
  3. If valid: return it.
  4. If invalid: serialize the validation errors into an XML block, append it to the message thread as a user message, and send the updated thread again (back to step 1).
  5. If no attempt has produced a valid response after maxAttempts (default 3): throw.

The model sees its own mistake and a structured description of it. This self-correction loop is why most extractions converge within 2 attempts.
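
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not Struktur's actual implementation: the names extractWithRetries, callModel, validate, Message and Validation are hypothetical, the XML layout of the error block is a guess, and the sketch assumes the model's reply is kept in the thread as an assistant message.

```typescript
// Hypothetical names throughout; a sketch of the loop, not Struktur's internals.
type Message = { role: "user" | "assistant"; content: string };
type Validation = { valid: true } | { valid: false; errors: string[] };

async function extractWithRetries(
  callModel: (thread: Message[]) => Promise<string>,
  validate: (data: unknown) => Validation,
  prompt: string,
  maxAttempts = 3,
): Promise<unknown> {
  const thread: Message[] = [{ role: "user", content: prompt }];

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Step 1: send the current thread to the model.
    const raw = await callModel(thread);
    // Assumption: the reply stays in the thread so the model can see its own mistake.
    thread.push({ role: "assistant", content: raw });

    let data: unknown;
    let errors: string[] = [];
    try {
      data = JSON.parse(raw);
    } catch {
      errors = ["response is not valid JSON"];
    }

    if (errors.length === 0) {
      // Step 2: validate against the schema.
      const result = validate(data);
      // Step 3: valid, so return it.
      if (result.valid) return data;
      errors = result.errors;
    }

    // Step 4: serialize the errors and append them as a user message.
    // (No XML escaping here; a real implementation would need it.)
    const body = errors.map((e) => `  <error>${e}</error>`).join("\n");
    thread.push({
      role: "user",
      content: `<validation-errors>\n${body}\n</validation-errors>`,
    });
  }

  // Step 5: give up once every attempt has failed.
  throw new Error(`Extraction failed after ${maxAttempts} attempts`);
}
```

Passing callModel and validate in as functions keeps the sketch independent of any particular model client or schema validator.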

Schema design affects retry rate

Well-constrained schemas fail less often. Tips:

  • Always use additionalProperties: false.
  • Use required arrays explicitly.
  • Prefer enum for categorical fields.
  • Use format (e.g., date, email) only when you need it, because every format check adds validation surface and one more way for a response to fail.
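
As an illustration of these tips, a schema for a small invoice might look like the sketch below. The field names and enum values are made up for the example, and the schema is written as a plain JSON Schema object; check the Struktur documentation for how schemas are actually passed in.

```typescript
// Illustrative JSON Schema; all field names and enum values are hypothetical.
const invoiceSchema = {
  type: "object",
  // Reject keys the schema does not declare.
  additionalProperties: false,
  // Spell out which fields must be present.
  required: ["vendor", "total", "currency"],
  properties: {
    vendor: { type: "string" },
    total: { type: "number" },
    // Categorical field: an enum leaves the model fewer ways to be wrong.
    currency: { type: "string", enum: ["USD", "EUR", "GBP"] },
    // format is used only where a real calendar date is needed downstream.
    issuedOn: { type: "string", format: "date" },
  },
} as const;
```

Here issuedOn is left out of required, so a document without a date does not force a retry.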

Observing retries with events

Use the onMessage event to see when retries happen:

events: {
  onMessage: ({ role, content }) => {
    // Retry feedback arrives as a user message carrying the serialized validation errors.
    if (role === "user" && String(content).includes("validation-errors")) {
      console.log("Retry triggered");
    }
  }
}

Smart Validation for Multi-Step Strategies

When using parallel or sequential strategies, your data might be split across multiple chunks. For example, an invoice's price might appear on page 1, while the vendor name appears on page 5. If both fields are required in your schema, validating intermediate results would fail unnecessarily.

How smart validation works

Struktur uses lenient validation during intermediate extraction steps:

  • Type errors (string vs number) → Retry immediately
  • Format errors (invalid email) → Retry immediately
  • Required field errors → Allowed during intermediate steps
  • All constraints → Enforced on final validation

This means the model can extract partial data without pressure to hallucinate missing required fields. The final validation ensures all required fields are present before returning.
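
One way to picture the rule is as a filter over validation errors. The sketch below is hypothetical: SchemaError and errorsThatTriggerRetry are invented names, and the keyword values simply mirror the three error kinds listed above rather than any real Struktur type.

```typescript
// Hypothetical error shape; keyword mirrors the error kinds listed above.
type SchemaError = {
  keyword: "type" | "format" | "required";
  path: string;
  message: string;
};

// Returns the errors that should cause a retry at the current step.
function errorsThatTriggerRetry(
  errors: SchemaError[],
  isFinalStep: boolean,
): SchemaError[] {
  // Final validation enforces every constraint.
  if (isFinalStep) return errors;
  // Intermediate steps tolerate missing required fields, since another
  // chunk may still supply them; type and format errors still count.
  return errors.filter((e) => e.keyword !== "required");
}

const sampleErrors: SchemaError[] = [
  { keyword: "type", path: "/total", message: "must be number" },
  { keyword: "required", path: "/vendor", message: "is required" },
];

// Intermediate step: only the type error remains.
const intermediate = errorsThatTriggerRetry(sampleErrors, false);
// Final step: both errors are reported.
const final = errorsThatTriggerRetry(sampleErrors, true);
```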

Opting into strict validation

Disable smart validation with the strict flag:

const result = await extract({
  artifacts,
  schema,
  strategy: parallel({
    model: openai("gpt-4o-mini"),
    mergeModel: openai("gpt-4o-mini"),
    strict: true, // Validate required fields on every step
  }),
});

Use strict: true when:

  • You know each chunk contains complete data
  • You want early failure on missing fields
  • You're debugging extraction issues
