Why validation inside the loop?

Without in-loop validation, you get JSON that may or may not match your schema. You then have to write error handling, decide whether to retry, and figure out how to feed errors back.

Struktur does all of this for you.

How the retry loop works

1. Send the extraction prompt to the LLM.
2. Validate the response against the schema.
3. If valid: return it.
4. If invalid: serialize the validation errors into an XML block, append it to the message thread as a user message, and go back to step 1.
5. After maxAttempts attempts (default 3): throw an error.
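
The steps above can be sketched roughly as follows. This is an illustration only, not Struktur's implementation: the names ChatMessage, SchemaIssue, ModelFn, Validator, errorsToXml, and extractWithRetries are hypothetical, and the sketch assumes the model's reply is kept in the thread as an assistant message before the error block is appended.

```typescript
// Hypothetical types for illustration; these are not Struktur's actual API.
type ChatMessage = { role: "user" | "assistant"; content: string };
type SchemaIssue = { path: string; message: string };
type ModelFn = (messages: ChatMessage[]) => Promise<string>;
// A validator returns the list of problems; an empty list means "valid".
type Validator = (value: unknown) => SchemaIssue[];

// Serialize errors into an XML block the model can read on the next attempt.
// (No XML escaping here, to keep the sketch short.)
function errorsToXml(issues: SchemaIssue[]): string {
  const items = issues
    .map((i) => `  <error path="${i.path}">${i.message}</error>`)
    .join("\n");
  return `<validation-errors>\n${items}\n</validation-errors>`;
}

async function extractWithRetries<T>(
  callModel: ModelFn,
  validate: Validator,
  prompt: string,
  maxAttempts = 3,
): Promise<T> {
  const messages: ChatMessage[] = [{ role: "user", content: prompt }];

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Step 1: send the current thread to the model.
    const raw = await callModel(messages);
    messages.push({ role: "assistant", content: raw });

    // Step 2: parse, then validate against the schema.
    let parsed: unknown;
    let issues: SchemaIssue[] = [];
    try {
      parsed = JSON.parse(raw);
    } catch {
      issues = [{ path: "", message: "Response was not valid JSON" }];
    }
    if (issues.length === 0) {
      issues = validate(parsed);
      // Step 3: valid, so return it.
      if (issues.length === 0) return parsed as T;
    }

    // Step 4: feed the errors back as a user message and loop again.
    messages.push({ role: "user", content: errorsToXml(issues) });
  }

  // Step 5: give up after maxAttempts.
  throw new Error(`Extraction failed after ${maxAttempts} attempts`);
}
```

Because the error block is appended to the same thread, each retry carries the full history of earlier attempts and their errors.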

The model sees its own mistake and a structured description of it. This self-correction loop is why most extractions converge within 2 attempts.

Schema design affects retry rate

Well-constrained schemas fail less often. Tips:

Always use additionalProperties: false.
Use required arrays explicitly.
Prefer enum for categorical fields.
Use format (e.g., date, email) only when you need it: every format is one more constraint the model's output can fail.
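
As an illustration of these tips, a schema for a small invoice record might look like the sketch below. The field names (vendor, total, currency, issuedOn) and the constant name invoiceSchema are made up for this example; only the JSON Schema keywords themselves are standard.

```typescript
// Illustrative JSON Schema; field names are invented for this example.
const invoiceSchema = {
  type: "object",
  // Reject any keys the schema does not declare.
  additionalProperties: false,
  // List required fields explicitly; issuedOn is deliberately optional.
  required: ["vendor", "total", "currency"],
  properties: {
    vendor: { type: "string" },
    total: { type: "number" },
    // A closed set of values instead of a free-form string.
    currency: { type: "string", enum: ["USD", "EUR", "GBP"] },
    // format is used here only because a real calendar date is needed.
    issuedOn: { type: "string", format: "date" },
  },
} as const;
```

The as const assertion keeps the literal types (for example the enum values) available to TypeScript, which helps if result types are derived from the schema.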

Observing retries with events

Use the onMessage event to see when retries happen:

events: {
  onMessage: ({ role, content }) => {
    if (role === "user" && String(content).includes("validation-errors")) {
      console.log("Retry triggered");
    }
  }
}
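
If you want a count rather than a log line, the same check can be wrapped in a small helper. This is a sketch under assumptions: createRetryCounter and LoopMessage are hypothetical names, and the handler relies only on the role and content fields and the "validation-errors" marker shown in the snippet above.

```typescript
// Hypothetical shape of the onMessage payload, based on the snippet above.
type LoopMessage = { role: string; content: unknown };

// Builds an onMessage handler that counts validation-triggered retries.
function createRetryCounter() {
  let retries = 0;
  return {
    onMessage: ({ role, content }: LoopMessage): void => {
      // A user message carrying the error block means a retry was triggered.
      if (role === "user" && String(content).includes("validation-errors")) {
        retries += 1;
      }
    },
    // Read the current count at any time.
    getRetries: (): number => retries,
  };
}

const counter = createRetryCounter();
// Pass the handler wherever you configure events, e.g.
// events: { onMessage: counter.onMessage }
```

After the extraction finishes, counter.getRetries() tells you how many times the loop had to correct the model.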

Smart Validation for Multi-Step Strategies

When using parallel or sequential strategies, your data might be split across multiple chunks. For example, an invoice's price might appear on page 1, while the vendor name appears on page 5. If both fields are required in your schema, validating intermediate results would fail unnecessarily.

How smart validation works

Struktur uses lenient validation during intermediate extraction steps:

Type errors (string vs number) → Retry immediately
Format errors (invalid email) → Retry immediately
Required field errors → Allowed during intermediate steps
All constraints → Enforced on final validation

This means the model can extract partial data without pressure to hallucinate missing required fields. The final validation ensures all required fields are present before returning.
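
The rules above amount to a filter over validation errors. The sketch below shows one way to express it; it is not Struktur's internal code. The StepIssue shape and the errorsToRetryOn function are hypothetical, with keyword standing for the JSON Schema keyword that failed.

```typescript
// Hypothetical error shape; "keyword" names the JSON Schema keyword that failed.
type StepIssue = { keyword: string; path: string; message: string };

// Returns the errors that should trigger a retry for the current step.
function errorsToRetryOn(
  issues: StepIssue[],
  opts: { final: boolean; strict?: boolean },
): StepIssue[] {
  // Final validation (or strict mode) enforces every constraint.
  if (opts.final || opts.strict) return issues;
  // Intermediate steps tolerate missing required fields but nothing else.
  return issues.filter((issue) => issue.keyword !== "required");
}

const chunkIssues: StepIssue[] = [
  { keyword: "required", path: "/vendor", message: "missing required field" },
  { keyword: "type", path: "/total", message: "must be number" },
];

// Intermediate step: only the type error forces a retry.
const intermediate = errorsToRetryOn(chunkIssues, { final: false });
// Final step: both errors count.
const finalStep = errorsToRetryOn(chunkIssues, { final: true });
```

Here intermediate contains only the type error, while finalStep contains both, which matches the behavior described above.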

Opting into strict validation

Disable smart validation with the strict flag:

const result = await extract({
  artifacts,
  schema,
  strategy: parallel({
    model: openai("gpt-4o-mini"),
    mergeModel: openai("gpt-4o-mini"),
    strict: true, // Validate required fields on every step
  }),
});

Use strict: true when:

You know each chunk contains complete data
You want early failure on missing fields
You're debugging extraction issues
