Validation & Retries
How the schema validation loop works and why it matters.
Why validation inside the loop?
Without in-loop validation, you get JSON that may or may not match your schema. You then have to write error handling, decide whether to retry, and figure out how to feed errors back.
Struktur does all of this for you.
How the retry loop works
1. Send the extraction prompt to the LLM.
2. Validate the response against the schema.
3. If valid: return it.
4. If invalid: serialize the validation errors into an XML block, append it to the message thread as a user message, and go to step 1.
5. After maxAttempts (default 3): throw.
The model sees its own mistake and a structured description of it. This self-correction loop is why most extractions converge within 2 attempts.
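The loop described above can be sketched roughly as follows. This is an illustration only, not Struktur's implementation: the Message, Validator, and Model types, the helper names, and the exact XML layout are all assumptions, and the model call is synchronous here purely to keep the example short.

```typescript
// Hypothetical shapes for illustration only; these are not Struktur's internal types.
type Message = { role: "user" | "assistant"; content: string };
type ValidationIssue = { path: string; message: string };
type Result<T> =
  | { ok: true; value: T }
  | { ok: false; errors: ValidationIssue[] };
type Validator<T> = (value: unknown) => Result<T>;
// A real model call would be asynchronous; it is synchronous here to keep the sketch short.
type Model = (messages: Message[]) => string;

// Serialize issues into an XML block (the tag and attribute names are assumptions).
function toXml(errors: ValidationIssue[]): string {
  const items = errors.map((e) => `  <error path="${e.path}">${e.message}</error>`);
  return ["<validation-errors>", ...items, "</validation-errors>"].join("\n");
}

// Assumption: unparseable output is treated like any other validation failure.
function check<T>(reply: string, validate: Validator<T>): Result<T> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(reply);
  } catch {
    return { ok: false, errors: [{ path: "", message: "response is not valid JSON" }] };
  }
  return validate(parsed);
}

function extractWithRetries<T>(
  model: Model,
  validate: Validator<T>,
  prompt: string,
  maxAttempts = 3,
): { value: T; attempts: number } {
  const messages: Message[] = [{ role: "user", content: prompt }];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const reply = model(messages);
    const result = check(reply, validate);
    if (result.ok) return { value: result.value, attempts: attempt };
    // Feed the model its own answer plus a structured description of what was wrong.
    messages.push({ role: "assistant", content: reply });
    messages.push({ role: "user", content: toXml(result.errors) });
  }
  throw new Error(`Validation failed after ${maxAttempts} attempts`);
}

// Usage with a stub model that answers wrongly once, then corrects itself.
const validateTotal: Validator<{ total: number }> = (value) => {
  const total = (value as { total?: unknown } | null)?.total;
  return typeof total === "number"
    ? { ok: true, value: { total } }
    : { ok: false, errors: [{ path: "/total", message: "must be a number" }] };
};
const stubModel: Model = (messages) =>
  messages.length === 1 ? '{"total": "12.50"}' : '{"total": 12.5}';
const outcome = extractWithRetries(stubModel, validateTotal, "Extract the invoice total.");
console.log(outcome.attempts, outcome.value.total);
```

The point of the sketch is that the error feedback travels in the same message thread as the original prompt, so the next attempt is conditioned on both the failed answer and the reason it failed.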
Schema design affects retry rate
Well-constrained schemas fail less often. Tips:
- Always use additionalProperties: false.
- Use required arrays explicitly.
- Prefer enum for categorical fields.
- Use format (e.g., date, email) only when you need it; it adds validation surface.
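As an illustration of these tips, a schema might look like the sketch below. The invoice fields and enum values are hypothetical and chosen only to show each tip in one place.

```typescript
// Illustrative JSON Schema; the field names and enum values are hypothetical.
const invoiceSchema = {
  type: "object",
  additionalProperties: false, // reject keys the schema does not declare
  required: ["vendor", "total", "currency"], // explicit, not implied
  properties: {
    vendor: { type: "string" },
    total: { type: "number" },
    // enum for a categorical field: the model must pick one of these values
    currency: { type: "string", enum: ["USD", "EUR", "GBP"] },
    // format only where the value is actually consumed as a date
    issuedOn: { type: "string", format: "date" },
  },
} as const;
```

Note that issuedOn is optional here: leaving a format-constrained field out of required keeps it from becoming a second source of retries when the document simply has no date.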
Observing retries with events
Use the onMessage event to see when retries happen:
events: {
  onMessage: ({ role, content }) => {
    if (role === "user" && String(content).includes("validation-errors")) {
      console.log("Retry triggered");
    }
  }
}
Smart Validation for Multi-Step Strategies
When using parallel or sequential strategies, your data might be split across multiple chunks. For example, an invoice's price might appear on page 1, while the vendor name appears on page 5. If both fields are required in your schema, validating intermediate results would fail unnecessarily.
How smart validation works
Struktur uses lenient validation during intermediate extraction steps:
- Type errors (string vs number) → Retry immediately
- Format errors (invalid email) → Retry immediately
- Required field errors → Allowed during intermediate steps
- All constraints → Enforced on final validation
This means the model can extract partial data without pressure to hallucinate missing required fields. The final validation ensures all required fields are present before returning.
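One way to picture the lenient rules above is as a filter over validation issues. The sketch below is an assumption about how such a filter could look, not Struktur's code: the Issue shape, the StepOptions fields, and the blockingIssues name are all hypothetical.

```typescript
// Hypothetical issue shape, loosely modelled on JSON Schema validator output.
type Issue = { keyword: "type" | "format" | "required"; path: string };

interface StepOptions {
  final: boolean;   // true only for the last, merged result
  strict?: boolean; // mirrors the strict flag: enforce everything on every step
}

// Return the issues that should trigger a retry for this step.
function blockingIssues(issues: Issue[], { final, strict = false }: StepOptions): Issue[] {
  if (final || strict) return issues;
  // Intermediate step: a missing required field is tolerated, anything else is not.
  return issues.filter((issue) => issue.keyword !== "required");
}

// Page 1 of an invoice: the price is present, the vendor name is not (it is on page 5).
const missingVendor: Issue[] = [{ keyword: "required", path: "/vendor" }];
const wrongType: Issue[] = [{ keyword: "type", path: "/total" }];

console.log(blockingIssues(missingVendor, { final: false }).length);               // 0
console.log(blockingIssues(wrongType, { final: false }).length);                   // 1
console.log(blockingIssues(missingVendor, { final: false, strict: true }).length); // 1
console.log(blockingIssues(missingVendor, { final: true }).length);                // 1
```

Under these assumptions, a chunk that lacks the vendor name passes its intermediate step untouched, while the same gap in the final merged result still fails validation.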
Opting into strict validation
Disable smart validation with the strict flag:
const result = await extract({
  artifacts,
  schema,
  strategy: parallel({
    model: openai("gpt-4o-mini"),
    mergeModel: openai("gpt-4o-mini"),
    strict: true, // Validate required fields on every step
  }),
});
Use strict: true when:
- You know each chunk contains complete data
- You want early failure on missing fields
- You're debugging extraction issues
See also
- The Extraction Pipeline — where validation fits
- Events & Observability — the events API
- The Artifact Format — schema format