parse
Convert files to Artifact JSON for inspection or pre-processing.
Synopsis
```bash
struktur parse --input <file> [options]
struktur parse --stdin [options]
```

Description
Converts a file, or data read from stdin, into Artifact JSON. Use it to:

- **Inspect**: see how Struktur will represent your document before running extraction.
- **Cache**: pre-process files and cache the Artifact JSON for repeated extraction.
- **Debug**: check parser output when configuring a custom parser.
- **Pipeline**: pipe artifacts into `extract` for decoupled workflows.
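For the Cache workflow, here is a minimal shell sketch that parses a document once and reuses the saved artifact on later runs. It relies only on the `--input`, `--output`, `--artifact-file`, `--fields` and `--model` flags used elsewhere on this page. The helper name `cached_parse` and the cache layout (`<cache_dir>/<file name>.artifact.json`) are hypothetical choices made for illustration, not anything Struktur defines.

```bash
# Hypothetical helper: parse a document once and reuse the cached Artifact JSON.
# The cache layout (<cache_dir>/<file name>.artifact.json) is an assumption made
# for this sketch; Struktur itself does not define a cache location.
cached_parse() {
  local input="$1"
  local cache_dir="${2:-.struktur-cache}"
  local artifact
  artifact="$cache_dir/$(basename "$input").artifact.json"

  mkdir -p "$cache_dir"
  # Run the parser only when no non-empty cached artifact exists yet.
  if [ ! -s "$artifact" ]; then
    struktur parse --input "$input" --output "$artifact" || return
  fi
  # Print the artifact path so the caller can hand it to `struktur extract`.
  printf '%s\n' "$artifact"
}

# Example usage:
#   artifact=$(cached_parse report.pdf)
#   struktur extract --artifact-file "$artifact" --fields "title, author" --model openai/gpt-4o-mini
```

The cache key is just the file name. An edited source file, or two files with the same name in different directories, would therefore reuse a stale artifact. Delete the cache directory to force a re-parse.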
Options
Input (exactly one required)
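| Prop | Description |
|---|---|
| `--input <file>` | Path to the file to parse. |
| `--stdin` | Read the input from stdin instead of a file. |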
Output
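| Prop | Description |
|---|---|
| `--output <file>` | Write the Artifact JSON to this file instead of stdout. |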
Parser control
| Prop | Description |
|---|---|
| `--parser <pkg>` | Parser package to use; bypasses all parser config. |
Image extraction (PDF inputs)
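| Prop | Description |
|---|---|
| `--images` | Include embedded images. |
| `--screenshots` | Include page renders. |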
Parser resolution order
Struktur uses the first of these that applies:

1. `--parser <pkg>` flag: bypasses all config.
2. Parser configured for the detected MIME type (`struktur config parsers add ...`).
3. Built-in parser for the MIME type.
4. Error: no parser found. The error suggests `struktur config parsers add`.
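The lookup order above can be sketched as a small shell function. This is an illustration of the documented precedence only, not Struktur's internal code. The function name `resolve_parser`, its three positional arguments, the `builtin:` label, and the error text are all hypothetical.

```bash
# Hypothetical sketch of the documented parser resolution order.
#   $1 = value of --parser (empty if the flag was not given)
#   $2 = parser configured for the detected MIME type (empty if none)
#   $3 = detected MIME type
resolve_parser() {
  local flag_parser="$1" configured="$2" mime="$3"

  # 1. An explicit --parser flag wins and bypasses all config.
  if [ -n "$flag_parser" ]; then
    printf '%s\n' "$flag_parser"
    return 0
  fi

  # 2. A parser configured for this MIME type.
  if [ -n "$configured" ]; then
    printf '%s\n' "$configured"
    return 0
  fi

  # 3. A built-in parser, if the MIME type matches one of the built-in patterns.
  case "$mime" in
    application/pdf | text/* | image/* | application/json)
      printf 'builtin:%s\n' "$mime"
      return 0
      ;;
  esac

  # 4. Nothing matched: fail and point at the config command (illustrative message).
  echo "no parser found for $mime; try: struktur config parsers add" >&2
  return 1
}
```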
Built-in parsers
| MIME type | Behavior |
|---|---|
| `application/pdf` | Per-page text via `pdf-parse`. Add `--images` for embedded images and `--screenshots` for page renders. |
| `text/*` | Split on double newlines into content slices. |
| `image/*` | Single-content artifact with the image as a media item. |
| `application/json` | If the JSON validates as `SerializedArtifact[]`, it is passed through unchanged. |
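To preview roughly how the built-in `text/*` parser will slice a file, a short awk-based shell function can split the text on blank lines. It is an approximation written for illustration, and `preview_text_slices` is a hypothetical helper, not part of Struktur. awk's paragraph mode (`RS = ""`) treats any run of one or more empty lines as a single separator. Struktur's handling of three or more consecutive newlines, or of whitespace-only lines, may differ.

```bash
# Hypothetical preview helper: approximates the text/* built-in by splitting
# input on blank lines using awk's paragraph mode (RS = "").
# Reads the named files, or stdin when no file is given.
preview_text_slices() {
  awk 'BEGIN { RS = "" } { printf "--- slice %d ---\n%s\n", NR, $0 }' "$@"
}

# Example usage:
#   preview_text_slices notes.txt
```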
Examples
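```bash
# Parse a PDF and print the Artifact JSON to stdout
struktur parse --input document.pdf

# Include embedded images and page screenshots, and write the result to a file
struktur parse --input slides.pdf --images --screenshots --output artifact.json

# Use a custom parser package instead of the configured or built-in one
struktur parse --input data.xlsx --parser @myorg/xlsx-parser

# Pipe the artifact straight into extraction (`-` reads the artifact from stdin)
struktur parse --input doc.pdf --images | \
  struktur extract --artifact-file - --fields "title, author" --model openai/gpt-4o-mini

# Render the parsed artifact as an HTML viewer and open it
struktur parse --input doc.pdf | struktur utils artifact-viewer --stdin > viewer.html
open viewer.html
```

See also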
- config parsers — Configure custom parsers
- Document Parsing — Parser system overview
- Artifact Format — Output format
- utils artifact-viewer — Visualize parsed artifacts