Struktur

parse

Convert files to Artifact JSON for inspection or pre-processing.

Synopsis

struktur parse --input <file> [options]
struktur parse --stdin [options]

Description

Converts a file or stdin to Artifact JSON. Use this to:

Inspect: see how Struktur will represent your document before running extraction.

Cache: pre-process files and cache the Artifact JSON for repeated extraction.

Debug: check parser output when configuring a custom parser.

Pipeline: pipe artifacts into extract for decoupled workflows.
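The caching use case can be sketched as a small shell helper. This is only an illustration: cached_parse is a hypothetical function name, not part of Struktur, and it relies solely on the --input and --output flags shown elsewhere on this page.

```shell
# Hypothetical helper (not part of Struktur): parse a file once and
# reuse the cached Artifact JSON on later runs.
# Usage: cached_parse <input-file> <cache-file>
cached_parse() {
  input=$1
  cache=$2

  # Re-parse only when the cache is missing or older than the input.
  if [ ! -f "$cache" ] || [ "$input" -nt "$cache" ]; then
    struktur parse --input "$input" --output "$cache" || return 1
  fi

  # Print the cache path so callers can hand it to later steps.
  echo "$cache"
}
```

Calling cached_parse doc.pdf doc.artifact.json twice should invoke the parser only on the first call, as long as doc.pdf has not changed in between.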

Options

Input (exactly one required)

--input <file>   Path of the file to parse
--stdin          Read the input from stdin instead of a file

Output

--output <file>  Write the Artifact JSON to a file; without it, the JSON goes to stdout

Parser control

--parser <pkg>   Parser package to use; bypasses all parser config

Image extraction (PDF inputs)

--images         Include embedded images
--screenshots    Include page renders

Parser resolution order

  1. --parser <pkg> flag: bypasses all config
  2. Parser configured for the detected MIME type (struktur config parsers add ...)
  3. Built-in parser for the MIME type
  4. Otherwise, an error: no parser found, with a suggestion to run struktur config parsers add
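The resolution order above can be read as a simple fall-through. The following is a minimal sketch of that logic, not Struktur's actual implementation: resolve_parser and the builtin:* labels are hypothetical names introduced for illustration.

```shell
# Hypothetical sketch of the resolution order; names are illustrative only.
# Args: $1 = value of --parser (may be empty)
#       $2 = detected MIME type
#       $3 = parser configured for that MIME type (may be empty)
resolve_parser() {
  flag_parser=$1
  mime=$2
  configured=$3

  # 1. An explicit --parser flag wins and bypasses all config.
  if [ -n "$flag_parser" ]; then
    echo "$flag_parser"
    return 0
  fi

  # 2. A parser configured for the detected MIME type.
  if [ -n "$configured" ]; then
    echo "$configured"
    return 0
  fi

  # 3. Built-in parsers, matched by MIME type.
  case "$mime" in
    application/pdf)  echo "builtin:pdf" ;;
    text/*)           echo "builtin:text" ;;
    image/*)          echo "builtin:image" ;;
    application/json) echo "builtin:json" ;;
    *)
      # 4. Nothing matched: report the error and suggest a fix.
      echo "no parser found for $mime; try: struktur config parsers add ..." >&2
      return 1
      ;;
  esac
}
```

In this sketch, resolve_parser "" text/plain "" falls through to the built-in text parser, while passing a non-empty first argument short-circuits every later step.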

Built-in parsers

MIME type         Behavior
application/pdf   Per-page text via pdf-parse. Add --images for embedded images, --screenshots for page renders.
text/*            Split on double newlines into content slices.
image/*           Single-content artifact with the image as a media item.
application/json  If it validates as SerializedArtifact[], passed through unchanged.
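To get a rough idea of how many content slices a text file might produce, the text/* rule can be approximated with awk's paragraph mode. This is an approximation under stated assumptions: count_slices is a hypothetical helper, and the built-in parser may treat whitespace-only lines or CRLF line endings differently.

```shell
# Hypothetical helper: estimate the number of slices for a text file by
# counting blocks separated by blank lines (awk paragraph mode, RS = "").
count_slices() {
  awk 'BEGIN { RS = "" } END { print NR }' "$1"
}
```

Comparing this number against the slices in the output of struktur parse is one way to check whether a text file is being split where you expect.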

Examples

struktur parse --input document.pdf
struktur parse --input slides.pdf --images --screenshots --output artifact.json
struktur parse --input data.xlsx --parser @myorg/xlsx-parser
struktur parse --input doc.pdf --images | \
  struktur extract --artifact-file - --fields "title, author" --model openai/gpt-4o-mini
struktur parse --input doc.pdf | struktur utils artifact-viewer --stdin > viewer.html
open viewer.html
