Watch a Folder for New Files
Process files as they arrive in a folder.
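Every watcher in this guide reacts to filesystem events, and an event such as inotify's create or Node's fs.watch "rename" can fire while a file is still being copied into ./incoming. One common safeguard, and the same idea behind chokidar's awaitWriteFinish option, is to poll the file's size until it stops changing. The TypeScript below is a minimal sketch of that idea; waitForStableSize and its default timings are hypothetical and are not part of struktur or its SDK:

```typescript
import { stat } from "node:fs/promises";

// Hypothetical helper, not part of struktur: resolves with the file's size once
// that size has matched the previous poll `stableChecks` times in a row.
// Rejects if the file disappears (stat throws) or `timeoutMs` elapses first.
async function waitForStableSize(
  filePath: string,
  intervalMs = 200,
  stableChecks = 3,
  timeoutMs = 30_000,
): Promise<number> {
  const deadline = Date.now() + timeoutMs;
  let lastSize = -1; // no size observed yet
  let unchanged = 0; // consecutive polls that saw the same size as the previous one
  while (Date.now() < deadline) {
    const { size } = await stat(filePath);
    unchanged = size === lastSize ? unchanged + 1 : 0;
    if (unchanged >= stableChecks) return size;
    lastSize = size;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Timed out waiting for ${filePath} to stop changing size`);
}
```

In the fs.watch example, for instance, you could `await waitForStableSize(filePath)` just before calling parse; the chokidar example does not need it because awaitWriteFinish performs a similar check.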
Linux: inotifywait
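For PDFs and other supported formats, pass the file to --input directly; no pre-processing is required. Listening for close_write (instead of create) means a file is processed only after the program writing it has closed it, and moved_to catches files moved into the folder with mv:
mkdir -p processed
inotifywait -m ./incoming -e close_write -e moved_to |
  while read -r path action file; do
    echo "New file: $file"
    struktur --input "$path/$file" \
      --schema schema.json \
      --model openai/gpt-4o-mini \
      --output "processed/$file.json"
    mv "$path/$file" ./processed/
  done
For formats without a built-in parser, pipe through a conversion tool first: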
mkdir -p processed
inotifywait -m ./incoming -e close_write -e moved_to |
  while read -r path action file; do
    echo "New file: $file"
    markitdown "$path/$file" | struktur --stdin \
      --schema schema.json \
      --model openai/gpt-4o-mini \
      --output "processed/$file.json"
    mv "$path/$file" ./processed/
  done
macOS: fswatch
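mkdir -p processed
# fswatch -o prints one line (an event count) per batch of changes,
# so each line triggers a rescan of the whole folder.
fswatch -o ./incoming | while read -r _; do
  for file in ./incoming/*; do
    [ -f "$file" ] || continue
    echo "Processing: $file"
    struktur --input "$file" \
      --schema schema.json \
      --model openai/gpt-4o-mini \
      --output "processed/$(basename "$file").json"
    mv "$file" ./processed/
  done
done
Output to JSONL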
For streaming ingestion, append to a JSONL file (one JSON object per line):
mkdir -p processed
inotifywait -m ./incoming -e close_write -e moved_to |
  while read -r path action file; do
    struktur --input "$path/$file" \
      --schema schema.json \
      --model openai/gpt-4o-mini \
      >> processed.jsonl
    mv "$path/$file" ./processed/
  done
SDK: fs.watch
import { watch } from "node:fs";
import { extract, simple, parse } from "@struktur/sdk";
import { openai } from "@ai-sdk/openai";
import fs from "node:fs/promises";
import path from "node:path";
// Load the same schema.json the CLI examples use (or define your schema inline).
const schema = JSON.parse(await fs.readFile("schema.json", "utf8"));
const incomingDir = "./incoming";
const processedDir = "./processed";
await fs.mkdir(processedDir, { recursive: true });
const watcher = watch(incomingDir, async (event, filename) => {
  // fs.watch reports both file creation and deletion as "rename"
  if (!filename || event !== "rename") return;
  const filePath = path.join(incomingDir, filename);
  try {
    await fs.access(filePath);
  } catch {
    return; // File was deleted, not created
  }
  // Caution: "rename" can fire before a large file has finished copying;
  // the chokidar example below waits for writes to finish.
  console.log(`Processing: ${filename}`);
  try {
    // parse handles MIME detection and parsing (PDF, text, images, etc.)
    const artifacts = await parse({ kind: "file", path: filePath });
    const result = await extract({
      artifacts,
      schema,
      strategy: simple({ model: openai("gpt-4o-mini") }),
    });
    const outputPath = path.join(processedDir, `${filename}.json`);
    await fs.writeFile(outputPath, JSON.stringify(result.data, null, 2));
    await fs.unlink(filePath);
    console.log(`  ✓ Processed: ${filename}`);
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    console.error(`  ✗ Failed: ${message}`);
  }
});
console.log(`Watching ${incomingDir}...`);
SDK: chokidar
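For more robust file watching, use chokidar. Its awaitWriteFinish option holds back the add event until the file's size has stayed constant for stabilityThreshold milliseconds, so partially copied files are not processed:
import chokidar from "chokidar";
import { extract, simple, parse } from "@struktur/sdk";
import { openai } from "@ai-sdk/openai";
import fs from "node:fs/promises";
import path from "node:path";
// Load the same schema.json the CLI examples use (or define your schema inline).
const schema = JSON.parse(await fs.readFile("schema.json", "utf8"));
const processedDir = "./processed";
await fs.mkdir(processedDir, { recursive: true });
const watcher = chokidar.watch("./incoming", {
  ignored: /(^|[\/\\])\../, // skip dotfiles such as .DS_Store
  persistent: true,
  awaitWriteFinish: {
    stabilityThreshold: 2000, // ms the file size must stay constant
    pollInterval: 100, // ms between size checks
  },
});
watcher.on("add", async (filePath) => {
  const filename = path.basename(filePath);
  console.log(`Processing: ${filename}`);
  try {
    const artifacts = await parse({ kind: "file", path: filePath });
    const result = await extract({
      artifacts,
      schema,
      strategy: simple({ model: openai("gpt-4o-mini") }),
    });
    const outputPath = path.join(processedDir, `${filename}.json`);
    await fs.writeFile(outputPath, JSON.stringify(result.data, null, 2));
    await fs.unlink(filePath);
    console.log(`  ✓ Processed: ${filename}`);
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    console.error(`  ✗ Failed: ${message}`);
  }
});
console.log("Watching ./incoming...");
See also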
- Process a Directory of Files — batch processing
- Shell Pipelines & Patterns — more shell patterns
- Extraction Strategies — strategy reference