File parsing.
Extract structured, actionable data from any file type - PDFs, Word docs, CSVs, emails + more - automatically, at the point of ingestion.

CAPABILITY OVERVIEW
Turn files into data, automatically.
Rayven parses any file format at the point of ingestion - converting unstructured content into structured, workflow-ready data without manual extraction or pre-processing.
AI-powered extraction handles complex documents like PDFs and Word files. Regex + mapping rules handle structured formats like CSV and XML.
Extracted data flows directly into storage, workflow logic, dashboards + external systems - no separate parsing tool, no staging step, no manual data entry.
Inbound triggers include:
- PDF documents
- Word documents (.docx)
- CSV + spreadsheet files
- XML files
- Email + attachments
- Images (via AI extraction)
- JSON files with nested structures
Outbound triggers include:
- Structured JSON payloads for workflow processing
- Clean field values for storage in Primary/Secondary Tables
- Extracted data for dashboards, AI models or external APIs

KEY CAPABILITIES
What File Parsing gives you.
AI-powered document extraction
Pass PDFs, Word documents + text files to an LLM connector node (OpenAI, Claude, Gemini + others) for structured data extraction. The AI reads the document and returns configured fields as structured JSON - no manual template mapping required.
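As a rough illustration of this pattern, the Python sketch below builds an extraction prompt from a list of configured fields and checks that the model's reply is a JSON object containing those fields. It is not Rayven's implementation: the field names, `build_prompt`, `parse_reply` and the canned reply are all hypothetical, and the LLM call itself is left out.

```python
import json

# Hypothetical fields an extraction step might be configured to return.
FIELDS = ["supplier_name", "invoice_number", "total"]


def build_prompt(document_text: str, fields: list[str]) -> str:
    """Ask the model for a JSON object containing only the configured fields."""
    return (
        "Extract the following fields from the document and reply with a "
        "single JSON object using exactly these keys: "
        + ", ".join(fields)
        + ". Use null for any field that is not present.\n\n"
        + document_text
    )


def parse_reply(reply: str, fields: list[str]) -> dict:
    """Parse the model reply and keep only the configured fields."""
    data = json.loads(reply)  # raises ValueError if the reply is not valid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Missing keys become None so downstream steps always see every field.
    return {name: data.get(name) for name in fields}


# Canned reply standing in for a real model response (assumption).
reply = (
    '{"supplier_name": "Acme Pty Ltd", "invoice_number": "INV-1001", '
    '"total": 1250.5, "notes": "ignored"}'
)
record = parse_reply(reply, FIELDS)
```

Dropping unexpected keys and defaulting missing ones to `None` keeps the payload shape stable even when the model's reply varies.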
CSV + structured file parsing
Ingest CSV, XML + JSON files from FTP, SFTP or S3 and parse field values into workflow payloads automatically. Configure column mapping, data type conversion + validation rules to ensure structured output regardless of input format variation.
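The idea of column mapping plus data type conversion can be sketched in a few lines of Python using the standard `csv` module. The column names, the `COLUMN_MAP` layout and the `parse_csv` helper are assumptions made for illustration, not Rayven node settings.

```python
import csv
import io

# Hypothetical mapping: source column header -> (target field, converter).
COLUMN_MAP = {
    "Store": ("store", str.strip),
    "Sale Date": ("sale_date", str.strip),
    "Units": ("units", int),
    "Revenue": ("revenue", float),
}


def parse_csv(text: str) -> tuple[list[dict], list[dict]]:
    """Return (parsed rows, rejected rows) for a CSV string."""
    parsed, rejected = [], []
    # Numbering starts at 2 because line 1 is the header row
    # (assumes no field contains an embedded newline).
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        try:
            parsed.append(
                {
                    target: convert(row[source])
                    for source, (target, convert) in COLUMN_MAP.items()
                }
            )
        except (KeyError, ValueError, TypeError) as exc:
            # Missing column, unconvertible value, or short row.
            rejected.append({"line": line_no, "error": str(exc), "row": row})
    return parsed, rejected


sample = (
    "Store,Sale Date,Units,Revenue\n"
    "North ,2026-01-05,12,340.50\n"
    "South,2026-01-05,n/a,99.00\n"
)
rows, errors = parse_csv(sample)
```

Here the first data row is converted and the second is set aside because `n/a` is not a valid integer.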
Email + attachment processing
Process emails and extract data from attached files automatically on receipt. Structured content from email bodies or attachments flows into workflows - useful for invoice processing, report ingestion + document-triggered automation.
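For a sense of what splitting an email into body and attachments involves, here is a minimal sketch using Python's standard `email` package. The `extract_parts` helper, the sample message and the placeholder attachment bytes are hypothetical. How Rayven receives and processes mail internally is not described here.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def extract_parts(raw: bytes) -> dict:
    """Split a raw RFC 5322 message into headers, plain-text body and attachments."""
    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    attachments = [
        {
            "filename": part.get_filename(),
            "content_type": part.get_content_type(),
            "data": part.get_payload(decode=True),  # decoded bytes
        }
        for part in msg.iter_attachments()
    ]
    return {
        "subject": str(msg["subject"]),
        "from": str(msg["from"]),
        "body": body.get_content().strip() if body else "",
        "attachments": attachments,
    }


# Build a sample message in memory to stand in for a received email.
sample = EmailMessage()
sample["From"] = "billing@example.com"
sample["Subject"] = "Invoice INV-1001"
sample.set_content("Invoice attached.")
sample.add_attachment(
    b"%PDF-1.4 placeholder",  # not a real PDF, just stand-in bytes
    maintype="application",
    subtype="pdf",
    filename="invoice.pdf",
)
parsed = extract_parts(sample.as_bytes())
```

Each attachment's bytes could then be handed to whichever parser suits its content type.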
Regex + validation rules
Apply regex patterns, field validation rules + mapping logic to incoming file data. Validate field formats on ingestion, then flag, route for review or reject records that fail quality checks before parsed data reaches storage or downstream processing.
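A minimal Python sketch of regex-based validation with an accept, flag or reject outcome might look like the following. The patterns, the `REQUIRED` set and the three-way status are assumptions chosen for the example, not Rayven's rule format.

```python
import re

# Hypothetical rules: field name -> pattern the whole value must match.
RULES = {
    "invoice_number": re.compile(r"INV-\d{4,}"),
    "invoice_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "total": re.compile(r"\d+(\.\d{1,2})?"),
}
# Failing a required field rejects the record; other failures only flag it.
REQUIRED = {"invoice_number", "total"}


def validate(record: dict) -> tuple[str, list[str]]:
    """Return ("accept" | "flag" | "reject", names of the fields that failed)."""
    failed = [
        field
        for field, pattern in RULES.items()
        if not pattern.fullmatch(str(record.get(field, "")))
    ]
    if any(field in REQUIRED for field in failed):
        return "reject", failed
    if failed:
        return "flag", failed
    return "accept", failed


good = {"invoice_number": "INV-1001", "invoice_date": "2026-01-05", "total": "1250.50"}
odd_date = {**good, "invoice_date": "05/01/2026"}
bad_number = {**good, "invoice_number": "1001"}
results = [validate(good), validate(odd_date), validate(bad_number)]
```

Using `fullmatch` rather than `search` means a value with trailing junk, such as `INV-1001x`, fails the check.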
Extract JSON Key node
Extract specific values from nested JSON structures within a workflow. Supports deep nesting, wildcard key selection + array handling. Used when ingested files contain complex JSON with required data buried in nested objects or arrays.
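To make deep nesting, wildcard selection and array handling concrete, the sketch below walks a nested structure with a dot-separated path in Python. The path syntax (`*` for every child, digits for list indices) and the `extract` function are assumptions for illustration. The Extract JSON Key node's own path syntax may differ.

```python
def extract(data, path: str) -> list:
    """Return every value matching a dot-separated path; "*" matches all children."""
    current = [data]
    for key in path.split("."):
        next_level = []
        for node in current:
            if isinstance(node, dict):
                if key == "*":
                    next_level.extend(node.values())
                elif key in node:
                    next_level.append(node[key])
            elif isinstance(node, list):
                if key == "*":
                    next_level.extend(node)
                elif key.isdecimal() and int(key) < len(node):
                    next_level.append(node[int(key)])
            # Scalars have no children, so they contribute nothing further.
        current = next_level
    return current


payload = {
    "order": {
        "id": "A-17",
        "lines": [
            {"sku": "X1", "qty": 2},
            {"sku": "Y9", "qty": 1},
        ],
    }
}
skus = extract(payload, "order.lines.*.sku")
second_qty = extract(payload, "order.lines.1.qty")
missing = extract(payload, "order.missing")
```

Returning a list in every case, empty when nothing matches, lets wildcard and single-key paths share one code path.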
Merged file + real-time data pipelines
Combine parsed file data with real-time streams in the same workflow. Merge uploaded file data with time-series data, API responses or Primary Table records - for example, combining a daily CSV report with live sensor readings for unified analysis.
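One simple way to picture this kind of merge is a keyed join between file rows and the latest live reading per key, sketched in Python below. The `merge_by_key` helper, the `site` key and the `live_` prefix are hypothetical, and the sketch assumes readings arrive ordered oldest to newest.

```python
def merge_by_key(file_rows: list[dict], live_readings: list[dict], key: str) -> list[dict]:
    """Attach the most recent live reading to each file row that shares its key."""
    latest = {}
    for reading in live_readings:  # assumed ordered oldest to newest
        latest[reading[key]] = reading

    merged = []
    for row in file_rows:
        live = latest.get(row[key], {})
        # Prefix live fields so they cannot overwrite columns from the file.
        live_fields = {f"live_{name}": value for name, value in live.items() if name != key}
        merged.append({**row, **live_fields})
    return merged


daily_report = [{"site": "A", "units": 120}, {"site": "B", "units": 80}]
sensor_readings = [{"site": "A", "temp_c": 21.5}, {"site": "A", "temp_c": 22.0}]
combined = merge_by_key(daily_report, sensor_readings, "site")
```

Rows with no matching reading pass through unchanged, so the file data is never dropped by the join.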
HOW IT CONNECTS: EXPLAINER
Where File Parsing fits in the Rayven Platform stack.
File parsing nodes sit in the Data Layer, processing file content after ingestion from the Integration Layer.
- Files arrive via FTP, SFTP, S3 or manual upload from the Integration Layer.
- Parsing nodes extract, validate + structure file content within the workflow.
- Structured output writes to MySQL or Cassandra for storage.
- The Execution Layer uses parsed data for workflow logic, AI processing + automated actions.
- The Presentation Layer surfaces parsed data in dashboards + reports.
USE CASES
How File Parsing gets used.
Automated invoice processing
Supplier invoices arrive as PDFs in an SFTP folder. A Rayven workflow picks up each file and passes it to a Claude node for structured extraction of supplier name, invoice number, line items + total. Extracted fields write to a Secondary Table and trigger an approval workflow - no manual data entry required.

Daily report ingestion for a retail BI platform
Store managers upload daily sales CSV reports to an S3 bucket. A Rayven workflow ingests each file, maps columns to a standard schema, aggregates by store Label + writes results to a Primary Table. A live dashboard surfaces consolidated sales data within 30 seconds of upload.

Partner building a document processing pipeline for a legal firm
An MSP uses Rayven's AI document extraction to build a contract review pipeline for a legal client. Contracts uploaded to a portal are parsed by a Claude node, which extracts key clauses as structured fields + flags them for review if specific conditions are met - delivered as the partner's own product.

Rayven File Parsing FAQs:
PDFs, Word documents (.docx), CSV, XML, JSON (including nested structures), plain text, email + attachments. For image-based documents, AI extraction via an LLM connector handles content-based parsing.
No. CSV, XML + JSON parsing is configured using node settings within the workflow builder. For complex document parsing, the AI connector node handles extraction without custom code.
A file ingestion node reads a document from an FTP or S3 path. The file is passed to an LLM connector node with a configured extraction prompt. The AI returns structured JSON containing the specified fields, which flow into downstream workflow nodes.
Yes. Rayven can process incoming email content and attachments, extracting structured data from both. This enables automated workflows triggered by email receipt - useful for invoice processing, report ingestion + document-based approvals.
Validation rules are configured within transformation nodes - applying regex patterns, data type checks + range validations to extracted fields. Records failing validation can be flagged, routed to a review queue or rejected before reaching storage.
Yes. The Combine Data node merges parsed file outputs with real-time workflow data in the same pipeline. A daily CSV upload can be combined with live sensor readings or API data to create a unified dataset for analysis.
The Extract JSON Key node extracts specific values from nested JSON payloads using configurable key paths. Supports deep nesting, wildcard selection + array handling. The JavaScript node provides full custom parsing logic for complex extraction requirements.
Yes. Multiple file ingestion nodes can run concurrently across different workflow instances. Per-UID processing means each file's workflow executes independently, scaling to high ingestion volumes without queue contention.
Image-based documents (scanned PDFs, image files) can be processed via the AI connector node using a vision-capable LLM. The model reads image content and extracts specified fields as structured text.
Yes. Parsed data flows into the Execution Layer immediately after extraction. A Conditional Filter or Rule Builder evaluates extracted field values and triggers alerts, API calls, database writes or downstream workflow steps based on parsed content.
Also in the Data Layer:
Unified Data Tables
Structured Primary + Secondary Tables for entity records, metadata + relational data alongside Cassandra time-series.
Data Management
Configure retention policies, inspect workflow payloads, export raw data + manage data lifecycle across the platform.
Data Transformation
JavaScript, Advanced Function + Combine Data nodes for schema mapping, enrichment + normalisation within workflow processing chains.
File Parsing
Ingest + parse files from FTP, S3 + manual uploads into structured, real-time data available to workflows and AI models.
Calculation + Aggregation
Sum, average, count + aggregate across UID or Label over any defined time window - at the point of processing.
AI Models + Training
Train Python ML models on Cassandra time-series data + deploy predictions as real-time workflow steps.
SQL + Cassandra Data Storage
Hybrid storage architecture - MySQL for relational records, Cassandra for time-series + event data.
Join the Shift
Discover the easy way to do something new.
Book a demo with our team and we'll show you exactly how Rayven can work for your environment.