Accessibility Converter — Architecture

Current pipeline vs. proposed VLM-based pipeline

Current
Deterministic Pipeline
Sequential processing. Text extraction → heuristic semantic analysis → rule-based HTML rendering. LLM is optional, only for ambiguous nodes.
📄 Upload .pptx or .pdf main.py
File validation, job creation; job state stored in an in-memory dict
🔍 Parse File parsers.py
python-pptx for PPTX, pdfplumber for PDF. Extracts text, tables, images as NormalizedNode objects with bounding boxes and font sizes.
📐 Reconstruct Reading Order semantic.py
Column-aware spatial sorting. Uses bounding box positions to guess left-right / top-down reading order.
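A minimal sketch of the column-aware sort. The `Block` type, the `column_gap` threshold, and the midpoint-free "cluster by left edge" heuristic are illustrative assumptions, not the actual semantic.py implementation:

```python
from dataclasses import dataclass

@dataclass
class Block:
    text: str
    x0: float  # left edge
    y0: float  # top edge (y grows downward)
    x1: float
    y1: float

def reading_order(blocks: list[Block], column_gap: float = 40.0) -> list[Block]:
    """Column-aware sort: group blocks into columns by left edge,
    order columns left-to-right, then top-to-bottom within each."""
    ordered = sorted(blocks, key=lambda b: b.x0)
    columns: list[list[Block]] = []
    for b in ordered:
        # Start a new column when the block's left edge is far from
        # the current column's left edge.
        if columns and abs(b.x0 - columns[-1][0].x0) <= column_gap:
            columns[-1].append(b)
        else:
            columns.append([b])
    result: list[Block] = []
    for col in columns:
        result.extend(sorted(col, key=lambda b: b.y0))
    return result
```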
🧠 Infer Semantics Sequential semantic.py
Heuristic role assignment: font size → heading, bullet chars → list items, bounding box patterns → tables. Assigns SemanticIntent (role + confidence) to each node.
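The heuristic pass reduces to rules of this shape. The size thresholds, bullet set, and the 0.5-confidence ambiguous branch are assumed values for illustration; the real semantic.py will differ:

```python
from dataclasses import dataclass

BULLET_CHARS = ("•", "-", "*", "◦")

@dataclass
class SemanticIntent:
    role: str         # "h2", "h3", "li", "p", ...
    confidence: float

def infer_role(text: str, font_size: float, body_size: float = 12.0) -> SemanticIntent:
    """Toy heuristic role assignment: large fonts become headings,
    bullet prefixes become list items, everything else a paragraph."""
    stripped = text.lstrip()
    if font_size >= body_size * 1.8:
        return SemanticIntent("h2", 0.9)
    if font_size >= body_size * 1.4:
        return SemanticIntent("h3", 0.8)
    if stripped.startswith(BULLET_CHARS):
        return SemanticIntent("li", 0.85)
    # Short, slightly enlarged text is ambiguous: heading or label?
    # Low confidence here is what triggers the LLM fallback.
    if font_size > body_size and len(stripped) < 40:
        return SemanticIntent("h3", 0.5)
    return SemanticIntent("p", 0.7)
```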
🤖 Groq LLM (Ambiguity Only) Optional groq_client.py
Text-only. Invoked only for low-confidence nodes (confidence < 0.6). Classifies ambiguous nodes into h2/h3/p/li/table/img/input. Single batch call, no vision.
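The single batch call can be sketched as a numbered prompt plus a line-oriented reply parser. The prompt wording, the 200-character truncation, and the `<number>: <role>` reply format are assumptions about groq_client.py, not its actual contract:

```python
def build_disambiguation_prompt(nodes: list[tuple[int, str]]) -> str:
    """One batch prompt covering every low-confidence node; roles are
    restricted to the closed set the renderer understands."""
    lines = [
        "Classify each numbered text fragment as one of: "
        "h2, h3, p, li, table, img, input.",
        "Answer with '<number>: <role>' per line.",
    ]
    for node_id, text in nodes:
        lines.append(f"{node_id}: {text[:200]}")  # truncate long fragments
    return "\n".join(lines)

def parse_disambiguation_reply(reply: str) -> dict[int, str]:
    """Parse '<number>: <role>' lines back into a node-id -> role map,
    skipping any line that does not start with a number."""
    roles: dict[int, str] = {}
    for line in reply.strip().splitlines():
        num, _, role = line.partition(":")
        if num.strip().isdigit():
            roles[int(num.strip())] = role.strip()
    return roles
```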
🏗️ Render HTML html_renderer.py
Iterates NormalizedNodes, maps SemanticIntent roles → HTML tags. Builds full document with nav, skip links, CSS, ARIA landmarks. Rule-based, no intelligence.
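At its core the renderer is a role-to-tag lookup with escaping. A minimal sketch (the mapping table and `<p>` fallback are assumptions; the real html_renderer.py also emits the shell, nav, and CSS):

```python
import html

ROLE_TO_TAG = {"h2": "h2", "h3": "h3", "p": "p", "li": "li"}

def render_node(role: str, text: str) -> str:
    """Map a SemanticIntent role to an HTML element, escaping the text.
    Unknown roles fall back to <p> so nothing is silently dropped."""
    tag = ROLE_TO_TAG.get(role, "p")
    return f"<{tag}>{html.escape(text)}</{tag}>"

def render_list(items: list[str]) -> str:
    """Runs of consecutive li nodes get wrapped in a <ul>."""
    inner = "".join(render_node("li", t) for t in items)
    return f"<ul>{inner}</ul>"
```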
✅ Validate A11y validator.py
Checks heading hierarchy, empty alt text, missing table headers, form labels. Regex-based HTML checks.
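Two of the regex-based checks, sketched. The exact messages and patterns are illustrative, not validator.py's actual output:

```python
import re

def check_heading_hierarchy(html_doc: str) -> list[str]:
    """Flag heading-level jumps (e.g. h2 followed directly by h4)."""
    issues = []
    levels = [int(m.group(1)) for m in re.finditer(r"<h([1-6])\b", html_doc)]
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            issues.append(f"heading jump: h{prev} -> h{cur}")
    return issues

def check_missing_alt(html_doc: str) -> list[str]:
    """Flag <img> tags with no alt attribute at all (alt="" is still
    allowed, since it marks an image as decorative)."""
    return [
        "img missing alt attribute"
        for m in re.finditer(r"<img\b[^>]*>", html_doc)
        if "alt=" not in m.group(0)
    ]
```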
💾 Write Outputs converter.py
document.html + manifest.json saved to results/{job_id}/
Proposed
VLM-Powered Pipeline
Per-page parallel processing. Page screenshot + text extraction → DeepSeek VLM generates semantic HTML directly → deterministic stitch. LLM is the core, not a sidecar.
📄 Upload .pptx or .pdf main.py
Same job lifecycle. Unchanged API contract.
✂️ Split into Pages New page_splitter.py
PDF → N individual pages. PPTX → convert to PDF first, then split. Each page becomes an independent work unit.
Per-Page Processing (Parallel) Concurrent
All pages processed simultaneously with semaphore rate limiting (5-10 max concurrent)
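The fan-out pattern is standard asyncio: a semaphore bounds in-flight VLM calls while gather preserves page order. `process_page` is a stand-in for the real screenshot + extract + VLM step:

```python
import asyncio

async def process_page(page_no: int) -> str:
    # Placeholder for screenshot + text extraction + VLM call.
    await asyncio.sleep(0)
    return f'<section data-page="{page_no}"></section>'

async def process_all(page_count: int, max_concurrent: int = 5) -> list[str]:
    """Run every page concurrently, with at most max_concurrent calls
    in flight. Results come back in page order because gather preserves
    the order of its arguments."""
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(page_no: int) -> str:
        async with sem:
            return await process_page(page_no)

    return await asyncio.gather(*(bounded(i) for i in range(1, page_count + 1)))
```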
📸 Render Page Screenshot New page_splitter.py
Rasterize page to PNG at ~150-200 DPI using pdf2image / poppler
📝 Extract Text Layer New page_splitter.py
pdfplumber or PyMuPDF extracts raw text per page. This is the "authoritative" text — VLM uses it as ground truth, screenshot for structure only.
👁️ DeepSeek VLM Call Core deepseek_client.py
Vision + text. Receives: page PNG + extracted text + system prompt (output contract). Returns: single <section> of semantic HTML. Handles layout, hierarchy, alt text, reading order — all in one shot.
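The request can be sketched as payload construction. This assumes an OpenAI-compatible multimodal chat API with base64 data-URL images; the model name and system prompt are placeholders, not confirmed DeepSeek identifiers:

```python
import base64

SYSTEM_PROMPT = (
    "You convert one document page into accessible semantic HTML. "
    "Output exactly one <section> element. Use the provided text as "
    "ground truth; use the image only for layout and reading order."
)

def build_vlm_request(png_bytes: bytes, page_text: str, page_no: int) -> dict:
    """Build the per-page chat-completions payload: screenshot plus
    authoritative extracted text in a single user turn."""
    image_url = "data:image/png;base64," + base64.b64encode(png_bytes).decode()
    return {
        "model": "deepseek-vl",  # placeholder model name
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text",
                     "text": f"Page {page_no} extracted text:\n{page_text}"},
                ],
            },
        ],
        "temperature": 0.0,
    }
```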
🧩 Deterministic Stitch New stitcher.py
Pure code, no LLM. Concatenates sections in order, wraps in HTML shell, builds <nav> from <h2> tags, injects IDs, strips duplicate headers/footers.
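A minimal sketch of the stitch, minus the duplicate header/footer stripping. The `s{i}` id scheme and the shell markup are assumptions:

```python
import re

def stitch(sections: list[str], title: str = "Document") -> str:
    """Concatenate per-page <section> fragments, inject an id on each
    <h2>, build a <nav> from those headings, and wrap everything in a
    minimal document shell."""
    body = "\n".join(sections)
    matches = list(re.finditer(r"<h2[^>]*>(.*?)</h2>", body))
    nav_items = []
    for i, m in enumerate(matches, start=1):
        heading = m.group(1)
        # Replace this exact occurrence so the nav link has a target.
        body = body.replace(m.group(0), f'<h2 id="s{i}">{heading}</h2>', 1)
        nav_items.append(f'<li><a href="#s{i}">{heading}</a></li>')
    nav = f"<nav><ul>{''.join(nav_items)}</ul></nav>" if nav_items else ""
    return (
        '<!doctype html><html lang="en"><head><meta charset="utf-8">'
        f"<title>{title}</title></head><body>{nav}"
        f"<main>{body}</main></body></html>"
    )
```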
✅ Validate A11y validator.py
Updated checks: data-page on every section, alt on every img, heading hierarchy, output contract compliance.
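The contract checks stay regex-based. A sketch with illustrative messages (the real validator.py wording will differ):

```python
import re

def check_output_contract(html_doc: str) -> list[str]:
    """Contract checks on stitched VLM output: every section carries
    data-page, every img carries alt, and no inline style attributes."""
    issues = []
    for m in re.finditer(r"<section\b[^>]*>", html_doc):
        if "data-page=" not in m.group(0):
            issues.append("section missing data-page")
    for m in re.finditer(r"<img\b[^>]*>", html_doc):
        if "alt=" not in m.group(0):
            issues.append("img missing alt")
    if re.search(r"\sstyle=", html_doc):
        issues.append("inline style found")
    return issues
```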
💾 Write Outputs converter.py
Same output format: document.html + manifest.json

File-by-File Impact

app/main.py → app/main.py
Before: API routes, job lifecycle. After: minor changes; the converter call becomes async.

app/config.py → app/config.py
Before: Groq settings, file limits. After: Groq settings replaced by a DeepSeek API key, concurrency limits, and a DPI setting.

app/services/parsers.py → app/services/page_splitter.py
Before: 350 lines; PptxParser, PdfParser, NormalizedNode extraction. After: splits the PDF into pages, renders PNGs, extracts text per page. Simpler; no node/bbox modeling.

app/services/semantic.py → deleted
Before: 148 lines; heuristic role inference, reading order, Groq disambiguation. After: the VLM handles all semantic decisions directly.

app/services/groq_client.py → app/services/deepseek_client.py
Before: 70 lines; text-only Groq wrapper for ambiguous nodes. After: vision LLM client that sends image + text and receives <section> HTML, with retry logic and rate limiting.

app/services/html_renderer.py → app/services/stitcher.py
Before: 224 lines; node-by-node HTML rendering, CSS, document shell. After: concatenates VLM sections, wraps them in the HTML document shell, builds the nav, injects IDs. Much simpler.

app/services/converter.py → app/services/converter.py
Before: orchestrates parse → semantic → render → validate. After: orchestrates split → parallel VLM calls → stitch → validate. Now async.

app/services/validator.py → app/services/validator.py
Before: checks heading hierarchy, alt text, table headers. After: updated checks for data-page attributes, output-contract compliance, and no inline styles.

app/models/schemas.py → app/models/schemas.py
Before: NormalizedNode, BBox, SemanticIntent, SlideDocument. After: simplified; NormalizedNode/BBox/SemanticIntent dropped, PageResult (text, image_path, html_section) added.
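The replacement model is small. Sketched here as a dataclass for self-containment; the real schemas.py presumably uses pydantic, and the page_number ordering field is my assumption, not listed in the proposal:

```python
from dataclasses import dataclass

@dataclass
class PageResult:
    """Output of one per-page work unit; replaces the old
    NormalizedNode/BBox/SemanticIntent hierarchy."""
    page_number: int   # assumed: the stitch needs an ordering key
    text: str          # authoritative extracted text layer
    image_path: str    # rendered PNG fed to the VLM
    html_section: str  # <section> fragment returned by the VLM
```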

What Changes

LLM goes from optional sidecar → core of the pipeline
Text-only Groq → vision-capable DeepSeek VLM
Sequential whole-doc processing → parallel per-page processing
Heuristic semantic analysis eliminated entirely
Rule-based HTML rendering eliminated — VLM generates HTML directly
NormalizedNode/BBox/SemanticIntent data model no longer needed
converter.py becomes async to support parallelism

What Stays

FastAPI app structure and API contract unchanged
Job lifecycle (create → poll → result) unchanged
Output format: document.html + manifest.json
A11y validation pass (updated, not replaced)
Frontend (static/index.html) unchanged
File upload validation logic unchanged
PPTX support (convert to PDF first, then same pipeline)