Concepts
PRVOD is built around one idea: the walkthrough should be generated from the same source the reviewer reads — the diff, the metadata, and the surrounding code. Anything else creates a second source of truth that can drift.
This page covers what the pipeline does, why it does it that way, and where the trade-offs are.
The pipeline
Section titled “The pipeline”PR diff + metadata │ ▼ 1. Parse diff score files by importance, classify change type 2. Write script (LLM) 4–16 scenes via Claude or Gemini, validated by Zod 2b. Judge script LLM evaluates coverage + narration, revises failing parts 3. Synthesize speech Google TTS or model-native voice generates per-scene audio 4. Build scene clips syntax-highlighted code from the diff, sized to narration 5. Compose final video FFmpeg (default) or Remotion renders clips + captions at 1080p/30fps 6. Upload & deliver signed URL returned to the caller1. Parse diff
Section titled “1. Parse diff”PR content (title, description, diff, linked issues, milestone, branch names) is parsed into a PRContext. The diff parser scores each file by importance — lines changed, semantic weight, presence in critical paths — and classifies the overall change type (feature, refactor, fix, docs, infra).
This step is deterministic and cheap. The score it produces drives which files get dedicated scenes versus grouped summaries.
2. Write the script
Section titled “2. Write the script”A ScriptWriter generates a structured script: 4–16 scenes, each with narration, file references, and code highlights. The script is a Zod-validated JSON object, not free-form prose — schema mismatches fail loudly instead of producing a malformed video.
Provider options (set via SCRIPT_WRITER):
| Provider | Use when |
|---|---|
claude-sdk | Default. Anthropic API, structured output. |
claude-cli | Local dev with claude CLI installed, uses your existing auth. |
gemini-sdk | Google AI Studio key. Larger token budgets, mandatory thinking budget. |
gemini-cli | Local Gemini CLI auth. |
codex-cli | OpenAI Codex CLI. |
mock | No API calls; deterministic stub for tests. |
2b. Judge the script (Self-Refine)
Section titled “2b. Judge the script (Self-Refine)”After generation, a second LLM call evaluates the script against two criteria: coverage (does it adequately describe the diff?) and narration quality (is it clear and coherent?). Failing parts are revised in a follow-up pass — the Self-Refine pattern.
The judge step is optional and degrades gracefully. Set SKIP_JUDGE=true to bypass it (faster, cheaper, lower quality).
3. Synthesize speech
Section titled “3. Synthesize speech”Each scene’s narration is synthesized to audio. Two paths:
- Google Cloud TTS (default for production) — Neural2 / WaveNet / Chirp HD / Chirp3 HD voices with SSML word-timing marks for accurate caption sync.
- Built-in / model-native TTS (
USE_BUILTIN_TTS=true) — no Google Cloud key, lower fidelity, fine for local dev.
Per-attempt timeouts and retry backoff are configurable via GOOGLE_TTS_* env vars.
4. Build scene clips
Section titled “4. Build scene clips”For each scene, the corresponding code snippet is syntax-highlighted (via Shiki), rasterized to PNG (via Sharp), and sized to the narration duration. This is the “code on screen” you see during playback.
5. Compose the final video
Section titled “5. Compose the final video”Two compositors:
- FFmpeg (default) — filter graph at 1080p / 30fps. No cold start, fast, deterministic.
- Remotion (
VIDEO_COMPOSITOR=remotion) — React-based composition, more flexible, 20–40s cold start for the webpack bundle. Pre-baked Chrome Headless Shell in the Docker image.
6. Upload and deliver
Section titled “6. Upload and deliver”The finished MP4 is uploaded to the configured storage (Cloudflare R2, AWS S3, or local filesystem). The API returns a signed URL valid for SIGNED_URL_EXPIRY_HOURS (default 4 hours). GET /api/jobs/:id regenerates a fresh URL on every poll.
Checkpoints and retry
Section titled “Checkpoints and retry”After expensive steps (script generation, scene compositing), the pipeline writes a PipelineCheckpoint. POST /api/jobs/:id/retry resumes from the latest checkpoint instead of rerunning the entire pipeline.
A single annotation switches mode. The pipeline branches at script generation; downstream stages are identical.
| Mode | Duration | When to use |
|---|---|---|
| Standard | 20–120s | Default. Most PRs. |
| Short | 20–60s | Bug fixes, single-file changes, anything you want to share in a Slack thread. |
| Popcorn | 240–320s | Large PRs where a quick overview isn’t enough. Three-act story arc; every changed file mentioned. |
| Deepdive | Same as base mode | Reviewer-style narration — asks questions, surfaces concerns. Opt-in modifier on any of the above. |
The knowledge-base model
Section titled “The knowledge-base model”Each walkthrough lives at a permanent URL: /reviews/[jobId] (server-rendered review page) and /watch/[jobId] (HMAC-signed share player). The videos don’t disappear when the PR closes.
PR by PR, your project accumulates a code-derived archive of how it got to its current state. The archive is structurally aligned with the diff history because each video was generated from the diff itself — there’s no separate authoring step to fall behind.
This is the property that makes PRVOD different from manual screencasts or static docs: drift is not a maintenance problem you have to manage, because there is nothing to maintain.
Architecture
Section titled “Architecture”The codebase follows a 4-layer dependency inversion pattern. Dependencies point inward only.
Layer 1: src/interfaces/ Port definitions (IScriptWriter, ITTSService, IVideoCompositor, ...)Layer 2: src/domain/ Business logic, entities, services (PipelineRunner, VideoOrchestrator)Layer 3: src/infrastructure/ Implementations (Postgres, R2, Google TTS, FFmpeg, Remotion, Claude, Gemini)Layer 4: src/app/api/ Next.js route handlers, CLI entrypointsThe DI container at src/config/container.ts is a lazy singleton with environment-driven wiring. Setting NODE_ENV=test or USE_MOCK_SERVICES=true swaps every binding for an in-memory mock — the full pipeline runs without a database, LLM, or storage backend.
Prompt-injection defense
Section titled “Prompt-injection defense”PR content (titles, descriptions, diffs, linked issues, milestone text, branch names) is untrusted user input that flows into LLM prompts. A 7-layer defense pipeline protects against prompt injection: input preprocessing, pattern scanning, structural prompt architecture, canary tokens, schema validation, output scanning, and content validation.
Details at README.md → Security.
Limitations
Section titled “Limitations”- No job queue. Jobs run as fire-and-forget promises in the Next.js process. For production scale, swap
PipelineRunner.run()for a worker queue (BullMQ, Trigger.dev). - Remotion cold start. First Remotion render takes 20–40s for webpack bundling. The default FFmpeg compositor has no cold start.
- Non-English injection patterns. The prompt-injection guard is English-only.
- Signed URLs expire. Default 4 hours. Links shared directly will stop working after expiry;
GET /api/jobs/:idregenerates them.
The full list lives at README.md → Known Limitations.