Skip to content

Concepts

PRVOD is built around one idea: the walkthrough should be generated from the same source the reviewer reads — the diff, the metadata, and the surrounding code. Anything else creates a second source of truth that can drift.

This page covers what the pipeline does, why it does it that way, and where the trade-offs are.

PR diff + metadata
1. Parse diff score files by importance, classify change type
2. Write script (LLM) 4–16 scenes via Claude or Gemini, validated by Zod
2b. Judge script LLM evaluates coverage + narration, revises failing parts
3. Synthesize speech Google TTS or model-native voice generates per-scene audio
4. Build scene clips syntax-highlighted code from the diff, sized to narration
5. Compose final video FFmpeg (default) or Remotion renders clips + captions at 1080p/30fps
6. Upload & deliver signed URL returned to the caller

PR content (title, description, diff, linked issues, milestone, branch names) is parsed into a PRContext. The diff parser scores each file by importance — lines changed, semantic weight, presence in critical paths — and classifies the overall change type (feature, refactor, fix, docs, infra).

This step is deterministic and cheap. The score it produces drives which files get dedicated scenes versus grouped summaries.

A ScriptWriter generates a structured script: 4–16 scenes, each with narration, file references, and code highlights. The script is a Zod-validated JSON object, not free-form prose — schema mismatches fail loudly instead of producing a malformed video.

Provider options (set via SCRIPT_WRITER):

ProviderUse when
claude-sdkDefault. Anthropic API, structured output.
claude-cliLocal dev with claude CLI installed, uses your existing auth.
gemini-sdkGoogle AI Studio key. Larger token budgets, mandatory thinking budget.
gemini-cliLocal Gemini CLI auth.
codex-cliOpenAI Codex CLI.
mockNo API calls; deterministic stub for tests.

After generation, a second LLM call evaluates the script against two criteria: coverage (does it adequately describe the diff?) and narration quality (is it clear and coherent?). Failing parts are revised in a follow-up pass — the Self-Refine pattern.

The judge step is optional and degrades gracefully. Set SKIP_JUDGE=true to bypass it (faster, cheaper, lower quality).

Each scene’s narration is synthesized to audio. Two paths:

  • Google Cloud TTS (default for production) — Neural2 / WaveNet / Chirp HD / Chirp3 HD voices with SSML word-timing marks for accurate caption sync.
  • Built-in / model-native TTS (USE_BUILTIN_TTS=true) — no Google Cloud key, lower fidelity, fine for local dev.

Per-attempt timeouts and retry backoff are configurable via GOOGLE_TTS_* env vars.

For each scene, the corresponding code snippet is syntax-highlighted (via Shiki), rasterized to PNG (via Sharp), and sized to the narration duration. This is the “code on screen” you see during playback.

Two compositors:

  • FFmpeg (default) — filter graph at 1080p / 30fps. No cold start, fast, deterministic.
  • Remotion (VIDEO_COMPOSITOR=remotion) — React-based composition, more flexible, 20–40s cold start for the webpack bundle. Pre-baked Chrome Headless Shell in the Docker image.

The finished MP4 is uploaded to the configured storage (Cloudflare R2, AWS S3, or local filesystem). The API returns a signed URL valid for SIGNED_URL_EXPIRY_HOURS (default 4 hours). GET /api/jobs/:id regenerates a fresh URL on every poll.

After expensive steps (script generation, scene compositing), the pipeline writes a PipelineCheckpoint. POST /api/jobs/:id/retry resumes from the latest checkpoint instead of rerunning the entire pipeline.

A single annotation switches mode. The pipeline branches at script generation; downstream stages are identical.

ModeDurationWhen to use
Standard20–120sDefault. Most PRs.
Short20–60sBug fixes, single-file changes, anything you want to share in a Slack thread.
Popcorn240–320sLarge PRs where a quick overview isn’t enough. Three-act story arc; every changed file mentioned.
DeepdiveSame as base modeReviewer-style narration — asks questions, surfaces concerns. Opt-in modifier on any of the above.

Each walkthrough lives at a permanent URL: /reviews/[jobId] (server-rendered review page) and /watch/[jobId] (HMAC-signed share player). The videos don’t disappear when the PR closes.

PR by PR, your project accumulates a code-derived archive of how it got to its current state. The archive is structurally aligned with the diff history because each video was generated from the diff itself — there’s no separate authoring step to fall behind.

This is the property that makes PRVOD different from manual screencasts or static docs: drift is not a maintenance problem you have to manage, because there is nothing to maintain.

The codebase follows a 4-layer dependency inversion pattern. Dependencies point inward only.

Layer 1: src/interfaces/ Port definitions (IScriptWriter, ITTSService, IVideoCompositor, ...)
Layer 2: src/domain/ Business logic, entities, services (PipelineRunner, VideoOrchestrator)
Layer 3: src/infrastructure/ Implementations (Postgres, R2, Google TTS, FFmpeg, Remotion, Claude, Gemini)
Layer 4: src/app/api/ Next.js route handlers, CLI entrypoints

The DI container at src/config/container.ts is a lazy singleton with environment-driven wiring. Setting NODE_ENV=test or USE_MOCK_SERVICES=true swaps every binding for an in-memory mock — the full pipeline runs without a database, LLM, or storage backend.

PR content (titles, descriptions, diffs, linked issues, milestone text, branch names) is untrusted user input that flows into LLM prompts. A 7-layer defense pipeline protects against prompt injection: input preprocessing, pattern scanning, structural prompt architecture, canary tokens, schema validation, output scanning, and content validation.

Details at README.md → Security.

  • No job queue. Jobs run as fire-and-forget promises in the Next.js process. For production scale, swap PipelineRunner.run() for a worker queue (BullMQ, Trigger.dev).
  • Remotion cold start. First Remotion render takes 20–40s for webpack bundling. The default FFmpeg compositor has no cold start.
  • Non-English injection patterns. The prompt-injection guard is English-only.
  • Signed URLs expire. Default 4 hours. Links shared directly will stop working after expiry; GET /api/jobs/:id regenerates them.

The full list lives at README.md → Known Limitations.