
Case study

YouTube Production System — automated niche video factory

Personal · May 2024 – present · Python · Django · LangChain · FFmpeg · MoviePy · pgvector · Go · React · Docker

[01] batu@batu0:~/case-studies/yps

Built from scratch over roughly 600 hours as a solo project: a fully automated YouTube channel factory that turns raw signals from Telegram clusters, RSS feeds, and a vector-indexed book corpus into finished, published videos. Once the niche and editorial policy are defined, no human is in the loop.

Three channels run live across completely different content categories. Each has its own editorial voice, source mix, and visual aesthetic, all generated by the same underlying pipeline.

The pipeline

A full production run takes 10–15 minutes end to end, compared to 7–13 hours for a manually produced video of equivalent quality.

Ingestion — Telegram messages are fetched on a cron (every two hours), clustered by topic, and deduplicated. RSS feeds and direct research endpoints run in parallel.
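The deduplication step can be sketched as below. This is a minimal illustration, not the project's actual code: the `Message` type, the `fingerprint` normalisation rules, and the choice of exact-match hashing are all assumptions made for the example. A production pipeline might well use fuzzy or embedding-based matching instead.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical shape of an ingested Telegram message."""
    channel: str
    text: str


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing, dropping URLs and punctuation,
    and collapsing whitespace, so trivial re-posts hash identically."""
    cleaned = re.sub(r"https?://\S+", "", text.lower())
    cleaned = re.sub(r"[^\w\s]", "", cleaned)
    cleaned = " ".join(cleaned.split())
    return hashlib.sha256(cleaned.encode("utf-8")).hexdigest()


def deduplicate(messages: list[Message]) -> list[Message]:
    """Keep the first message seen for each fingerprint, preserving order."""
    seen: set[str] = set()
    unique: list[Message] = []
    for msg in messages:
        key = fingerprint(msg.text)
        if key not in seen:
            seen.add(key)
            unique.append(msg)
    return unique
```

Exact hashing only catches near-verbatim forwards. Paraphrased reports of the same event would still need the topic-clustering step to group them.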

Research agent — A LangChain agent orchestrates web research via Tavily, cross-referencing ingested content against the cluster to build a factual brief for the script generator.

Script generation — A critique loop produces the final script. Five distinct editorial voices — each with its own persona, tone, and structural constraints — are defined as Pydantic contracts; the loop iterates until the output satisfies all constraints. No voice bleeds into another.
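One way such a contract-driven critique loop could look is sketched here. The model name `BreakingNewsScript`, its fields and limits, the `generate` callable, and the `max_rounds` cap are all hypothetical. The source only states that each voice is a Pydantic contract and that the loop iterates until the constraints are satisfied.

```python
from typing import Callable

from pydantic import BaseModel, Field, ValidationError


class BreakingNewsScript(BaseModel):
    """Hypothetical contract for one editorial voice (fields are illustrative)."""
    hook: str = Field(..., min_length=10, max_length=120)
    body: str = Field(..., min_length=50)
    sign_off: str = Field(..., min_length=1, max_length=80)


def critique_loop(
    generate: Callable[[str], dict],
    brief: str,
    max_rounds: int = 3,
) -> BreakingNewsScript:
    """Call `generate` until its output validates against the contract,
    feeding the validation errors back into the next prompt as critique."""
    prompt = brief
    for _ in range(max_rounds):
        draft = generate(prompt)
        try:
            return BreakingNewsScript(**draft)
        except ValidationError as exc:
            # Append the errors so the next draft can address them.
            prompt = f"{brief}\n\nFix these problems:\n{exc}"
    raise RuntimeError(f"no valid script after {max_rounds} rounds")
```

Here `generate` stands in for whatever LLM call produces a draft as a dictionary. Defining one model class per voice would keep each voice's constraints isolated from the others.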

TTS synthesis — A voice cloning and synthesis layer converts the script to audio. OpenVoice and EmotiVoice run self-hosted for custom voice profiles.

Visual assembly — FFmpeg and MoviePy compose the video: dynamic panning over images, captions burned in using Whisper-aligned timings, intro/outro stings, and Remotion-driven motion graphics for the news format.
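As an illustration of the caption step, the sketch below turns timed transcript segments into an SRT file body. It assumes segments arrive as dictionaries with `start`, `end` (seconds), and `text` keys, which is the shape Whisper's transcription output uses. The function names are made up for this example, and the project's real caption format is not stated in the source.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

A file produced this way could be burned in with FFmpeg's `subtitles` filter, for example `ffmpeg -i in.mp4 -vf subtitles=captions.srt out.mp4`, provided the FFmpeg build includes libass.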

Upload — A Go service handles YouTube API auth, metadata templating, and scheduled publishing. It runs alongside a Flask API gateway to separate upload concerns from the pipeline.

RAG corpus

The literary-charizard channel cannot be replicated by a competitor simply by prompting an LLM harder. Its scripts are grounded in a proprietary vector corpus — book passages, philosophical texts, 19th-century fiction — indexed with pgvector and served by a dedicated microservice (MCC). Each run fetches up to 8 similarity-matched passages, which are injected directly into the script generation context. The result reads like a short film adapted from source material, not like a generic AI voiceover.
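The retrieval step might look roughly like the sketch below. The table name `passages` and its `id`, `body`, and `embedding` columns are hypothetical, as are the helper names. The `<=>` operator is pgvector's cosine-distance operator, though the source does not say which distance metric the MCC service actually uses.

```python
TOP_K = 8  # "up to 8 similarity-matched passages" per run


def build_passage_query(embedding: list[float], limit: int = TOP_K) -> tuple[str, tuple]:
    """Return a parameterised SQL string and its parameters for a
    nearest-neighbour search. Schema names are assumptions."""
    # pgvector accepts vectors in the text form "[x1,x2,...]".
    vector_literal = "[" + ",".join(str(x) for x in embedding) + "]"
    sql = (
        "SELECT id, body FROM passages "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    return sql, (vector_literal, limit)


def format_context(passages: list[str]) -> str:
    """Join retrieved passages into a block for the script-generation prompt."""
    return "\n\n".join(
        f"[Passage {i}] {text}" for i, text in enumerate(passages[:TOP_K], start=1)
    )
```

The returned pair can be passed to a DB-API cursor's `execute(sql, params)`. Keeping the query parameterised avoids interpolating the vector into the SQL string by hand.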

Live channels

War news — Telegram channel clusters feed a breaking-news editorial voice. Produces vertical 1080×1920 Shorts. Wall time: ~1.5 minutes per video.

Stoic meditations — Stoic philosophy passages, atmospheric visuals, calm voice profile. Script arc enforces a reflection structure; outputs run as landscape videos.

Literary-charizard — RAG-driven narrative fiction. Seed texts (S01–S20) plus live Telegram content select the run’s theme; the pgvector search provides grounding passages; the script generator builds a cinematic arc around them.

Engineering

15+ containerised microservices orchestrated via Docker Compose with NVIDIA CUDA GPU passthrough for local rendering. The backend is Django REST Framework serving a React 18 SPA — the operator dashboard for channel config, pipeline triggers, and output review. The Go upload service and Flask API gateway run as separate services.

Approximately 600 commits across the monorepo since May 2024.