
claude-real-video

A local command-line tool that lets Claude — or any LLM — actually watch a video by turning it into scene-change frames plus a transcript.

Last verified · 2026-07-02 · by Moe Ameen

What claude-real-video is

claude-real-video is a free, open-source command-line tool that gives Claude — or any large language model — the ability to actually watch a video. Claude cannot ingest a raw video file; it reasons over text and still images. claude-real-video bridges that gap locally: it takes a video, pulls out the frames that matter, transcribes the audio, and writes a manifest the model can read, so Claude answers questions grounded in what is on screen and in the audio rather than guessing from a title or description.

The interesting engineering is in the frame selection. Instead of sampling one frame every N seconds — which either floods the model with near-identical stills or misses the moment a scene changes — it detects scene changes and captures a frame at each meaningful visual transition, then runs a deduplication pass against a sliding window to drop frames that are too similar to ones it already kept. Audio is handled by preferring any existing subtitle track (SRT/VTT) and falling back to OpenAI's Whisper for transcription when there isn't one. You can optionally preserve the full soundtrack for audio-capable models. Everything runs on your machine; nothing is uploaded to a cloud service.
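The sliding-window deduplication idea can be sketched in a few lines of Python. This is an illustration only, not the tool's actual code: the frame representation (a tiny flattened grayscale thumbnail), the similarity measure (mean absolute pixel difference), and the names `dedup_sliding_window`, `window`, and `threshold` are all assumptions made for the example.

```python
from collections import deque
from typing import Iterable, List, Sequence

# Hypothetical stand-in for a frame: a tiny grayscale thumbnail, flattened.
Frame = Sequence[int]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel difference between two equally sized thumbnails."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def dedup_sliding_window(
    frames: Iterable[Frame], window: int = 5, threshold: float = 8.0
) -> List[int]:
    """Return the indices of frames kept after comparing each candidate
    against the last `window` frames that were already kept."""
    recent: deque = deque(maxlen=window)  # only the most recently kept frames
    kept: List[int] = []
    for index, frame in enumerate(frames):
        if any(mean_abs_diff(frame, old) < threshold for old in recent):
            continue  # too similar to a frame we already kept
        recent.append(frame)
        kept.append(index)
    return kept


frames = [
    [10, 10, 10, 10],      # scene A
    [11, 10, 9, 10],       # near-duplicate of scene A
    [200, 200, 200, 200],  # scene B
    [10, 12, 10, 10],      # scene A again, still inside the window
]
print(dedup_sliding_window(frames, window=5))  # → [0, 2]
print(dedup_sliding_window(frames, window=1))  # → [0, 2, 3]
```

The window size is the trade-off: a wider window suppresses a shot the video keeps cutting back to, while a window of 1 only drops consecutive near-duplicates.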

It is built in Python (3.10+) and leans on standard media tooling — ffmpeg/ffprobe for extraction and yt-dlp for pulling videos from URLs — so it works on a local file or a link from YouTube, Instagram, or TikTok. Flags let you tune scene sensitivity, a frame-count ceiling (150 by default), the dedup window, transcription language, and whether to keep or skip audio. The output is a folder of JPEG frames, a transcript, and a MANIFEST file that ties them together for the model.
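For a sense of what scene-based extraction looks like at the ffmpeg level, here is a minimal sketch that builds an ffmpeg command using its `select` filter with the `scene` score. It is not taken from claude-real-video's source and does not show the tool's own flags; the function name, the 0.3 threshold, and the output naming pattern are assumptions for illustration.

```python
import shlex
from typing import List


def scene_frame_command(
    video: str, out_dir: str, scene_threshold: float = 0.3
) -> List[str]:
    """Build an ffmpeg argument list that writes one JPEG per detected scene change."""
    # select='gt(scene,T)' passes only frames whose scene-change score exceeds T.
    # The single quotes protect the comma from ffmpeg's filtergraph parser.
    select = f"select='gt(scene,{scene_threshold})'"
    return [
        "ffmpeg",
        "-i", video,
        "-vf", select,
        "-fps_mode", "vfr",  # emit only the selected frames; older builds use -vsync vfr
        "-q:v", "2",         # high JPEG quality
        f"{out_dir}/frame_%04d.jpg",
    ]


cmd = scene_frame_command("talk.mp4", "frames", 0.3)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would run it, assuming ffmpeg is installed and the output folder exists. Lower thresholds keep more frames; higher ones keep only hard cuts.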

Two things to be clear on: it is a developer utility run from the terminal, not a Claude Code plugin or a polished app, and it is a perception layer, not an interpretation layer. It hands Claude the frames and the transcript; the understanding is still Claude's. It is licensed MIT.

What you can make with it

  • A scene-aware frame set from any video — the visually distinct moments, not one still every few seconds
  • A clean transcript pulled from existing subtitles or generated with Whisper
  • A manifest that lets Claude answer questions grounded in what the video actually shows and says
  • A local, private analysis of a long video (a webinar, podcast, or course) without uploading it anywhere
  • An optional preserved soundtrack for models that can reason over audio
  • A structured record of a competitor's or your own video you can mine for hooks, claims, and structure

How Kompozy turns claude-real-video output into content

claude-real-video sits on the input side of the workflow: it makes a video legible to a model so you can understand it. Kompozy sits on the output side: it generates finished content and publishes it. The two chain cleanly. Run claude-real-video on a long asset you already have — a 60-minute webinar, a podcast episode, a course lesson, or a competitor's top-performing video — and Claude reads the frames and transcript and pulls out the strongest moments, the quotable lines, the structure, and the hooks. That analysis is the raw brief. Kompozy is what turns the brief into content: point it at the same source and Clipped Shorts detects the best segments and cuts them to vertical with branded captions, while the ideas Claude surfaced become a Carousel, quote graphics, native Text Posts, a recap Blog Article, and an Email Newsletter — all in your voice through a Persona Brief.

Then Kompozy does the part a terminal analysis never touches: it schedules and publishes that whole set from one queue across its nine connected platforms (including TikTok, Instagram, Facebook, YouTube, LinkedIn, X, Pinterest, and Threads), plus blog and email. It also generates net-new formats the analysis has nothing to do with, like HeyGen avatar Persona Shorts and VFX hooks, so the same session that Claude helped you understand becomes a week of on-brand posts. claude-real-video tells you what is in the video; Kompozy makes and ships everything that comes out of it.

  1. Run claude-real-video on your source video — a webinar, podcast, long YouTube upload, or a competitor clip — to get scene frames and a transcript.
  2. Ask Claude to read the manifest and pull the strongest moments, quotes, hooks, and structure.
  3. In Kompozy, use Clipped Shorts on the same long-form video to cut the best segments to vertical with branded captions.
  4. Turn the surfaced ideas into a carousel, quote cards, text posts, a recap blog, and a newsletter — in your voice via the Persona Brief.
  5. Schedule and publish the whole set across TikTok, Reels, Shorts, X, LinkedIn, and more from one Kompozy queue.

Frequently asked questions

What is claude-real-video?

It is a free, open-source command-line tool that lets Claude or any LLM watch a video. It extracts frames at scene changes, deduplicates near-identical ones, transcribes the audio with Whisper (or uses existing subtitles), and writes a manifest — so the model answers grounded in what the video shows and says, not the title. It runs locally and is MIT-licensed.
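The "subtitles first, Whisper as fallback" decision might look something like the sketch below. It assumes subtitles live in a sidecar file next to the video with the same base name; the real tool may also inspect embedded subtitle tracks, and the function names here are hypothetical.

```python
from pathlib import Path
from typing import Optional

SUBTITLE_SUFFIXES = (".srt", ".vtt")  # checked in this order


def find_sidecar_subtitles(video: Path) -> Optional[Path]:
    """Return a subtitle file sitting next to the video, if one exists."""
    for suffix in SUBTITLE_SUFFIXES:
        candidate = video.with_suffix(suffix)
        if candidate.is_file():
            return candidate
    return None


def transcript_source(video: Path) -> str:
    """Describe where the transcript would come from for this video."""
    subs = find_sidecar_subtitles(video)
    # Only fall back to speech-to-text when no subtitle file was found.
    return f"subtitles:{subs.name}" if subs else "whisper"


print(transcript_source(Path("webinar.mp4")))
```

Preferring an existing subtitle file is both faster and usually more accurate than re-transcribing, which is why the fallback order matters.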

Can Claude watch a video natively without it?

Not directly. Claude reasons over text and still images, not raw video files or audio waveforms. Tools like claude-real-video do the perception step — pulling out the meaningful frames and a transcript — and hand that to Claude, which then does the interpretation.

How does it choose which frames to keep?

It detects scene changes and captures a frame at each meaningful visual transition rather than sampling at a fixed interval, then runs a deduplication pass against a sliding window to drop frames too similar to ones already kept. A max-frame ceiling (150 by default) caps how many reach the model, and flags let you tune the sensitivity.
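One simple way to enforce a frame ceiling is to thin the kept frames evenly across the timeline. The source does not say how claude-real-video picks which frames survive the cap, so treat this as an assumed strategy; `cap_frames` and `max_frames` are hypothetical names.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def cap_frames(frames: Sequence[T], max_frames: int = 150) -> List[T]:
    """Keep at most `max_frames` items, spread evenly from first to last."""
    if max_frames <= 0:
        raise ValueError("max_frames must be positive")
    total = len(frames)
    if total <= max_frames:
        return list(frames)  # already under the ceiling
    if max_frames == 1:
        return [frames[0]]
    # Evenly spaced positions between index 0 and index total - 1.
    step = (total - 1) / (max_frames - 1)
    return [frames[round(i * step)] for i in range(max_frames)]


print(cap_frames(list(range(10)), max_frames=4))  # → [0, 3, 6, 9]
```

Even thinning keeps coverage of the whole video, at the cost of possibly dropping a frame from a dense stretch of cuts.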

What does it need to run?

Python 3.10+ plus ffmpeg/ffprobe for extraction, yt-dlp for URLs, and optionally Whisper for transcription. It works on a local file or a link from YouTube, Instagram, or TikTok, and everything processes on your machine with nothing uploaded to a cloud service.
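Before a first run, a quick preflight check along these lines can confirm the prerequisites are in place. It is a standalone sketch, not part of claude-real-video; the function name and the injectable `which` and `version` parameters are assumptions that make it easy to test.

```python
import shutil
import sys
from typing import Callable, List, Optional, Tuple

REQUIRED_TOOLS = ("ffmpeg", "ffprobe", "yt-dlp")


def missing_requirements(
    which: Callable[[str], Optional[str]] = shutil.which,
    version: Tuple[int, int] = sys.version_info[:2],
) -> List[str]:
    """Return human-readable problems; an empty list means ready to run."""
    problems: List[str] = []
    if version < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    # shutil.which returns None when an executable is not on PATH.
    problems.extend(
        f"{tool} not found on PATH" for tool in REQUIRED_TOOLS if which(tool) is None
    )
    return problems


for problem in missing_requirements():
    print(problem)
```

Whisper is left out of the check because it is only needed when a video has no subtitles.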

How do I turn what Claude learns from a video into posts?

claude-real-video is analysis, not publishing. Use the transcript and frames to have Claude surface the best moments, then bring the same source into Kompozy: Clipped Shorts cuts the strongest segments to vertical, and the ideas become a carousel, quote cards, a blog, and a newsletter — published across nine platforms from one queue.

Related tools

  • NotebookLM: Google's source-grounded research tool that now turns your uploaded documents into TikTok-style vertical video summaries.
  • Munch: AI tool that clips long-form video into short, captioned, platform-ready social clips.
  • HeyGen: AI avatar video platform that turns a text script into a talking-head video in 175+ languages.
  • sync.: A lip sync and visual dubbing platform that re-syncs any face to new audio in any language.
  • Lip Sync AI: Free online AI tool that turns a photo plus an audio clip into a talking avatar with synced lips.
