How to repurpose content with AI (the manual workflow Kompozy automates, 2026)
Repurpose long-form content into multi-platform short-form using ChatGPT, Claude, ElevenLabs, HeyGen, and CapCut. Includes the full manual chain — the same chain Kompozy automates end-to-end.
Last verified 2026-05-22
AI-assisted content repurposing in 2026 is a chain of 4-6 tools: transcription (Whisper, AssemblyAI), LLM extraction (ChatGPT, Claude), voice cloning or avatar (ElevenLabs, HeyGen), video editing (CapCut, Descript), captioning (Submagic, CapCut auto-captions), and scheduling (Buffer, Later, Metricool). Each tool does one thing well; chaining them by hand produces high-quality output but takes 4-8 hours per source piece.
This tutorial walks the full manual chain. We are honest upfront: this is the same workflow Kompozy automates end-to-end. If you are evaluating Kompozy, this guide shows you what it is doing under the hood. If you are running the workflow manually, the steps below are the exact sequence to follow.
The deliberate framing: every tool below is best-in-class for its specific step. Kompozy does not replace them; it orchestrates them. For a creator publishing 1-2 sources per month, the manual chain is fine. For a creator producing 4+ sources per month, the chain becomes the bottleneck — that is when automation matters.
The steps
Step 1 — Transcribe the source. Transcribe your long-form source (podcast, webinar, recorded video) using Whisper (OpenAI's open-source model, free to run locally or on a self-hosted instance, near-human accuracy), AssemblyAI ($0.65/hour, high accuracy with speaker diarization), or Descript (transcription + editing in one tool, ~$15-30/mo). Output: timestamped transcript. Whisper Large-v3 is the practical free benchmark; AssemblyAI is the practical paid benchmark.
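As a rough illustration of Step 1 with a self-hosted Whisper, the sketch below assumes the open-source `openai-whisper` Python package and ffmpeg are installed. The helper names (`format_timestamp`, `format_transcript`, `transcribe`) are hypothetical and only show one way to get from Whisper's segment list to a timestamped transcript.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS for a human-readable transcript."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_transcript(segments: list[dict]) -> str:
    """Turn Whisper-style segments (start, end, text) into timestamped lines."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} - {end}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe(path: str, model_name: str = "large-v3") -> str:
    """Run Whisper locally; assumes `pip install openai-whisper` and ffmpeg."""
    import whisper  # imported lazily so the formatting helpers work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    return format_transcript(result["segments"])
```

Calling `transcribe("episode.mp3")` (a placeholder file name) would return one line per segment, ready to paste into the LLM in Step 2.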
Step 2 — Extract clip candidates with an LLM. Paste the transcript into ChatGPT (GPT-5) or Claude (3.5/4) with a prompt like: "Below is a transcript of a 60-minute podcast. Identify 8-15 self-contained moments that would work as 30-60 second standalone short-form videos. For each, output: (1) start and end timestamps, (2) the exact spoken text, (3) a 1-sentence summary of why it works, (4) a suggested hook to add as text overlay." The LLM scans the transcript and surfaces moments with retention shape (curiosity gap, specific outcome, contrarian frame, story arc).
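If you want the Step 2 output in a form you can process later, one option is to ask the model for JSON and filter the reply by clip length. The sketch below is an assumption layered on the prompt above: the JSON keys (`start`, `end`, `text`, `why`, `hook`) and both function names are hypothetical, and the actual API call to ChatGPT or Claude is left out.

```python
import json


def build_extraction_prompt(transcript: str, source_minutes: int = 60) -> str:
    """Wrap a timestamped transcript in the clip-extraction prompt."""
    return (
        f"Below is a transcript of a {source_minutes}-minute podcast. "
        "Identify 8-15 self-contained moments that would work as 30-60 second "
        "standalone short-form videos. Respond with a JSON array only; each item "
        "must have the keys start, end (seconds), text, why, and hook.\n\n"
        f"{transcript}"
    )


def parse_clips(reply: str, min_len: float = 30.0, max_len: float = 60.0) -> list[dict]:
    """Parse the model's JSON reply and keep clips inside the length window."""
    clips = json.loads(reply)
    kept = []
    for clip in clips:
        duration = float(clip["end"]) - float(clip["start"])
        if min_len <= duration <= max_len:
            kept.append(clip)
    return kept
```

Filtering on duration catches the common case where the model proposes a moment that is too short to stand alone or too long for a short-form cut.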
Step 3 — Cut the clips in CapCut or Descript. For each LLM-identified clip, open the source video, navigate to the timestamp, and cut a tight 30-60 second segment. Descript lets you cut by editing the transcript directly (delete words = delete corresponding video) which is faster than scrubbing timelines. Export each clip as a separate MP4 at the source resolution.
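The tutorial cuts clips by hand in CapCut or Descript. For readers who would rather script the same cut, here is a sketch that swaps in the ffmpeg command-line tool instead; it is not how either editor works internally. The function name and file names are hypothetical, and the command is only built here, not executed.

```python
import shlex


def ffmpeg_cut_command(source: str, start: float, end: float, output: str) -> list[str]:
    """Build an ffmpeg command that re-encodes the [start, end] span to a new MP4."""
    if end <= start:
        raise ValueError("end must be after start")
    return [
        "ffmpeg",
        "-i", source,
        "-ss", f"{start:.2f}",  # placed after -i: slower, but the cut is frame-accurate
        "-to", f"{end:.2f}",    # stop position on the source timeline
        "-c:v", "libx264",
        "-c:a", "aac",
        output,
    ]


# Example: cut a clip using timestamps taken from the LLM's clip list.
cmd = ffmpeg_cut_command("episode.mp4", 754.0, 801.5, "clip_01.mp4")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would run the cut, assuming ffmpeg is on your PATH.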
Step 4 — Generate captions with Submagic or CapCut. Drop each clip into Submagic (submagic.co — $25-45/mo) or CapCut Auto Captions (free). Both run their own ASR and produce styled captions matching the spoken track. Submagic has better aesthetic presets out of the box; CapCut requires more customization but is free. Apply a consistent caption preset across all clips so the visual brand stays unified.
Step 5 — Optionally add a face / avatar intro. For brand consistency, add a 3-5 second avatar intro to each clip using HeyGen (heygen.com — $24-72/mo) or your own recorded face footage. HeyGen Avatar IV produces a talking avatar from a single photo + script; the intro establishes brand recognition before the clip's actual content. Optional but high-impact for accounts building face-led brands.
Step 6 — Add hooks and resize per platform. In CapCut, add a hook overlay (text shown during the first 1-3 seconds, following the framework in write-viral-hooks) and resize the clip per destination platform: 9:16 for Reels / TikTok / Shorts, 4:5 for IG feed, 1:1 for X / LinkedIn feed. Export each variant as a separate MP4.
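To make the resize arithmetic concrete, the sketch below computes the largest centered crop for each of the three aspect ratios named in Step 6. The dictionary keys and the function name are hypothetical, and rounding down to even dimensions is an assumption about what your encoder expects.

```python
TARGET_RATIOS = {
    "reels_tiktok_shorts": (9, 16),
    "ig_feed": (4, 5),
    "x_linkedin_feed": (1, 1),
}


def center_crop(src_w: int, src_h: int, ratio_w: int, ratio_h: int) -> tuple[int, int, int, int]:
    """Return (width, height, x_offset, y_offset) of the largest centered crop."""
    if src_w * ratio_h > src_h * ratio_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = src_h
        crop_w = src_h * ratio_w // ratio_h
    else:
        # Source is taller (or the same shape): keep full width, trim top and bottom.
        crop_w = src_w
        crop_h = src_w * ratio_h // ratio_w
    # Round down to even numbers (assumption: the encoder wants even dimensions).
    crop_w -= crop_w % 2
    crop_h -= crop_h % 2
    return crop_w, crop_h, (src_w - crop_w) // 2, (src_h - crop_h) // 2
```

For a 1920x1080 source, `center_crop(1920, 1080, *TARGET_RATIOS["ig_feed"])` gives an 864x1080 crop starting 528 pixels from the left edge. A centered crop is only a starting point; if the speaker sits off-center you will still want to adjust the offset by eye.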
Step 7 — Write per-platform captions with an LLM. Paste each clip's transcript back into ChatGPT/Claude with: "Write a TikTok caption, Instagram caption, YouTube Shorts description, X post, and LinkedIn post for this clip. Apply each platform's character limit and hashtag conventions." Output: 5 platform-specific captions per clip ready to drop into your scheduler.
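LLMs do not always respect the character limits you ask for, so a quick length check before scheduling can help. The sketch below is illustrative: the limits in `CHAR_LIMITS` are assumptions you should confirm against each platform's current documentation, and a plain `len()` count only approximates platforms that weight URLs or emoji differently.

```python
# Illustrative limits only; confirm each against the platform's current documentation.
CHAR_LIMITS = {
    "tiktok": 2200,
    "instagram": 2200,
    "youtube_shorts": 5000,
    "x": 280,
    "linkedin": 3000,
}


def over_limit(captions: dict[str, str], limits: dict[str, int] = CHAR_LIMITS) -> dict[str, int]:
    """Return {platform: characters over the limit} for captions that are too long."""
    problems = {}
    for platform, text in captions.items():
        excess = len(text) - limits[platform]
        if excess > 0:
            problems[platform] = excess
    return problems
```

An empty result means every caption fits; otherwise, send the flagged captions back to the LLM with the overage stated explicitly.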
Step 8 — Schedule across platforms in Buffer or Later. Upload each clip variant to your scheduler with the matching caption, set the publish time per platform (stagger by 5-15 minutes — see cross-post-to-all-platforms), and queue. Most schedulers handle the cross-platform publishing in one step once each variant is loaded with its platform-specific assets.
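The stagger in Step 8 is simple date arithmetic. A minimal sketch, assuming you pick the first publish time yourself and want every following platform offset by a fixed gap inside the 5-15 minute window; the function name and platform labels are hypothetical.

```python
from datetime import datetime, timedelta


def staggered_schedule(
    first_post: datetime, platforms: list[str], gap_minutes: int = 10
) -> dict[str, datetime]:
    """Assign each platform a publish time, gap_minutes after the previous one."""
    if not 5 <= gap_minutes <= 15:
        raise ValueError("gap_minutes should stay within the 5-15 minute window")
    return {
        platform: first_post + timedelta(minutes=gap_minutes * index)
        for index, platform in enumerate(platforms)
    }
```

With a 09:00 start, a 10-minute gap, and three platforms, the third platform publishes at 09:20. The resulting times are what you would enter by hand in Buffer or Later.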
Common gotchas
Transcripts with poor audio quality produce LLM extractions that miss the best moments. Audio quality at the source is the upstream lever.
LLM extraction is non-deterministic — running the same transcript twice produces slightly different clip lists. Run twice if you want to spot moments missed in the first pass.
CapCut auto-captions occasionally hallucinate words or mistranscribe domain-specific terms. Always proofread before publishing.
Voice cloning (ElevenLabs) and avatar (HeyGen) intros that do not match your real voice or face create brand dissonance. Use them deliberately, not because the tool is cool.
The chain compounds time: 4-8 hours per source piece when run manually. At around 3-4 sources per month, the chain becomes the bottleneck and automation starts to make economic sense.
Each tool has its own monthly fee. The full chain at full fidelity (AssemblyAI + ChatGPT Plus + Submagic + HeyGen + Buffer) totals ~$80-150/mo — comparable to a content automation tool.
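For the non-determinism gotcha above, running the extraction twice is only useful if you can combine the two clip lists without duplicates. One possible approach is sketched below, assuming each clip is a dict with numeric `start` and `end` values in seconds; the function names are hypothetical.

```python
def overlaps(a: dict, b: dict) -> bool:
    """True when two clips share any stretch of the source timeline."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def merge_runs(first: list[dict], second: list[dict]) -> list[dict]:
    """Keep the first run, then add second-run clips that overlap nothing already kept."""
    merged = list(first)
    for clip in second:
        if not any(overlaps(clip, kept) for kept in merged):
            merged.append(clip)
    return sorted(merged, key=lambda clip: clip["start"])
```

Treating any overlap as a duplicate is a deliberately blunt rule: it favors the first run's boundaries and only surfaces moments the first pass missed entirely.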
Where Kompozy fits
Kompozy is the automation of the chain above — same tools, same steps, orchestrated end-to-end so you drop a source in once and the output ships to all platforms. Whisper + LLM extraction + caption rendering + avatar wrap + per-platform resize + scheduling — same chain, executed by the engine instead of by you.
The honest framing: if you are running this chain manually 1-2 times per month, do not buy Kompozy — the chain works fine at that volume and the tool stack is ~$80/mo all in. If you are running it 4+ times per month and the chain is consuming a real chunk of your week, Kompozy collapses it into a single render run. Pro tier ($299/mo for 18,000 credits) covers ~4-6 podcast or webinar sources per month at full chain depth including HeyGen avatar wraps and 6-platform fanout.
For BYO-key users (Founding $39/mo, signups close 2026-08-31), you bring your own OpenAI / Anthropic / HeyGen / ElevenLabs keys and Kompozy is the orchestration layer at a flat monthly cost.
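The Pro tier figures above imply a rough per-source budget. The sketch below only divides the stated price and credit allowance by the stated 4-6 sources per month; it assumes the full allowance is spent on exactly that many sources, so treat the results as implied ceilings rather than published rates.

```python
PRO_PRICE_USD = 299
PRO_CREDITS = 18_000


def per_source(sources_per_month: int) -> tuple[float, float]:
    """Return (credits per source, dollars per source) at a given monthly volume."""
    return PRO_CREDITS / sources_per_month, PRO_PRICE_USD / sources_per_month


for sources in (4, 6):
    credits, dollars = per_source(sources)
    print(f"{sources} sources/mo: ~{credits:.0f} credits and ${dollars:.2f} per source")
```

At four sources the division works out to 4,500 credits and $74.75 per source; at six, 3,000 credits and about $49.83.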
Frequently asked questions
How long does the manual AI repurposing chain take per source?
4-8 hours for a 60-minute podcast source, depending on familiarity with each tool. Steps 1-2 (transcribe + extract) take 30-60 min combined; steps 3-6 (cut + caption + intro + resize) take 2-4 hours; steps 7-8 (write + schedule) take 1-2 hours.
Which step takes the longest?
Step 4 (captioning) and Step 6 (per-platform resize + hook overlay) are the most time-intensive because they require touching every individual clip. Steps 1, 2, and 7 are bulk operations that scale better.
Can I skip steps?
You can skip the avatar intro (Step 5) and the per-platform variants (consolidate to one 9:16 cut for all platforms) without major quality loss. Skipping captions (Step 4) tanks watch time because 70%+ of short-form viewing is sound-off — do not skip.
What is the cheapest version of this chain?
Whisper (free local install) + ChatGPT free tier + CapCut (free) + Buffer free tier. Total cost: $0/mo, with the trade-off of longer transcription times and lower-quality LLM extractions. Works for low-volume creators.
When does Kompozy make economic sense vs running this manually?
At ~4+ sources per month or when the chain time exceeds 16 hours/month. Below that, manual is fine. Above that, the tool stack alone runs $80-150/mo and the time cost is the binding constraint.
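The 16 hours/month figure follows from the 4-8 hours per source estimate. A small sketch of that arithmetic, using only the numbers stated in this guide; the function name is hypothetical.

```python
HOURS_PER_SOURCE = (4, 8)  # manual chain, low and high estimate per source
THRESHOLD_HOURS_PER_MONTH = 16


def monthly_hours(sources: int) -> tuple[int, int]:
    """Return the (low, high) manual hours per month for a given source count."""
    low, high = HOURS_PER_SOURCE
    return sources * low, sources * high


for sources in range(1, 7):
    low, high = monthly_hours(sources)
    flag = "at or above threshold" if low >= THRESHOLD_HOURS_PER_MONTH else "below threshold"
    print(f"{sources} sources: {low}-{high} h/month ({flag} on the low estimate)")
```

At four sources even the low estimate reaches 16 hours per month, which is why the two conditions in the answer line up.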
Can I use the same chain for blog posts and newsletters?
Yes. Extend the Step 2 prompt so the LLM also extracts a blog post outline and a newsletter draft from the same transcript. The chain extends naturally to written derivatives once the transcript exists.