
How to add captions to a video (auto + manual methods, 2026)

Add captions to videos using CapCut, Submagic, Veed, or OpenAI Whisper. Covers burned-in captions, SRT export, formatting for Reels and TikTok, and the manual SRT workflow.

Last verified 2026-05-22

A widely cited industry figure says 85% of short-form video is watched with the sound off. Captions are not optional: they are the difference between a viewer watching three seconds and watching to the end. The good news: in 2026, every major editor has decent auto-captions, and the dedicated caption tools (Submagic, CapCut Captions, Veed) produce polished, publish-ready output in under a minute per video.

There are three caption-attachment models: (1) burned-in (the text is part of the video pixels and cannot be turned off), (2) uploaded SRT file (viewers can toggle captions on and off), and (3) platform-generated (TikTok, Reels, and YouTube auto-caption on upload). For short-form, burned-in is the dominant pattern because it gives you full styling control and the captions cannot be turned off by accident.

This guide covers the auto-caption tools, the manual SRT workflow for high-stakes content, and the formatting that actually retains viewers.

The steps

  1. Pick an auto-caption tool. CapCut Captions (free, inside CapCut) is the no-brainer if you already edit in CapCut. Submagic ($16-29/mo) is the standalone leader, with the best caption animation presets: the bouncing, color-changing word-by-word style most viral shorts use. Veed ($30/mo) handles longer-form video well. Descript bundles captions with the rest of its editing flow. For technical users, OpenAI Whisper (free and open source when run locally, billed per minute through the API) produces some of the most accurate transcripts, but you assemble the captions yourself.
  2. Import your video into the caption tool. Upload the MP4. Most tools accept up to 1-4GB depending on the plan tier. Free tiers often cap at 10-15 minutes per video — fine for short-form, limiting for podcasts.
  3. Generate captions and review for accuracy. Hit the auto-caption button. Whisper-class engines (Whisper itself and the comparable speech models behind Submagic, Descript, Veed, and CapCut's newer engine) typically reach 95%+ word accuracy on clear English audio. Read through the result and fix proper nouns, brand names, and technical terms, which auto-captions get wrong most often. Punctuation may also need cleanup if the source had long pauses or run-on sentences.
  4. Style the captions for the format. For Reels, TikTok, and Shorts, use a large sans-serif font (Inter, Montserrat, Anton) with high contrast (white text with a black stroke is the safest), positioned in the middle third of the frame. Stay out of the top, where the app's navigation tabs sit, and the bottom, where the username, description, and audio label sit, and keep clear of the right edge, where the like, comment, and share buttons stack. Most tools offer presets; Submagic's are widely copied because they read well at thumbnail size. For long-form (YouTube, podcast video), use a smaller font in the lower third with a subtle background plate.
  5. Choose burned-in or SRT export. Burned-in: re-export the video with the captions rendered into the pixels. They cannot be turned off, and most short-form ships this way. SRT: download the .srt file separately and upload it alongside the video, so viewers can toggle captions on and off. YouTube and LinkedIn accept SRT uploads; support on TikTok and Instagram depends on how and where you upload, so confirm it before relying on it. For maximum reach, do both: burned-in for the visual style, plus an SRT so the caption text is also indexable by search.
  6. For high-stakes content, manually craft the SRT. Auto-captions are good enough for most cases, but for product launches, ads, and hero content where every word matters, export the auto-generated SRT, open it in a text editor or a tool like Subtitle Edit (free), and review every line. Adjust timing where the auto-tool guessed wrong, fix word choices, and break lines at natural pauses. The SRT format is plain text. Each entry is a sequence number, a timecode range written as HH:MM:SS,mmm --> HH:MM:SS,mmm (a comma, not a period, before the milliseconds), and the caption text, with a blank line between entries.
  7. Verify on the destination platform. Upload to a draft post or use the platform's preview mode. Captions sometimes render differently on the platform than in your editor — font substitution, contrast issues, or positioning shifts. Fix and re-export if needed.
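For the Whisper route in steps 1 and 6, assembling the captions yourself is mostly a formatting job. A minimal Python sketch, assuming transcript segments shaped like the `segments` list the open-source openai-whisper package returns from `transcribe()` (dicts with `start` and `end` in seconds plus `text`). `srt_timestamp` and `segments_to_srt` are hypothetical helper names, not part of Whisper:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timecode: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT entries: sequence number, timecode range, text, blank line."""
    entries = []
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue  # drop empty segments so numbering stays sequential
        number = len(entries) + 1
        entries.append(
            f"{number}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{text}\n"
        )
    # Joining with "\n" puts the required blank line between entries.
    return "\n".join(entries)
```

In practice the segments would come from `whisper.load_model("base").transcribe("input.mp4")["segments"]`, and the result should be written with `encoding="utf-8"`. If you only need the file, the `whisper` command-line tool can write one directly with `--output_format srt`.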
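The word-by-word style mentioned in steps 1 and 4 depends on word-level timestamps rather than sentence-level segments. openai-whisper can produce them with `transcribe(..., word_timestamps=True)`, which adds a `words` list (entries with `word`, `start`, `end`) to each segment. The sketch below groups those words into short on-screen chunks; `chunk_words`, the three-word limit, and the 0.6-second pause threshold are illustrative assumptions, not settings taken from any caption tool:

```python
def _close_chunk(group: list[dict]) -> tuple[float, float, str]:
    """Turn a run of timed words into one (start, end, text) caption chunk."""
    text = " ".join(w["word"].strip() for w in group)
    return (group[0]["start"], group[-1]["end"], text)


def chunk_words(words: list[dict], max_words: int = 3, max_gap: float = 0.6):
    """Group timed words into short captions.

    A new chunk starts when the current one already has max_words words,
    when the previous word ends a sentence, or when the pause before the
    next word is longer than max_gap seconds.
    """
    chunks = []
    current: list[dict] = []
    for word in words:
        if current:
            prev = current[-1]
            pause = word["start"] - prev["end"]
            ends_sentence = prev["word"].strip().endswith((".", "!", "?"))
            if len(current) >= max_words or ends_sentence or pause > max_gap:
                chunks.append(_close_chunk(current))
                current = []
        current.append(word)
    if current:
        chunks.append(_close_chunk(current))
    return chunks
```

Each (start, end, text) chunk becomes one cue; the bounce or color-change animation is a styling layer applied on top of that timing.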
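The line-by-line review in step 6 goes faster when a script flags the mechanical problems first. A minimal sketch that parses SRT text into cues and reports reversed timecodes, overlapping cues, and overlong lines. `parse_srt`, `check_cues`, and the 32-characters-per-line and two-line limits are assumptions for illustration; pick limits that match your caption style:

```python
import re
from dataclasses import dataclass

# "HH:MM:SS,mmm --> HH:MM:SS,mmm"; hours may exceed two digits on long files.
TIMECODE = re.compile(
    r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2,}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int
    start: float  # seconds
    end: float  # seconds
    lines: list[str]


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(text: str) -> list[Cue]:
    """Split SRT text into cues: number line, timecode line, caption lines."""
    text = text.lstrip("\ufeff").replace("\r\n", "\n")  # tolerate BOM and CRLF
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        rows = block.strip().split("\n")
        if len(rows) < 2:
            continue  # stray fragment with nothing to time
        match = TIMECODE.match(rows[1].strip())
        if match is None:
            raise ValueError(f"bad timecode line: {rows[1]!r}")
        parts = match.groups()
        cues.append(
            Cue(int(rows[0]), _to_seconds(*parts[:4]), _to_seconds(*parts[4:]), rows[2:])
        )
    return cues


def check_cues(cues: list[Cue], max_chars: int = 32, max_lines: int = 2) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    prev_end = 0.0
    for cue in cues:
        if cue.end <= cue.start:
            problems.append(f"cue {cue.index}: ends at or before its start")
        if cue.start < prev_end:
            problems.append(f"cue {cue.index}: overlaps the previous cue")
        if len(cue.lines) > max_lines:
            problems.append(f"cue {cue.index}: {len(cue.lines)} lines (max {max_lines})")
        for line in cue.lines:
            if len(line) > max_chars:
                problems.append(f"cue {cue.index}: line longer than {max_chars} characters")
        prev_end = max(prev_end, cue.end)
    return problems
```

Anything it reports still needs a human decision: the script shows where to look, not what the caption should say.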
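If you burn captions in yourself (step 5) instead of through a caption tool, ffmpeg's `subtitles` filter renders an SRT onto the video with libass, and its `force_style` option overrides the look using ASS style fields. A sketch that builds the command as an argument list, assuming ffmpeg was built with libass, the font is installed locally (otherwise libass substitutes another font, the same problem step 7 warns about), and the filenames contain no characters that need filtergraph escaping. `burn_in_command` is a hypothetical helper, and `FontSize` and `MarginV` are measured in the subtitle script's coordinate space rather than output pixels, so the defaults below are starting points to tune by eye:

```python
def burn_in_command(
    video: str,
    srt: str,
    output: str,
    font: str = "Montserrat",
    font_size: int = 18,
    margin_v: int = 90,
) -> list[str]:
    """Build an ffmpeg command that burns an SRT into a video (hypothetical helper)."""
    style = ",".join([
        f"FontName={font}",  # must be installed, or libass substitutes a font
        f"FontSize={font_size}",  # script units, not pixels: tune by eye
        "PrimaryColour=&H00FFFFFF",  # white text; ASS colours are &HAABBGGRR
        "OutlineColour=&H00000000",  # black stroke
        "BorderStyle=1",  # outline style rather than an opaque box
        "Outline=3",  # stroke thickness
        "Alignment=2",  # bottom centre (same value in SSA and ASS numbering)
        f"MarginV={margin_v}",  # lift the text up from the bottom edge
    ])
    return [
        "ffmpeg",
        "-i", video,
        "-vf", f"subtitles={srt}:force_style='{style}'",
        "-c:a", "copy",  # keep the original audio; the video stream is re-encoded
        output,
    ]
```

Run it with `subprocess.run(burn_in_command("input.mp4", "captions.srt", "output.mp4"), check=True)`, or pass the same list to `shlex.join` to get a command you can paste into a shell. Then check the render on the destination platform as in step 7.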

Common gotchas

  • Whisper-based tools struggle with accents, technical jargon, and overlapping speakers. Always proofread.
  • Burned-in captions outside the safe zone (too close to the edges) get covered by platform UI overlays. Keep all text within the middle 80% of the frame width.
  • Animated word-by-word captions look great but can hurt accessibility. Screen readers cannot read any burned-in text, and rapidly changing words are harder to follow for viewers who need static captions for cognitive accessibility. Provide an SRT alongside for accessibility.
  • Auto-captions are timed against the audio as it was when you generated them. If you re-edit the video and cut sections afterward, the timecodes drift; regenerate.
  • TikTok's native captions (the ones the platform generates) are not the same as the captions you burn in. Both can appear at once if you do not disable one — leading to double captions on screen.
  • Save SRT files as UTF-8, which is what platforms expect. Saving as ASCII or a legacy encoding can corrupt non-ASCII characters (emojis, accented letters) and break the upload.
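The placement rule in step 4 (middle third vertically) and the safe-zone gotcha (middle 80% of the width) reduce to arithmetic on the frame size. A minimal sketch, using a 1080x1920 vertical frame as the example and treating "safe" exactly as this guide defines it; real overlay positions vary by app and device, so treat the box as a starting point. `caption_safe_box` and `fits_safe_box` are hypothetical names:

```python
Box = tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def caption_safe_box(width: int, height: int, side_margin: float = 0.10) -> Box:
    """Caption-safe area: middle 80% of the width, middle third of the height."""
    left = round(width * side_margin)  # 10% margin on each side by default
    right = width - left
    top = round(height / 3)
    bottom = round(height * 2 / 3)
    return left, top, right, bottom


def fits_safe_box(caption: Box, box: Box) -> bool:
    """True if a caption's bounding box lies entirely inside the safe box."""
    c_left, c_top, c_right, c_bottom = caption
    b_left, b_top, b_right, b_bottom = box
    return c_left >= b_left and c_top >= b_top and c_right <= b_right and c_bottom <= b_bottom
```

For a 1080x1920 frame the box runs from x 108 to 972 and y 640 to 1280, so a caption line has at most 864 pixels of width to work with.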
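Two of the gotchas above, encoding and drift after re-editing, can be caught before upload with a quick check. A minimal sketch, assuming you have the SRT as raw bytes and know the final video length in seconds; `check_srt_bytes` is a hypothetical helper:

```python
import re

# Matches the end timecode of each cue ("--> HH:MM:SS,mmm").
END_TIME = re.compile(r"-->\s*(\d{2,}):(\d{2}):(\d{2}),(\d{3})")


def check_srt_bytes(data: bytes, video_seconds: float) -> list[str]:
    """Flag files that are not UTF-8 and cues that end after the edited video does."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as err:
        return [f"not valid UTF-8 (first bad byte at offset {err.start}); re-save as UTF-8"]
    problems = []
    for match in END_TIME.finditer(text):
        h, m, s, ms = (int(part) for part in match.groups())
        end = h * 3600 + m * 60 + s + ms / 1000
        if end > video_seconds:
            problems.append(
                f"a cue ends at {end:.3f}s but the video is {video_seconds:.3f}s long; "
                "captions have probably drifted, so regenerate"
            )
            break  # one report is enough to know the file is stale
    return problems
```

The drift check only catches the obvious case, a cue that outlasts the edited video; captions that drifted within the video still need a watch-through.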

Where Kompozy fits

Captions are first-class output in Kompozy. Every video format the engine produces (Persona Shorts, Marketing Shorts, Faceless Shorts, Clip Shorts, Persona Frames) ships with burned-in captions rendered with libass through a shared ffmpeg pipeline. The caption styles match the major presets (Submagic-style word-by-word animation, simple block captions, lower-third broadcast style), and the per-format defaults are tuned for the destination platform.

If you are already using Submagic, Veed, or CapCut purely for captions on top of source video that came from elsewhere, Kompozy collapses the captions step into the generation flow. You upload the source once, the engine cuts the short, generates the captions, applies the overlay, and publishes — no second tool. The Creator tier ($49/mo for 2,500 credits) handles roughly 20-30 captioned shorts per month before you would consider upgrading.

Frequently asked questions

Submagic vs CapCut vs Veed for captions?

Submagic if you want the polished animated caption styles (word-by-word color changes, the viral look). CapCut if you are already editing there and want free unlimited captions. Veed for longer-form video work and team workflows. All three use Whisper-class engines and produce similar transcript accuracy.

Is OpenAI Whisper free?

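Yes, if you run it locally: it is open source, and the smaller models run on a modern laptop (the larger ones are much faster with a GPU). The OpenAI Whisper API costs $0.006 per minute of audio, so an hour of audio costs about $0.36 (60 × $0.006). For full transcript generation, that makes it one of the cheapest hosted options.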

Should I burn in captions or upload an SRT?

For short-form Reels, TikTok, and Shorts: burn in for styling control. For long-form YouTube, LinkedIn video, and podcasts with video: SRT so viewers can toggle. Best practice: do both when possible — burned-in plus uploaded SRT for accessibility and search.

How accurate are auto-captions?

Whisper-class engines typically reach 95%+ on clean English audio. Accuracy drops to roughly 85-90% on accented speech, technical content, or noisy environments. Always proofread before publishing.

Can I use captions in any language?

Whisper supports 90+ languages, including Spanish, Portuguese, French, German, Mandarin, and Arabic. Auto-translation between languages is also a feature in most modern caption tools (Submagic, Veed), though quality varies. Whisper's own built-in translation only translates into English.

Why are my captions out of sync after editing the video?

Caption timecodes are tied to the audio timeline. Cutting or moving clips after generation drifts the timing. Regenerate captions after any major edits, or use a tool that links captions to the edit timeline.
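For a single clean cut, re-timing can stand in for regeneration: cues before the cut stay put, cues after it shift earlier by the cut's length, and cues that straddle the cut need review because some of their words were removed. A minimal sketch, assuming cues as (start, end, text) tuples in seconds and a cut given in the original timeline; `apply_cut` is a hypothetical name:

```python
Cue = tuple[float, float, str]  # (start seconds, end seconds, text)


def apply_cut(cues: list[Cue], cut_start: float, cut_end: float) -> tuple[list[Cue], list[Cue]]:
    """Re-time cues after removing [cut_start, cut_end) from the video.

    Returns (kept, needs_review): kept cues carry their new timings,
    needs_review holds cues that overlapped the removed span.
    """
    removed = cut_end - cut_start
    kept: list[Cue] = []
    needs_review: list[Cue] = []
    for start, end, text in cues:
        if end <= cut_start:
            kept.append((start, end, text))  # entirely before the cut: unchanged
        elif start >= cut_end:
            kept.append((start - removed, end - removed, text))  # after: shift earlier
        else:
            needs_review.append((start, end, text))  # straddles the cut
    return kept, needs_review
```

For several cuts, apply them from the latest to the earliest so each cut's timestamps still refer to the original timeline.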

Do platforms penalize burned-in captions?

No — burned-in captions are the dominant format on every short-form platform. The platforms' own caption auto-generation runs on top of whatever you uploaded.
