Drop in audio or video. CaptionFit transcribes it and renders a finished video with burned-in captions: 9x16 or 16x9, your font, your color.
Transcribe audio. Sync captions. Render video. No timeline-scrubbing, no manually nudging timestamps, no weird XML.
MP3, WAV, M4A, MP4 — up to 2 hours. Pick a language (or auto-detect) and a caption length, then hit Transcribe.
Got the words already? Paste them — one line per caption — to fix spelling and word grouping. We snap them to the audio.
Pick a font, size, color and aspect ratio (9x16 or 16x9). Hit Render Video — or just download the SRT.
A 4-minute song aligns in about 20 seconds. We run on dedicated GPUs so your queue stays empty.
Paste lyrics or a script and CaptionFit aligns to those exact words. No more "ahh-vuh-tahn-deal" mishears.
Pick a font, dial in size and color, choose 9x16 for Reels or 16x9 for YouTube. Hit Render Video.
Burn-in preview updates live. Edit a line, nudge a timestamp, split or merge — render when it feels right.
I had a 3-minute song and the lyrics in a Notes file. CaptionFit gave me an SRT in 18 seconds and I uploaded the video before my coffee was cold.
The lyric-paste feature is the unlock. Other tools mishear half my band's vocals — pasting the words means it just works.
Replaced an internal Python script we'd been duct-taping for a year. The keyboard shortcuts in the editor are chef's kiss.
When you paste lyrics or a script, alignment is typically within 80–150 ms of the spoken word, close enough that you'll rarely need to nudge anything. Audio-only transcription accuracy depends on the recording quality, but you can always paste a correction and re-align.
MP3, WAV, M4A, FLAC, AAC, OGG, plus video formats (MP4, MOV, WebM, MKV). Up to 2 hours per file on paid plans, 10 minutes on Free.
9x16 for Reels, TikTok, and Shorts, and 16x9 for YouTube and the web. Toggle Cover to fit a horizontal source into a vertical canvas (or vice versa).
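For the curious, here is a rough sketch of the arithmetic behind a cover-style fit. It assumes Cover behaves like a centered crop (similar to CSS `object-fit: cover`), which is our guess at the behavior, not a description of CaptionFit's renderer. The function name `cover_crop` is hypothetical.

```python
def cover_crop(src_w: int, src_h: int, ratio_w: int, ratio_h: int) -> tuple[int, int, int, int]:
    """Return a centered crop (x, y, width, height) in source pixels.

    The crop has the target aspect ratio ratio_w:ratio_h and is as large
    as the source allows. Assumption: "Cover" means center-crop.
    """
    # Cross-multiply to compare aspect ratios without floating point.
    if src_w * ratio_h > src_h * ratio_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = src_h
        crop_w = src_h * ratio_w // ratio_h
    else:
        # Source is taller than (or equal to) the target: keep full width.
        crop_w = src_w
        crop_h = src_w * ratio_h // ratio_w
    x = (src_w - crop_w) // 2
    y = (src_h - crop_h) // 2
    return x, y, crop_w, crop_h


if __name__ == "__main__":
    # A 1280x720 landscape clip cropped for a 9x16 vertical canvas.
    print(cover_crop(1280, 720, 9, 16))
```

The crop is then scaled to the output canvas; a real encoder may also want even pixel dimensions, which this sketch does not enforce.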
Yep — download a clean SRT any time, even before rendering.
No. Your files are processed and deleted from our servers within 24 hours unless you pin them to a project. We never use your audio or transcripts to train models.
On paid plans you can drop a folder of files at once and we'll align them in parallel. Long files (lectures, audiobooks) are chunked automatically — you still get one clean SRT at the end.
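If you want a feel for how chunked results can end up as one SRT, here is a minimal sketch. It assumes each chunk comes back with its start offset in the full file and a list of cues timed relative to that chunk; the data shape and the names `format_timestamp` and `merge_chunks` are hypothetical, not CaptionFit's internals.

```python
def format_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def merge_chunks(chunks: list[tuple[int, list[tuple[int, int, str]]]]) -> str:
    """Merge per-chunk cues into a single SRT string.

    Each chunk is (offset_ms, cues), where every cue is
    (start_ms, end_ms, text) relative to the start of that chunk.
    """
    merged = []
    for offset_ms, cues in chunks:
        for start_ms, end_ms, text in cues:
            # Shift chunk-relative times onto the full file's timeline.
            merged.append((start_ms + offset_ms, end_ms + offset_ms, text))
    merged.sort(key=lambda cue: cue[0])

    blocks = []
    for index, (start_ms, end_ms, text) in enumerate(merged, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}\n{text}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    chunks = [
        (0, [(500, 2_000, "Hello")]),
        (600_000, [(250, 1_500, "world")]),  # second chunk starts at 10:00
    ]
    print(merge_chunks(chunks))
```

Cues are renumbered from 1 after sorting, so the merged file reads as if it had been produced in a single pass.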
No card required, no setup call, no "book a demo." Free tier covers most one-off projects.