The startup's model scores a soundtrack directly from a video — no text prompt — and lands on fal.ai with a commercial-licensing story built on Shutterstock's catalog.
2026-06-22 · by Moe Ameen
Sonilo, a Menlo Park startup led by CEO Shawn Song, made its video-to-music AI model available on fal.ai, the generative-media infrastructure platform, in June 2026. The move puts Sonilo's soundtrack generation in front of developers and creative teams building AI video products, editing tools, and creator platforms, who can now call it through fal.ai's API alongside the rest of their pipeline. Through fal.ai, the model accepts videos up to about 600 seconds (roughly 10 minutes).
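If you are wiring Sonilo into a pipeline, a cheap pre-flight length check saves a failed API call. The sketch below is a minimal illustration, not Sonilo's or fal.ai's code: it reads a clip's duration with `ffprobe` (assumed to be installed and on `PATH`) and rejects anything over a hypothetical `MAX_CLIP_SECONDS` of 600. The article says "about 600 seconds," so treating 600 as a hard cutoff is an assumption; confirm the real limit on Sonilo's fal.ai model page.

```python
import subprocess

# Assumption: a hard 600 s ceiling, based on the "about 600 seconds" figure
# reported for Sonilo on fal.ai. Treat it as a snapshot and re-check it.
MAX_CLIP_SECONDS = 600.0


def probe_duration(path: str) -> float:
    """Read a video's duration in seconds using ffprobe (must be on PATH)."""
    out = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1",
            path,
        ],
        capture_output=True, text=True, check=True,
    )
    return float(out.stdout.strip())


def check_clip_length(duration_s: float, limit_s: float = MAX_CLIP_SECONDS) -> float:
    """Return duration_s if it fits within limit_s, otherwise raise ValueError."""
    if duration_s <= 0:
        raise ValueError("clip duration must be positive")
    if duration_s > limit_s:
        raise ValueError(
            f"clip is {duration_s:.1f}s; trim it to {limit_s:.0f}s or less before scoring"
        )
    return duration_s
```

Usage: call `check_clip_length(probe_duration("clip.mp4"))` before handing the file to fal.ai's client. The endpoint ID for Sonilo's model is not given in this article, so look it up on fal.ai rather than guessing it.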
Sonilo's pitch is that you do not write a prompt at all. You hand it a clip, and the model reads the footage (pacing, motion, structure, and emotional arc), then composes an original soundtrack matched to the video's exact length that resolves on a real musical ending instead of a hard cut or loop. It returns several variations per clip, preserves the original speech in the source footage, and delivers the music as a separate audio track so you can balance it against dialogue. A companion text-to-music model offers segment-level style and mood controls for creators who would rather start from a description. "AI video creation is moving faster than ever, but music is still often treated as something creators have to search for," Song said.
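Because the music arrives as its own track, balancing it against dialogue comes down to scaling one signal before summing. This is a minimal sketch, assuming both tracks are already decoded to mono float samples in [-1, 1] at the same sample rate and length. The `-12 dB` default is an arbitrary illustration, not a Sonilo recommendation, and `mix_under_dialogue` is a hypothetical helper name.

```python
def db_to_gain(db: float) -> float:
    """Convert a decibel offset to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def mix_under_dialogue(dialogue: list[float], music: list[float],
                       music_db: float = -12.0) -> list[float]:
    """Sum the source audio (with speech) and the music stem, attenuating
    the music by music_db and hard-clipping the result to [-1, 1]."""
    if len(dialogue) != len(music):
        raise ValueError("tracks must be the same length")
    gain = db_to_gain(music_db)
    return [max(-1.0, min(1.0, d + gain * m)) for d, m in zip(dialogue, music)]
```

A real editor would usually duck the music only while someone is speaking rather than apply one fixed offset, but a constant gain is enough to show why a separate stem matters: the original speech is never baked into the score.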
The differentiator is licensing. Sonilo says its models are trained on professionally licensed content, including Shutterstock's music catalog, and that the musicians involved are compensated; it positions output on its paid plans as production-ready and cleared for commercial use. The fal.ai availability follows Sonilo's earlier debut on ComfyUI as a native node and its broader launch, with an API, in early May 2026. Specifics of model versions, limits, and plan terms change as the product ships, so treat any exact figure here as a snapshot and confirm it against Sonilo's site.
The licensing news removes a publishing blocker, but it does not get the video posted. A rights-cleared soundtrack on a clip is one upload — the rest of the job is captions, per-platform sizing, supporting posts, and actually scheduling everything. That is where Kompozy comes in. Score the clip in Sonilo for a commercially safe track, bring the export into Kompozy, and it burns in branded captions, reframes the clip for each destination's aspect ratio, and fans it into a Clipped Short, a quote card, a caption, and a text thread in your voice — then schedules and publishes the set across all nine platforms plus email and blog from one queue. Sonilo clears the music; Kompozy clears the distribution.
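Reframing a clip for another aspect ratio is, at its simplest, a centered crop. The sketch below is a generic illustration, not Kompozy's method (real reframing tools may track subjects instead of cropping the center). It returns the largest centered crop matching a target ratio, rounding the trimmed dimension down to an even number because H.264 with 4:2:0 chroma requires even frame sizes. The target ratios in the usage note, such as 9:16 for vertical short-form video, are examples rather than any platform's official spec.

```python
from fractions import Fraction


def _even(n: int) -> int:
    # Round down to an even value; 4:2:0 H.264 needs even width and height.
    return n - (n % 2)


def center_crop(width: int, height: int,
                target_w: int, target_h: int) -> tuple[int, int, int, int]:
    """Return (x, y, crop_w, crop_h) for the largest centered crop of a
    width x height frame whose aspect ratio is target_w:target_h."""
    target = Fraction(target_w, target_h)
    if Fraction(width, height) > target:
        # Source is wider than the target: keep full height, trim the sides.
        crop_w, crop_h = _even(int(height * target)), height
    else:
        # Source is taller than (or equal to) the target: trim top and bottom.
        crop_w, crop_h = width, _even(int(width / target))
    return (width - crop_w) // 2, (height - crop_h) // 2, crop_w, crop_h
```

For example, `center_crop(1920, 1080, 9, 16)` gives `(657, 0, 606, 1080)`, which maps onto an ffmpeg crop filter as `crop=606:1080:657:0` before scaling to the destination size. Exact fractions avoid floating-point drift in the ratio comparison.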
There is also a news play here, today. "Licensed AI music for video" is exactly the kind of timely, high-intent topic creators in your niche are searching this week. Drop your take into Kompozy as a source and it fans one point of view into a blog post explaining what the licensed-music shift means, a carousel breaking down how video-to-music works, short captioned clips, and platform-native posts — then schedules and publishes them across your channels. Being early and clear on a story like this is how a single take becomes a week of content.
Sonilo brought its video-to-music AI model to fal.ai: a tool that reads a clip's pacing and emotional arc and composes an original soundtrack matched to the exact video length, with no text prompt required. On fal.ai it accepts videos up to about 600 seconds, which makes it embeddable in developers' AI video pipelines.
On its paid plans, Sonilo positions output as production-ready and cleared for commercial use, and it says its models are trained on professionally licensed content, including Shutterstock's catalog, with musicians compensated. The free tier does not include commercial-use rights, so use a paid plan for monetized or branded video.
Sonilo handles the soundtrack, not the publishing. Bring the scored export into Kompozy to add branded captions, reframe it per platform, fan it into supporting posts in your voice, and schedule and publish across TikTok, Reels, YouTube Shorts, X, LinkedIn, and more from one queue.