A roughly 0.2B-parameter image inpainting model whose authors report quality matching 10B-scale models at a fraction of the size.
Last verified · 2026-06-22 · by Moe Ameen
Moebius is a lightweight image inpainting framework: a model that fills in or replaces masked regions of an image with content that matches the surrounding scene. Its headline claim is efficiency. It has roughly 0.22B parameters (about 226M), which the authors put at under 2% of the size of FLUX.1-Fill-Dev (around 11.9B), while reporting inpainting quality on par with, and in some cases ahead of, such 10B-scale generalist models on natural-image and portrait benchmarks.
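The "under 2%" figure can be checked directly from the two parameter counts quoted above. This is only a quick arithmetic sketch; both counts are the approximate values given in the text, and the variable names are chosen here for illustration.

```python
# Approximate parameter counts as quoted in the text above.
moebius_params = 226e6       # about 0.22B
flux_fill_params = 11.9e9    # FLUX.1-Fill-Dev, about 11.9B

ratio = moebius_params / flux_fill_params
times_larger = flux_fill_params / moebius_params

print(f"Moebius is {ratio:.2%} of the larger model's parameter count")
print(f"The larger model has about {times_larger:.0f}x as many parameters")
```

With these inputs the ratio comes out just under 2%, which is consistent with the authors' framing.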
The work comes from researchers at Huazhong University of Science and Technology (HUST) and vivo's AI Lab. Its core technical idea is a redesigned diffusion backbone built around a Local-λ Mix Interaction (LλMI) block, which compresses spatial context and global semantic priors into fixed-size linear matrices instead of the quadratic attention larger models rely on, paired with a multi-granularity distillation step that learns from a bigger teacher model. The paper was posted to arXiv in June 2026 (arXiv:2606.19195) and is listed for ECCV 2026.
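To make the "fixed-size matrices instead of quadratic attention" idea concrete, here is a minimal sketch of generic kernelized linear attention in plain Python. It is not the LλMI block, whose internals are not described here; the feature map, function names, and dimensions are all assumptions chosen for illustration. The point it shows is only that the context summary stays the same size no matter how many tokens are folded into it.

```python
import math
import random

def feature_map(x):
    # elu(x) + 1: a positive feature map commonly used in linear attention.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def summarize_context(keys, values):
    """Fold every token into a d_k x d_v matrix S and a length-d_k vector z."""
    d_k, d_v = len(keys[0]), len(values[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    z = [0.0] * d_k
    for k, v in zip(keys, values):
        fk = feature_map(k)
        for a in range(d_k):
            z[a] += fk[a]
            for b in range(d_v):
                S[a][b] += fk[a] * v[b]
    return S, z

def attend(query, S, z):
    """Read one output vector from the summary, independent of token count."""
    fq = feature_map(query)
    norm = sum(f * zi for f, zi in zip(fq, z))
    return [sum(fq[a] * S[a][b] for a in range(len(z))) / norm
            for b in range(len(S[0]))]

def random_tokens(n, d, rng):
    # Hypothetical stand-in for image tokens.
    return [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]

rng = random.Random(0)
d = 4
for n_tokens in (16, 1024):
    keys = random_tokens(n_tokens, d, rng)
    values = random_tokens(n_tokens, d, rng)
    S, z = summarize_context(keys, values)
    out = attend(random_tokens(1, d, rng)[0], S, z)
    # The summary stays d x d, while a full attention map would need
    # n_tokens * n_tokens entries.
    print(n_tokens, "tokens -> summary", len(S), "x", len(S[0]),
          "| full attention map entries:", n_tokens * n_tokens)
```

Building the summary costs time linear in the number of tokens, and each query then reads from it at a cost that depends only on the feature dimension. That is the general trade the text describes, under the assumptions stated above.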
Speed is the practical payoff. The authors report inference latency around 26 ms per step on a single GPU and an overall runtime more than 15× faster than the 10B-scale models it's compared against. Code and model weights are released under Apache 2.0, with checkpoints fine-tuned for benchmarks such as Places2, CelebA-HQ, and FFHQ.
One thing to keep straight: Moebius is a task-specific inpainting specialist, not a general text-to-image or text-to-video generator. It edits existing images — removing objects, cleaning up backgrounds, reconstructing missing regions — rather than creating a scene from a blank prompt. Benchmark numbers, available checkpoints, and tooling are still moving as the project ships, so treat any exact figure as a snapshot and check the repo for current details.
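The difference from prompt-to-image generation is easiest to see in the shape of the task: an inpainting job takes an existing image plus a mask and changes only the masked pixels. The sketch below illustrates that contract with a toy grayscale image. It is not the Moebius API; `composite`, the pixel values, and the stand-in "generated" image are all hypothetical, and whether Moebius applies an explicit compositing step like this is an assumption.

```python
def composite(original, generated, mask):
    """Take generated pixels where mask is 1 and original pixels where it is 0."""
    return [
        [g if m else o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]

# Hypothetical 3x3 grayscale image with one unwanted bright pixel in the centre.
original = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
# 1 marks the region to reconstruct; 0 marks pixels that must stay untouched.
mask = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
# Stand-in for model output; a real inpainting model would predict these values.
generated = [[12] * 3 for _ in range(3)]

result = composite(original, generated, mask)
for row in result:
    print(row)
```

Everything outside the mask comes through unchanged, which is why the output is an edit of the source image and not a new scene.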
Moebius gets you a clean, edited still — a product shot with the clutter removed, a portrait with the background reconstructed, a thumbnail frame with a distraction erased. That fixed image is an asset, not a post. Kompozy is where that asset becomes published content across platforms. Bring a Moebius-edited image into Kompozy and it becomes the base for a Photo Post, a quote graphic, a carousel slide, or a Persona image, with branded captions and on-style overlays added through HyperFrames and the frame reformatted for each destination's aspect ratio. From there Kompozy schedules and publishes the same piece to TikTok, Reels, YouTube Shorts, X, LinkedIn, and the rest of the nine supported platforms from one queue, instead of you exporting and re-uploading by hand.
The repurposing math is what makes the pairing work. One clean image can seed a whole content unit in Kompozy: the photo itself for image feeds, a quote card built on top of it, a text post and a thread written in your own voice through your Persona Brief, and a caption set sized per platform — so a single edit fans out into a week of cross-platform posts. Moebius owns the pixel-level fix; Kompozy owns the captions, the format fan-out, the schedule, and the publish.
Moebius is a lightweight image inpainting model — it fills in or replaces masked regions of an image to match the surrounding scene. Built by researchers at Huazhong University of Science and Technology and vivo AI Lab, it runs on about 0.22B parameters yet claims inpainting quality on par with 10B-scale models like FLUX.1-Fill-Dev, with more than 15× faster inference.
The authors redesign the diffusion backbone around a Local-λ Mix Interaction (LλMI) block that compresses spatial context and global semantics into fixed-size linear matrices instead of the heavy quadratic attention larger models use, and they distill knowledge from a bigger teacher model. On their reported benchmarks this keeps inpainting quality competitive while cutting parameters to under 2% of a 10B-scale model.
Moebius's code and model weights are released under the Apache 2.0 license, with checkpoints available for benchmarks like Places2, CelebA-HQ, and FFHQ. You run it yourself rather than through a hosted product, so the real cost is the GPU you run inference on. Check the official repo for current checkpoints and setup.
Moebius is not a general text-to-image or text-to-video generator that creates a scene from a blank prompt. It is an inpainting specialist: it edits existing images by reconstructing masked regions (object removal, background cleanup, repair).
Moebius produces the cleaned-up still but does not publish it. Bring the export into Kompozy to build a Photo Post, quote graphic, carousel, or Persona image on top of it with branded captions, reframe it per platform, and schedule and publish across TikTok, Reels, YouTube Shorts, X, LinkedIn, and more from one queue.