A 3-billion-parameter open reasoning model that matches far larger models on math and code.
Last verified · 2026-06-24 · by Moe Ameen
VibeThinker-3B is an open-weight reasoning model released in June 2026 by WeiboAI, the AI team at Sina Weibo. Its technical report, "Exploring the Frontier of Verifiable Reasoning in Small Language Models," appeared on arXiv on June 15, 2026. At 3 billion parameters it is small enough to run on a single consumer GPU, and it is built on the Qwen2.5-Coder-3B base model. It ships under the permissive MIT license, so you can download the weights from Hugging Face and use them commercially.
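To make the "single consumer GPU" point concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at common precisions. It assumes a nominal count of exactly 3 billion parameters (the real count may differ slightly) and ignores the KV cache, activations, and framework overhead, so treat the numbers as a lower bound rather than a measured requirement.

```python
# Rough weight-only memory estimate for a model with a given parameter count.
# Assumption: nominal 3e9 parameters; KV cache, activations, and runtime
# overhead are NOT included, so real usage will be higher.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


NOMINAL_PARAMS = 3e9  # assumed round figure, not an exact count

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(NOMINAL_PARAMS, dtype):.1f} GB")
```

At 16-bit precision the nominal weights come to about 6 GB, which is why a model of this size can plausibly fit on a mainstream consumer card, with quantized formats leaving more headroom for long reasoning traces.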
The headline is the size-to-skill ratio. WeiboAI reports VibeThinker-3B scoring 94.3 on AIME26 and 89.3 on HMMT25 (two hard competition-math benchmarks), 80.2 Pass@1 on LiveCodeBench v6, and a 96.1% acceptance rate on recent unseen LeetCode contests (123 of 128 first-attempt submissions). On those verifiable-reasoning benchmarks the paper positions it as competitive with frontier models hundreds of times its size, citing comparisons against models such as DeepSeek V3.2 and Gemini 3 Pro. Benchmark parity does not make it as broadly capable as those models, so read the claim as "matches them on these specific math and code tests," not "replaces them."
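The LeetCode figure is simple enough to check by hand. The sketch below recomputes it from the two counts quoted above; the function name is illustrative and is not taken from the paper.

```python
def acceptance_rate(accepted: int, submitted: int) -> float:
    """Return the share of accepted submissions as a percentage."""
    if submitted <= 0:
        raise ValueError("submitted must be a positive integer")
    return 100.0 * accepted / submitted


# Counts quoted in the text: 123 accepted out of 128 first-attempt submissions.
rate = acceptance_rate(123, 128)
print(f"{rate:.1f}%")  # prints 96.1%
```

The exact value is 96.09375%, which rounds to the reported 96.1%.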
What makes that possible is the training recipe, not raw scale. VibeThinker follows what WeiboAI calls the Spectrum-to-Signal Principle, which runs in three steps. First, a curriculum-based, two-stage supervised fine-tuning pass teaches the model a broad spectrum of valid reasoning paths across math, code, and STEM. Next, a reinforcement-learning stage amplifies the correct reasoning signal using verifiable rewards; it uses a GRPO variant the team calls MaxEnt-Guided Policy Optimization, or MGPO. Finally, the model goes through offline self-distillation. The recipe continues the approach the team used for the earlier, even smaller VibeThinker-1.5B.
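To give a feel for what "amplifying the correct signal with verifiable rewards" can look like, here is a minimal sketch of the group-relative advantage used in plain GRPO-style training. It is not MGPO, whose specifics are not described here, and the exact-match verifier and function names are illustrative assumptions. Several answers are sampled for one problem, each is graded right or wrong, and each answer is scored relative to its own group.

```python
import statistics


def verifiable_reward(predicted: str, reference: str) -> float:
    """Hypothetical verifier: 1.0 on an exact match after trimming, else 0.0."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the mean and std of its own sample group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division defined when every sample earns the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled final answers to one problem whose reference answer is "42".
samples = ["42", "41", " 42 ", "7"]
rewards = [verifiable_reward(s, "42") for s in samples]
advantages = group_relative_advantages(rewards)

for sample, adv in zip(samples, advantages):
    print(f"{sample!r}: advantage {adv:+.3f}")
```

Correct answers receive a positive advantage and incorrect ones a negative advantage, so the policy update pushes probability toward the reasoning paths that verify. When every sample in a group gets the same reward, all advantages are zero and that problem contributes no learning signal.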
The honest scope note matters for creators. VibeThinker-3B is a verifiable-reasoning model aimed at math, code, and STEM problems with checkable answers. WeiboAI explicitly states it was not trained on tool-calling or agent-based programming data, and it is not pitched as a general chat model, a copywriter, or a media generator. It outputs text reasoning — no images, video, or audio — and it is strongest exactly where the answer can be graded as right or wrong.
Treat VibeThinker-3B as the calculator in your content operation, not the writer. It is a verifiable-reasoning model — brilliant at math and code, deliberately not trained to write brand copy or generate media — so the smart pairing uses it for the logic-heavy, checkable parts of the workflow and lets Kompozy own everything that becomes an actual post. Run VibeThinker locally on the numbers: feed it your analytics export and have it reason through which formats and posting times actually moved the needle, work out the cadence math for a launch, or pressure-test the logic of a content plan before you commit a month to it. Because it is a 3B model under MIT license, that reasoning can run on your own GPU with no API bill and no data leaving your machine.
The conclusion then goes into Kompozy, which handles the part VibeThinker can't do. Kompozy generates the content the reasoning model never touches: persona and avatar video with native captions, Photo Posts and face-locked Persona Photos, multi-slide Carousels and Quote Graphics rendered pixel-exact through HyperFrames, plus blogs, newsletters, and platform-native text written in your voice through the Persona Brief (on managed Claude and OpenAI models, which are the right tools for prose). Kompozy then schedules and publishes the set across the nine supported platforms plus email and blog, on autopilot, with a per-post review pipeline. VibeThinker decides what to make and why; Kompozy makes it and ships it. Don't ask a math reasoner to write your captions; that's the one job in this pairing it isn't built for.
VibeThinker-3B is an open-weight, 3-billion-parameter reasoning model released by WeiboAI (Sina Weibo's AI team) in June 2026, built on a Qwen2.5 3B base and licensed under MIT. It is tuned for verifiable reasoning (math, code, and STEM), and its technical report lists benchmark scores competitive with much larger frontier models on those specific tasks.
It does not beat them across the board — the claim is benchmark parity on specific verifiable-reasoning tests like AIME26 and LiveCodeBench. WeiboAI attributes that to its training recipe, the Spectrum-to-Signal Principle: curriculum-based supervised fine-tuning that teaches a wide range of reasoning paths, then reinforcement learning (a GRPO variant called MGPO) that amplifies the correct reasoning signal using verifiable rewards.
No. It is a text reasoning model focused on math, code, and STEM with checkable answers, and WeiboAI notes it was not trained for tool-calling, agentic work, or general copywriting. It produces no images, video, or audio. To turn its analysis into finished posts you pair it with a content engine like Kompozy.
Yes. The weights are published on Hugging Face under the MIT license, which permits commercial use. Because it is only 3B parameters, it can run on a single consumer GPU, so your real cost is the hardware to run inference rather than a per-token API fee.
As the analytical layer, not the writer. Run it locally to reason over performance data, compute posting cadence, or check the logic of a content plan, then take that conclusion into Kompozy to generate the video, images, and copy in your brand voice and publish across platforms.