MiniMax M3 will add vision capabilities, lead engineer hints
MiniMax M3 will be multimodal. Lead engineer Skyler Miao said on X that vision capabilities will be included in the next iteration of the popular model.
MiniMax engineer Skyler Miao confirmed that the company’s upcoming M3 model will include multimodal vision capabilities. The update addresses the primary limitation of the recently launched M2.7, a text-only model that currently rivals Anthropic’s Claude Opus in agentic workflows and coding performance.
The vision confirmation comes as the open-source community prepares for the release of M2.7’s open weights, expected within the next two weeks. That release will allow developers to fine-tune the model locally, setting the stage for the massive architectural jump expected with the M3 launch.
- The M2.7 baseline: The current text-only model already scores a highly competitive 56.22 percent on the SWE-Pro coding benchmark.
- The M3 scale: The upcoming multimodal model is expected to feature a 1-trillion parameter architecture alongside a 1-million token context window.
- The timeline: No release date for M3 has been announced. Miao’s confirmation was a reply, not an official roadmap post.
While M3’s rumored scale signals MiniMax’s ambition to compete directly with frontier models, the announcement drew immediate pushback over accessibility. Thread replies to the confirmation revealed heavy user demand for a compressed version of M3 built specifically for consumer hardware.
- The hardware limit: Independent researchers are requesting a smaller, 35-billion parameter variant of M3 that can comfortably fit inside 32GB of consumer VRAM.
- The community appeal: MiniMax has built strong momentum with independent developers due to its rapid model iterations and commitment to open weights.
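The 35B-in-32GB request is easy to sanity-check with back-of-envelope arithmetic (our own estimate, not from the article): weight memory is simply parameter count times bytes per parameter, which shows why roughly 35B parameters is the practical ceiling for a 32GB card once you account for 4-bit quantization and KV-cache overhead.

```python
# Rough weight-memory estimate for a hypothetical 35B-parameter model
# at common precisions. Real usage adds KV cache and activations on top.
params = 35e9  # 35 billion parameters (the variant users are requesting)

for label, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    gib = params * bytes_per_param / 2**30
    print(f"{label}: {gib:.1f} GiB for weights alone")
```

At fp16 the weights alone need about 65 GiB and even int8 slightly overshoots 32 GiB, so a 35B model only fits comfortably in 32GB of VRAM at roughly 4-bit precision, with headroom left for context.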
The Bottom Line: The AI industry is currently obsessed with building massive, trillion-parameter systems to dominate benchmarks. However, the immediate reaction to the M3 announcement proves that the independent developers actually driving open-source adoption just want highly capable models small enough to run on the hardware they already own.
If you need on-demand GPUs for training, fine-tuning, inference, or running open-source models, give RunPod a try.
- Available hardware: H100, H200, A100, L40S, RTX 4090, RTX 5090, and 30+ more
- Cost: significantly cheaper than AWS or GCP, billed per second, no contracts
- Setup: spins up in under a minute, 30+ regions worldwide

Get the core business tech news delivered straight to your inbox. We track AI, automation, SaaS, and cybersecurity so you don't have to.
Just read what you want, and be done with it.
