MiMo V2.5 is now available
Full multimodal input — image, audio and text in, text out — in a single model, behind the same OpenAI-compatible API.
- Call it with model id mimo-v2.5
- 310B MoE · 1M context · vision + audio
// changelog
A running log of what we ship — new models, API surface, performance and platform. No marketing, just the diffs.
Full multimodal input — image, audio and text in, text out — in a single model, behind the same OpenAI-compatible API.
Speculative decoding is now on by default for qwen3.6 — roughly double the tokens per second at the same latency, no change on your side.
A dedicated /v1/rerank endpoint for cross-lingual semantic reranking — the missing middle step of RAG (embedding → rerank → LLM).
Every API key now shows a live attestation that no prompt or completion content is stored — something your compliance team can screenshot.
Exclusive NVIDIA B200 hardware inside Helmcode's EU infrastructure — guaranteed throughput, full network isolation and custom models.
Reworked model loading and routing in the control plane. Cold starts are noticeably quicker and p95 latency is down across the board.
Kokoro text-to-speech (sub-second latency, 67 voices) and Whisper Large v3 speech-to-text (99+ languages) — both behind the same key.
Fixed an edge case where streamed responses could truncate when a tool call and content were interleaved. Streaming is solid across all chat models.
That's everything so far — updated as we ship.
// cookies
We use strictly necessary cookies to run the site and, only with your consent, Google Analytics to understand usage. No advertising, ever — see our Cookie Policy.
// preferences