// changelog

Every shipped
change.

A running log of what we ship — new models, API surface, performance and platform. No marketing, just the diffs.

Jun 20, 2026 platform

NewModel

MiMo V2.5 is now available

Full multimodal input — image, audio and text in, text out — in a single model, behind the same OpenAI-compatible API.

Call it with model id mimo-v2.5
310B MoE · 1M context · vision + audio

Jun 12, 2026 platform

Improved

2× throughput on Qwen 3.6

Speculative decoding is now on by default for qwen3.6 — roughly double the tokens per second at the same latency, no change on your side.

Jun 5, 2026 api

NewAPI

Reranking endpoint

A dedicated /v1/rerank endpoint for cross-lingual semantic reranking — the missing middle step of RAG (embedding → rerank → LLM).

Powered by Qwen3-Reranker-8B
100+ languages

May 28, 2026 console

Security

Zero-logs attestation in the console

Every API key now shows a live attestation that no prompt or completion content is stored — something your compliance team can screenshot.

May 19, 2026 platform

New

Dedicated GPU plans

Exclusive NVIDIA B200 hardware inside Helmcode's EU infrastructure — guaranteed throughput, full network isolation and custom models.

Custom models & fine-tuning
Custom SLA

May 8, 2026 platform

Improved

Faster cold starts, lower p95

Reworked model loading and routing in the control plane. Cold starts are noticeably quicker and p95 latency is down across the board.

Apr 30, 2026 api

NewAPI

Speech: TTS and STT

Kokoro text-to-speech (sub-second latency, 67 voices) and Whisper Large v3 speech-to-text (99+ languages) — both behind the same key.

/v1/audio/speech and /v1/audio/transcriptions

Apr 22, 2026 api

Fixed

Streaming with tool calls

Fixed an edge case where streamed responses could truncate when a tool call and content were interleaved. Streaming is solid across all chat models.

That's everything so far — updated as we ship.

Every shippedchange.