// models

Open models.
Frontier results.

The open-model frontier already covers the work enterprises actually run, retrieval, code, agents, extraction. More than 80% of tasks can be done with open models, same result at a much lower cost.

see_benchmarks

// the 80 / 20

Open models cover 80% of enterprise inference.

The work that runs your business (RAG, classification, text-to-speech, internal assistants) is solved today by open-weight models. Leave just that 20% for the frontier models.

The 80% — open, private, flat-rate

RAG over internal knowledge
Classification & routing
Code generation & review
Internal assistants & copilots
Document extraction
Summarization
Translation
Autonomous agents

The 20% — frontier closed labs

Bleeding-edge reasoning at the very limit of capability. Real, but specific — and rarely the workload a regulated team needs to keep in-house. We are honest about that boundary instead of pretending it doesn't exist.

// benchmark · artificial analysis

Frontier-class intelligence, open-weight prices.

The Artificial Analysis Intelligence Index is a composite of nine hard evaluations — GPQA Diamond, SciCode, Terminal-Bench, Humanity's Last Exam and more. Open-weight models now sit just under the closed frontier — and the one Helmcode runs costs cents.

Claude Fable 5 Anthropic 60 —

Claude Opus 4.8 Anthropic 56 $5.00 / $25.00

GPT-5.5 OpenAI 55 $5.00 / $30.00

GLM-5.2 Z.AI open 51 $1.40 / $4.40

DeepSeek V4 Pro DeepSeek open 44 $0.43 / $0.87

MiMo V2.5 Xiaomi on helmcode 42 $0.44 / $0.87

DeepSeek V4 Flash DeepSeek on helmcode 40 $0.14 / $0.28

Qwen3.6 35B Alibaba on helmcode 32 $0.25 / $1.49

Gemma 4 26B Google on helmcode 26 $0.13 / $0.40

GLM-5.2 leads all 92 open-weight models at 51 — within reach of Claude Opus 4.8 (56) and GPT-5.5 (55). The models Helmcode runs sit just behind: MiMo V2.5 (42), DeepSeek V4 Flash (40), Qwen3.6 35B (32) and Gemma 4 26B (26). DeepSeek V4 Flash scores 40 at $0.14 / $0.28 per million tokens — against Opus 4.8's $5.00 / $25.00, that's ~35× cheaper to read and ~90× cheaper to write, for 67% of the index leader's intelligence.

Source: artificialanalysis.ai · Intelligence Index v4.1 · June 2026 · composite of 9 evaluations · price = first-party API, per 1M tokens (input · output). Rows marked on helmcode — MiMo V2.5, DeepSeek V4 Flash, Qwen3.6 35B and Gemma 4 26B — run on Helmcode. GLM-5.2 and DeepSeek V4 Pro are open-weight but not on the platform.

// proven in production

Benchmarks are one thing. Traffic is another.

The argument isn't theoretical. On our own platform, near-enough all inference already runs on open models — and most of it on a single 35B one.

333.8B

Tokens in production

cumulative

76%

Run on Qwen 3.6 (35B)

the open workhorse

99.5%

Of tokens on open models

LLM traffic

See the live numbers on OpenData

// the lineup

What actually runs the 80%.

The four language models, ordered by real production token share. One open 35B model carries most of the load — the rest step in for reasoning, scale and multimodal.

qwen3.6 35B MoE · 256K ctx High-volume RAG, classification, code

deepseek-v4-flash 284B MoE · 1M ctx Reasoning, agents, long-context

mimo-v2.5 310B MoE · 1M ctx Multimodal — vision + audio + text

gemma4 26B MoE · 256K ctx Efficient assistants, document work

Plus embeddings & reranking (qwen3-embedding, rerank) and speech (kokoro, whisper-large-v3) — nine models on one API. Full model reference →

// models faq

Open models, answered.

The questions every CTO asks before trusting open weights in production.

Are open models actually good enough?

For the work enterprises run day to day — yes. On Artificial Analysis’ Intelligence Index, GLM-5.2 ranks #1 of 92 open-weight models, just under the closed frontier — and the model we run, DeepSeek V4 Flash, delivers around two-thirds of leader-level intelligence for cents per million tokens. In production, 99.5% of all tokens on Helmcode already flow through open models. The gap that remains is a narrow set of frontier tasks most teams never hit.

Which model should I use?

Start with Qwen 3.6 — it carries three quarters of all production traffic and is the fastest, cheapest path for RAG, classification and code. Move to DeepSeek V4-Flash for hard reasoning, agents or 1M-token context, and MiMo for multimodal input. Same API, just change the model id.

What about the 20% that genuinely needs GPT-5?

It exists, and it is more specific than most assume — frontier-only reasoning at the very edge of capability. Helmcode is honest about that boundary: we cover the 80% that runs your business, privately and at a flat rate, not the last mile of the leaderboard.

How current are these benchmarks?

Figures are published scores as of June 2026 — open models served on Helmcode, closed-model numbers from vendor reports. Benchmarks move every release, so treat them as directional. What does not move is where your data is processed: always the EU, always zero logs.

Can I run a model that is not listed?

On Dedicated and On-premise plans, yes — custom or fine-tuned open-weight models on hardware reserved for you. The Shared cluster serves the curated lineup above.

// get started

START BURNING TOKENS

Skip the AI infra work. Deploy your first private inference endpoint today.

Flat rate. EU data. OpenAI API compatible.

book_a_call

Open models.Frontier results.