// models

Open models.
Frontier results.

The open-model frontier already covers the work enterprises actually run, retrieval, code, agents, extraction. More than 80% of tasks can be done with open models, same result at a much lower cost.

// the 80 / 20

Open models cover 80% of enterprise inference.

The work that runs your business (RAG, classification, text-to-speech, internal assistants) is solved today by open-weight models. Leave just that 20% for the frontier models.

The 80% — open, private, flat-rate
  • RAG over internal knowledge
  • Classification & routing
  • Code generation & review
  • Internal assistants & copilots
  • Document extraction
  • Summarization
  • Translation
  • Autonomous agents
The 20% — frontier closed labs

Bleeding-edge reasoning at the very limit of capability. Real, but specific — and rarely the workload a regulated team needs to keep in-house. We are honest about that boundary instead of pretending it doesn't exist.

// benchmark · artificial analysis

Frontier-class intelligence, open-weight prices.

The Artificial Analysis Intelligence Index is a composite of nine hard evaluations — GPQA Diamond, SciCode, Terminal-Bench, Humanity's Last Exam and more. Open-weight models now sit just under the closed frontier — and the one Helmcode runs costs cents.

Claude Fable 5 Anthropic 60
Claude Opus 4.8 Anthropic 56 $5.00 / $25.00
GPT-5.5 OpenAI 55 $5.00 / $30.00
GLM-5.2 Z.AI open 51 $1.40 / $4.40
DeepSeek V4 Pro DeepSeek open 44 $0.43 / $0.87
MiMo V2.5 Xiaomi on helmcode 42 $0.44 / $0.87
DeepSeek V4 Flash DeepSeek on helmcode 40 $0.14 / $0.28
Qwen3.6 35B Alibaba on helmcode 32 $0.25 / $1.49
Gemma 4 26B Google on helmcode 26 $0.13 / $0.40

GLM-5.2 leads all 92 open-weight models at 51 — within reach of Claude Opus 4.8 (56) and GPT-5.5 (55). The models Helmcode runs sit just behind: MiMo V2.5 (42), DeepSeek V4 Flash (40), Qwen3.6 35B (32) and Gemma 4 26B (26). DeepSeek V4 Flash scores 40 at $0.14 / $0.28 per million tokens — against Opus 4.8's $5.00 / $25.00, that's ~35× cheaper to read and ~90× cheaper to write, for 67% of the index leader's intelligence.

Source: artificialanalysis.ai · Intelligence Index v4.1 · June 2026 · composite of 9 evaluations · price = first-party API, per 1M tokens (input · output). Rows marked on helmcode — MiMo V2.5, DeepSeek V4 Flash, Qwen3.6 35B and Gemma 4 26B — run on Helmcode. GLM-5.2 and DeepSeek V4 Pro are open-weight but not on the platform.

// proven in production

Benchmarks are one thing. Traffic is another.

The argument isn't theoretical. On our own platform, near-enough all inference already runs on open models — and most of it on a single 35B one.

333.8B

Tokens in production

cumulative

76%

Run on Qwen 3.6 (35B)

the open workhorse

99.5%

Of tokens on open models

LLM traffic

See the live numbers on OpenData

// the lineup

What actually runs the 80%.

The four language models, ordered by real production token share. One open 35B model carries most of the load — the rest step in for reasoning, scale and multimodal.

qwen3.6 35B MoE · 256K ctx 76.1% High-volume RAG, classification, code
deepseek-v4-flash 284B MoE · 1M ctx 12.4% Reasoning, agents, long-context
mimo-v2.5 310B MoE · 1M ctx 8.6% Multimodal — vision + audio + text
gemma4 26B MoE · 256K ctx 2.4% Efficient assistants, document work

Plus embeddings & reranking (qwen3-embedding, rerank) and speech (kokoro, whisper-large-v3) — nine models on one API. Full model reference →

// models faq

Open models, answered.

The questions every CTO asks before trusting open weights in production.

Are open models actually good enough?

For the work enterprises run day to day — yes. On Artificial Analysis’ Intelligence Index, GLM-5.2 ranks #1 of 92 open-weight models, just under the closed frontier — and the model we run, DeepSeek V4 Flash, delivers around two-thirds of leader-level intelligence for cents per million tokens. In production, 99.5% of all tokens on Helmcode already flow through open models. The gap that remains is a narrow set of frontier tasks most teams never hit.

Which model should I use?

Start with Qwen 3.6 — it carries three quarters of all production traffic and is the fastest, cheapest path for RAG, classification and code. Move to DeepSeek V4-Flash for hard reasoning, agents or 1M-token context, and MiMo for multimodal input. Same API, just change the model id.

What about the 20% that genuinely needs GPT-5?

It exists, and it is more specific than most assume — frontier-only reasoning at the very edge of capability. Helmcode is honest about that boundary: we cover the 80% that runs your business, privately and at a flat rate, not the last mile of the leaderboard.

How current are these benchmarks?

Figures are published scores as of June 2026 — open models served on Helmcode, closed-model numbers from vendor reports. Benchmarks move every release, so treat them as directional. What does not move is where your data is processed: always the EU, always zero logs.

Can I run a model that is not listed?

On Dedicated and On-premise plans, yes — custom or fine-tuned open-weight models on hardware reserved for you. The Shared cluster serves the curated lineup above.

// get started

START BURNING TOKENS

Skip the AI infra work. Deploy your first private inference endpoint today.

Flat rate. EU data. OpenAI API compatible.