// opendata

The real numbers
behind the platform.

Real usage data from our inference platform — aggregated and anonymized. No prompts, no content; just request-level counters.

Updated weekly · 28 Jun 2026

374.4B

Tokens processed

cumulative

33.6M

Requests served

cumulative

Active models

in production

// models

Tokens by model.

Cumulative tokens processed per model. One open model carries most of the load — and the full stack is always available.

01 Qwen 3.6 73.1% 273.8 B · 29.9 M

02 DeepSeek V4-Flash 13.1% 48.9 B · 995.7 K

03 MiMo V2.5 10.7% 39.9 B · 535.6 K

04 Gemma 4 2.5% 9.5 B · 1.6 M

05 Qwen3 Embedding 0.6% 2.2 B · 563.1 K

06 Qwen3 Coder <0.1% 142.8 M · 7.1 K

07 Whisper Large v3 <0.1% 24 · 20.8 K
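As a rough illustration of how differently the models are used, the sketch below divides each model's cumulative tokens by its request count. It assumes the second figure on each line is the request count, which is consistent with the 33.6M headline total. Whisper is left out because its token figure carries no unit, and the names `usage` and `tokens_per_request` are illustrative only.

```python
# (cumulative tokens, cumulative requests) per model, copied from the list above.
# Assumption: the figure after the "·" is the number of requests.
usage = {
    "Qwen 3.6": (273.8e9, 29.9e6),
    "DeepSeek V4-Flash": (48.9e9, 995.7e3),
    "MiMo V2.5": (39.9e9, 535.6e3),
    "Gemma 4": (9.5e9, 1.6e6),
    "Qwen3 Embedding": (2.2e9, 563.1e3),
    "Qwen3 Coder": (142.8e6, 7.1e3),
}


def tokens_per_request(model: str) -> float:
    """Average tokens handled per request for one model."""
    tokens, requests = usage[model]
    return tokens / requests


# Models ordered from the heaviest to the lightest average request.
ranked = sorted(usage, key=tokens_per_request, reverse=True)

for model in ranked:
    print(f"{model:<18} {tokens_per_request(model):>10,.0f} tokens/request")
```

On these figures the model with the most traffic is not the one with the heaviest requests: averages range from a few thousand tokens per request to several tens of thousands.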

// tokens

Input vs. output.

Inference here is overwhelmingly read-heavy — long prompts, retrieval and context — with a thin slice of generated tokens.

Input · prompt 366.1 B 97.8%

Output · generated 8.3 B 2.2%
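The split can be checked directly from the two figures above. This is a minimal sketch of the arithmetic, not the platform's own reporting code, and the variable names are illustrative.

```python
# Cumulative token counts in billions, copied from the two lines above.
INPUT_B = 366.1
OUTPUT_B = 8.3

total_b = INPUT_B + OUTPUT_B

# Share of each side, rounded to one decimal like the published figures.
input_share = round(100 * INPUT_B / total_b, 1)
output_share = round(100 * OUTPUT_B / total_b, 1)

# How many prompt tokens are read for every generated token.
input_per_output = INPUT_B / OUTPUT_B

print(f"input {input_share}% / output {output_share}%")
print(f"about {input_per_output:.0f} input tokens per output token")
```

The two parts add up to the 374.4B headline total, and the ratio works out to roughly 44 prompt tokens for every generated token.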

// usage

Tokens per day.

Daily tokens processed over the last 90 days, peaking at 8.8 B/day.

Chart: daily tokens processed, from 90 days ago to today.

// beyond text

Speech & reranking.

The stack is more than LLMs — transcription, synthesis and reranking run on the same API.

Text-to-speech Kokoro 20.7 K requests

Reranking Qwen3 Reranker 5.2 K requests

// clients · last 7 days

How teams connect.

Drop-in OpenAI compatibility in the wild — the official SDK and OpenCode account for the vast majority of traffic.

OpenAI SDK (Python) 49.7% 245 devs

OpenAI SDK (JS) 11.8% 58 devs

Node.js 8.1% 40 devs

curl 7.7% 38 devs

Vercel AI SDK 6.1% 30 devs

Python httpx 4.9% 24 devs

Go HTTP client 4.3% 21 devs

Python requests 3.7% 18 devs

Others 3.9% 19 devs
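The percentages above appear to be shares of developers rather than shares of requests. The sketch below recomputes them from the developer counts alone; that reading is an inference from the numbers, not something the page states, and the names used are illustrative.

```python
# Developer counts per client over the last 7 days, copied from the list above.
devs = {
    "OpenAI SDK (Python)": 245,
    "OpenAI SDK (JS)": 58,
    "Node.js": 40,
    "curl": 38,
    "Vercel AI SDK": 30,
    "Python httpx": 24,
    "Go HTTP client": 21,
    "Python requests": 18,
    "Others": 19,
}

total_devs = sum(devs.values())

# Share of developers per client, rounded to one decimal.
shares = {client: round(100 * n / total_devs, 1) for client, n in devs.items()}

# The two official OpenAI SDKs taken together.
openai_sdk_share = round(
    100 * (devs["OpenAI SDK (Python)"] + devs["OpenAI SDK (JS)"]) / total_devs, 1
)

for client, pct in shares.items():
    print(f"{client:<20} {pct:>5}%")
```

Every recomputed share matches the published one, and the two OpenAI SDKs together cover about 61.5% of developers.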

// geography · last 7 days

Where requests come from.

76.1% of traffic originates inside the EU — the audience this infrastructure is built for.

Spain 31.6% 837.2 K

Finland 27.1% 718.9 K

Germany 16.1% 427.4 K

Colombia 12.7% 337.8 K

United States 4.8% 127.8 K

United Kingdom 2.1% 56.1 K

Mexico 2.1% 55.7 K

Argentina 1.1% 29.1 K

France 0.7% 18.9 K

Netherlands 0.4% 11.8 K

Ireland 0.2% 4.4 K

Chile 0.2% 4.4 K

Others 0.9% 23.8 K
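The 76.1% figure can be reproduced from the per-country request counts. The sketch below sums the six listed EU member states; it treats "Others" as non-EU because its composition is not given, so the result is best read as a lower bound. Variable names are illustrative.

```python
# Requests over the last 7 days, in thousands, copied from the list above.
requests_k = {
    "Spain": 837.2,
    "Finland": 718.9,
    "Germany": 427.4,
    "Colombia": 337.8,
    "United States": 127.8,
    "United Kingdom": 56.1,
    "Mexico": 55.7,
    "Argentina": 29.1,
    "France": 18.9,
    "Netherlands": 11.8,
    "Ireland": 4.4,
    "Chile": 4.4,
    "Others": 23.8,
}

# EU member states that appear in the list. "Others" is not broken down,
# so it is counted as non-EU here (an assumption).
EU_MEMBERS = {"Spain", "Finland", "Germany", "France", "Netherlands", "Ireland"}

total_k = sum(requests_k.values())
eu_k = sum(n for country, n in requests_k.items() if country in EU_MEMBERS)
eu_share = round(100 * eu_k / total_k, 1)

print(f"EU share: {eu_share}% of {total_k:,.1f} K requests")
```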

// performance

Latency & throughput.

Median time to first token and sustained throughput per model, measured on 28 Jun 2026.

Model               TTFT p50   Throughput
Qwen 3.6            1 s        4996733 rpm
Gemma 4             98 ms      71.5 rpm
Qwen3 Embedding     n/a        33.6 rpm
MiMo V2.5           1.3 s      28.7 rpm
DeepSeek V4-Flash   6.8 s      12.4 rpm
Qwen3 Reranker      n/a        7.8 rpm
Kokoro              n/a        1 rpm
Whisper Large v3    n/a        0.1 rpm

TTFT p50 = median time to first token · Throughput = sustained requests per minute.
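To make the two definitions concrete, here is a minimal sketch of how a p50 is taken from raw samples and how a requests-per-minute figure translates into spacing between requests. The TTFT samples are hypothetical values invented for illustration, not platform measurements, and the function name is illustrative too.

```python
from statistics import median

# HYPOTHETICAL time-to-first-token samples in seconds (invented for illustration).
# One slow outlier is included on purpose.
ttft_samples = [0.091, 0.098, 0.105, 0.097, 0.310]

# p50 is the middle value of the sorted samples, so the outlier does not move it.
ttft_p50 = median(ttft_samples)


def seconds_between_requests(rpm: float) -> float:
    """Average gap between requests for a sustained requests-per-minute rate."""
    return 60.0 / rpm


print(f"TTFT p50: {ttft_p50 * 1000:.0f} ms")
print(f"12.4 rpm is one request every {seconds_between_requests(12.4):.1f} s")
```

A median is used instead of a mean so that a handful of slow requests does not dominate the reported latency.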

// who it's for

Two ways to run on this stack.

These numbers come from real workloads across the community and private deployments alike.

Builders & community

Frontier models, fair price, no data sharing.

Access the latest open models at a reasonable cost, without handing over your data — through the NaN community.

nan.builders →

Startups & enterprise

Private, dedicated inference with SLAs.

Dedicated infrastructure, support and contractual SLAs — flat rate, EU data, OpenAI-compatible.

see_pricing →

Methodology. Figures are aggregated, anonymized counters collected at the request level. Helmcode keeps zero logs — no prompt or completion content is ever stored. Cumulative metrics span the platform's lifetime; windowed metrics are labelled per section.

// get started

START BURNING TOKENS

Skip the AI infra work. Deploy your first private inference endpoint today.

Flat rate. EU data. OpenAI API compatible.

book_a_call
