// opendata

The real numbers
behind the platform.

Real usage data from our inference platform — aggregated and anonymized. No prompts, no content; just request-level counters.

Updated weekly · 28 Jun 2026

374.4B

Tokens processed

cumulative

33.6M

Requests served

cumulative

9

Active models

in production

// models

Tokens by model.

Cumulative tokens processed per model. One open model carries most of the load — and the full stack is always available.

01 Qwen 3.6 73.1% 273.8 B · 29.9 M
02 DeepSeek V4-Flash 13.1% 48.9 B · 995.7 K
03 MiMo V2.5 10.7% 39.9 B · 535.6 K
04 Gemma 4 2.5% 9.5 B · 1.6 M
05 Qwen3 Embedding 0.6% 2.2 B · 563.1 K
06 Qwen3 Coder <0.1% 142.8 M · 7.1 K
07 Whisper Large v3 <0.1% 24 · 20.8 K

// tokens

Input vs. output.

Inference here is overwhelmingly read-heavy — long prompts, retrieval and context — with a thin slice of generated tokens.

Input · prompt 366.1 B 97.8%
Output · generated 8.3 B 2.2%

// usage

Tokens per day.

Daily tokens processed over the last 90 days, peaking at 8.8 B/day.

0 3 B 6 B 9 B 8.8 B peak
90 days agotoday

// beyond text

Speech & reranking.

The stack is more than LLMs — transcription, synthesis and reranking run on the same API.

Text-to-speech Kokoro 20.7 K requests
Reranking Qwen3 Reranker 5.2 K requests

// clients · last 7 days

How teams connect.

Drop-in OpenAI compatibility in the wild — the official SDK and OpenCode account for the vast majority of traffic.

OpenAI SDK (Python) 49.7% 245 devs
OpenAI SDK (JS) 11.8% 58 devs
Node.js 8.1% 40 devs
curl 7.7% 38 devs
Vercel AI SDK 6.1% 30 devs
Python httpx 4.9% 24 devs
Go HTTP client 4.3% 21 devs
Python requests 3.7% 18 devs
Others 3.9% 19 devs

// geography · last 7 days

Where requests come from.

76.1% of traffic originates inside the EU — the audience this infrastructure is built for.

Spain 31.6% 837.2 K
Finland 27.1% 718.9 K
Germany 16.1% 427.4 K
Colombia 12.7% 337.8 K
United States 4.8% 127.8 K
United Kingdom 2.1% 56.1 K
Mexico 2.1% 55.7 K
Argentina 1.1% 29.1 K
France 0.7% 18.9 K
Netherlands 0.4% 11.8 K
Ireland 0.2% 4.4 K
Chile 0.2% 4.4 K
Others 0.9% 23.8 K

// performance

Latency & throughput.

Median time to first token and sustained throughput per model, measured on 28 Jun 2026.

Model TTFT p50 Throughput
Qwen 3.6 1 s 4996733 rpm
Gemma 4 98 ms 71.5 rpm
Qwen3 Embedding n/a 33.6 rpm
MiMo V2.5 1.3 s 28.7 rpm
DeepSeek V4-Flash 6.8 s 12.4 rpm
Qwen3 Reranker n/a 7.8 rpm
Kokoro n/a 1 rpm
Whisper Large v3 n/a 0.1 rpm

TTFT p50 = median time to first token · Throughput = sustained requests per minute.

// who it's for

Two ways to run on this stack.

These numbers come from real workloads across the community and private deployments alike.

Builders & community

Frontier models, fair price, no data sharing.

Access the latest open models at a reasonable cost, without handing over your data — through the NaN community.

nan.builders →
Startups & enterprise

Private, dedicated inference with SLAs.

Dedicated infrastructure, support and contractual SLAs — flat rate, EU data, OpenAI-compatible.

see_pricing →

Methodology. Figures are aggregated, anonymized counters collected at the request level. Helmcode keeps zero logs — no prompt or completion content is ever stored. Cumulative metrics span the platform's lifetime; windowed metrics are labelled per section.

// get started

START BURNING TOKENS

Skip the AI infra work. Deploy your first private inference endpoint today.

Flat rate. EU data. OpenAI API compatible.