// use cases · summarization
Summarize calls, tickets, cases and long documents, up to 1M tokens in a single pass.
// how it works
Transcription, long-context summarization and structured output from a single OpenAI-compatible endpoint — and only inside the EU.
step 01
Ingest the source
whisper-large-v3
Transcripts, tickets, threads or long PDFs; even audio, transcribed first with whisper-large-v3 in 99+ languages. Whatever the source, it's text in.
step 02
deepseek-v4-flash
A 1M-token context window means whole documents and entire call histories go in at once: no lossy chunking, no stitching summaries of summaries.
step 03
qwen3.6
Structured summaries (TL;DR, action items, decisions, sentiment) in the exact format your product or workflow needs, via structured outputs.
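A minimal sketch of step 03, assuming the endpoint accepts the OpenAI-style `response_format` with a JSON Schema. The schema name `call_summary`, the field names, the sentiment labels and both helper functions are illustrative assumptions, not taken from the helmcode docs.

```python
import json

# Hypothetical schema for the summary shape named above (TL;DR, action items,
# decisions, sentiment). Field names and sentiment labels are assumptions.
SUMMARY_SCHEMA = {
    "type": "object",
    "properties": {
        "tldr": {"type": "string"},
        "action_items": {"type": "array", "items": {"type": "string"}},
        "decisions": {"type": "array", "items": {"type": "string"}},
        "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
    },
    "required": ["tldr", "action_items", "decisions", "sentiment"],
    "additionalProperties": False,
}


def build_summary_request(transcript: str, model: str = "qwen3.6") -> dict:
    """Return keyword arguments for client.chat.completions.create(...)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "Summarize the conversation as JSON."},
            {"role": "user", "content": transcript},
        ],
        # OpenAI-style structured outputs; support on this endpoint is assumed.
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "call_summary",
                "strict": True,
                "schema": SUMMARY_SCHEMA,
            },
        },
    }


def parse_summary(raw: str) -> dict:
    """Decode the model's JSON reply and check that every required key is present."""
    data = json.loads(raw)
    missing = [key for key in SUMMARY_SCHEMA["required"] if key not in data]
    if missing:
        raise ValueError(f"summary is missing fields: {missing}")
    return data
```

With a configured client, `client.chat.completions.create(**build_summary_request(transcript))` would send the request, and `parse_summary(response.choices[0].message.content)` would return a dict ready to store.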
// drop-in
One chat completion, the whole transcript in context. Change the base URL and key and your summarization code runs on private EU models.
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="https://api.helmcode.com/v1",  # one line changes
)

# whole transcript in one pass, up to 1M tokens of context
summary = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "Summarize into a TL;DR and action items."},
        {"role": "user", "content": transcript},
    ],
)
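Before sending a very large document, a rough pre-flight check can flag inputs that may not fit the 1M-token window. The sketch below is only an estimate: the 4-characters-per-token ratio is a common rule of thumb for English text, not the model's tokenizer, and the function names and the output reserve are hypothetical.

```python
CONTEXT_LIMIT_TOKENS = 1_000_000  # context window stated on this page for deepseek-v4-flash
CHARS_PER_TOKEN = 4  # rough rule of thumb for English text; an assumption, not a tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_one_pass(text: str, reserve_for_output: int = 4_000) -> bool:
    """Return True if the estimated prompt plus an output reserve stays within the window."""
    return estimate_tokens(text) + reserve_for_output <= CONTEXT_LIMIT_TOKENS
```

Text in other languages, or text heavy with code and numbers, can tokenize very differently, so treat a borderline result as a reason to measure with the real tokenizer.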
// why helmcode
The calls and cases you summarize are your most sensitive records — exactly what you shouldn't hand to someone else's model.
The calls, tickets and documents you summarize are never stored, and never train a model — not ours, not anyone's.
Every summary runs on EU infrastructure, not on US hyperscalers subject to the CLOUD Act. GDPR and AI Act native.
Up to 1M tokens of context. Summarize a 300-page report or a full call history in a single request — no chunking, no lost detail.
Summarize every call and ticket, not just a sample. Limits are RPM and concurrency per key — never total tokens.
Transcribe with whisper-large-v3 and summarize with an LLM behind a single OpenAI-compatible endpoint — calls become summaries in one place.
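The transcribe-then-summarize flow could look roughly like this. It is a sketch, assuming the endpoint exposes the OpenAI-style audio transcription route under the model name `whisper-large-v3`; the function name `summarize_call` is hypothetical, and `client` is expected to be an `OpenAI` client already pointed at the helmcode base URL.

```python
def summarize_call(client, audio_path: str) -> str:
    """Transcribe one audio file, then summarize the transcript with the same client."""
    # Step 1: speech to text. Route and model availability are assumptions.
    with open(audio_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-large-v3",
            file=audio_file,
        )

    # Step 2: the whole transcript goes into a single chat completion.
    completion = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[
            {"role": "system", "content": "Summarize into a TL;DR and action items."},
            {"role": "user", "content": transcription.text},
        ],
    )
    return completion.choices[0].message.content
```

Passing the client in as an argument keeps the function easy to exercise with a stub, so the flow can be checked without network access or an API key.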
// summarization faq
What product and operations teams ask before summarizing their own records.
Up to a 1M-token context window with deepseek-v4-flash — whole documents, long threads and entire call histories in a single pass, with no lossy chunking.
Yes. Transcribe first with whisper-large-v3 (99+ languages) and summarize the transcript with an LLM — both behind the same OpenAI-compatible API.
No. Zero logs — your inputs and the summaries produced are never persisted and never train a model.
Yes. Use structured outputs to get a fixed shape — TL;DR, action items, decisions, sentiment — ready to drop into your product or database.
Yes. There are no token caps — limits are RPM and concurrency per API key — so you can summarize every call and ticket on flat, predictable pricing.
Run on a dedicated GPU or fully on-premise inside your own datacenter — the same API and code, with data that never leaves your network.
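Because limits are stated as RPM and concurrency per API key, a batch job mainly needs to cap how many requests are in flight. The sketch below bounds concurrency with a thread pool; the function name `summarize_all` and the default of 4 are hypothetical, the real per-key limits are not given on this page, and RPM throttling is not handled here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def summarize_all(
    texts: Iterable[str],
    summarize_one: Callable[[str], str],
    max_concurrency: int = 4,  # placeholder; set this to your key's concurrency limit
) -> List[str]:
    """Summarize texts in parallel, with at most max_concurrency calls running at once."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map yields results in input order, whichever call finishes first.
        return list(pool.map(summarize_one, texts))
```

Here `summarize_one` would wrap a single chat completion call; any exception it raises surfaces when the results are collected.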
// get started
Skip the AI infra work. Deploy your first private inference endpoint today.
Flat rate. EU data. OpenAI API compatible.