// use cases · summarization

Long inputs in.
Short answers out.

Summarize calls, tickets, cases and long documents, up to 1M tokens in a single pass.

// how it works

Whatever it is, in one pass.

Transcription, long-context summarization and structured output from a single OpenAI-compatible endpoint, processed only inside the EU.

step 01

Ingest the source

whisper-large-v3

Transcripts, tickets, threads or long PDFs — even audio, transcribed first with whisper-large-v3 in 99+ languages. Whatever the source, it's text in.
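A minimal sketch of the audio path, assuming the endpoint exposes the OpenAI-style `audio.transcriptions` route for whisper-large-v3. The page names the model but not the route, so treat the call shape as an assumption. The helper name `transcribe_audio` is hypothetical.

```python
from pathlib import Path


def transcribe_audio(client, audio_path, model="whisper-large-v3"):
    """Return the transcript text for one audio file.

    `client` is expected to be an openai.OpenAI instance pointed at the
    OpenAI-compatible endpoint. The transcription route is assumed here,
    not confirmed by the page.
    """
    with Path(audio_path).open("rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    # The OpenAI SDK returns an object whose .text holds the transcript.
    return result.text
```

With a real client this would be called as `transcribe_audio(client, "call.mp3")`, where the file name is a placeholder; the returned string can go straight into the summarization request.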

step 02

Summarize in one pass

deepseek-v4-flash

A 1M-token context window means whole documents and entire call histories go in at once — no lossy chunking, no stitching summaries of summaries.
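Before sending everything at once, it can help to check that the input plausibly fits. The sketch below uses a rough four-characters-per-token heuristic, which is an assumption and not the model's real tokenizer; the function names and the output reserve are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # context size stated for deepseek-v4-flash
CHARS_PER_TOKEN = 4  # rough heuristic, not the model's real tokenizer


def estimate_tokens(text):
    """Cheap estimate: about one token per four characters, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_one_pass(documents, reserve_for_output=8_000):
    """True if all documents plus an output reserve fit in the window."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserve_for_output <= CONTEXT_WINDOW_TOKENS
```

Token counts vary by language and content, so the estimate is best treated as a pre-flight check rather than a guarantee.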

step 03

Shape the output

qwen3.6

Structured summaries — TL;DR, action items, decisions, sentiment — in the exact format your product or workflow needs, via structured outputs.
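One way to pin the shape down is a JSON schema passed as `response_format`, following the OpenAI structured-outputs convention. This sketch assumes the endpoint accepts that parameter and that `qwen3.6` is the model identifier; the schema fields and the helper names are hypothetical, chosen to mirror the fields listed above.

```python
import json

# Hypothetical schema mirroring the fields named above.
SUMMARY_SCHEMA = {
    "type": "object",
    "properties": {
        "tldr": {"type": "string"},
        "action_items": {"type": "array", "items": {"type": "string"}},
        "decisions": {"type": "array", "items": {"type": "string"}},
        "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
    },
    "required": ["tldr", "action_items", "decisions", "sentiment"],
    "additionalProperties": False,
}


def build_request(transcript, model="qwen3.6"):
    """Keyword arguments for client.chat.completions.create(**kwargs)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "Summarize the conversation as JSON."},
            {"role": "user", "content": transcript},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "summary", "strict": True, "schema": SUMMARY_SCHEMA},
        },
    }


def parse_summary(raw_json):
    """Parse the model's JSON reply and check the required keys are present."""
    data = json.loads(raw_json)
    missing = [key for key in SUMMARY_SCHEMA["required"] if key not in data]
    if missing:
        raise ValueError(f"summary is missing fields: {missing}")
    return data
```

In use, the reply text from `client.chat.completions.create(**build_request(transcript))` would be handed to `parse_summary`, which returns a plain dict ready to store.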

// drop-in

Change one line. Keep your stack.

One chat completion, the whole transcript in context. Point the client at a new base URL, swap in your key, and your existing summarization code runs on private EU models.

summarize.py
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="https://api.helmcode.com/v1",  # one line changes
)

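# hypothetical input file; load the transcript however your stack already does
with open("transcript.txt", encoding="utf-8") as f:
    transcript = f.read()

# whole transcript in one pass, up to 1M tokens of context
summary = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "Summarize into a TL;DR and action items."},
        {"role": "user", "content": transcript},
    ],
)

# the summary text is on the first choice
print(summary.choices[0].message.content)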

// why helmcode

Summaries that stay private.

The calls and cases you summarize are your most sensitive records — exactly what you shouldn't hand to someone else's model.

01

Zero logs, by architecture.

The calls, tickets and documents you summarize are never stored, and never train a model — not ours, not anyone's.

02

Processed in the EU.

Every summary runs on EU infrastructure, not on US hyperscalers subject to the CLOUD Act. GDPR and AI Act native.

03

Whole inputs, one pass.

Up to 1M tokens of context. Summarize a 300-page report or a full call history in a single request — no chunking, no lost detail.

04

No caps on volume.

Summarize every call and ticket, not just a sample. Limits apply to requests per minute (RPM) and concurrency per key, never to total tokens.
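Since the limits are on request rate and concurrency, a bulk job mostly needs a client-side throttle. The sketch below is one simple way to do that with asyncio; the class, the function names and the default values of 600 RPM and 8 concurrent requests are hypothetical, not published limits. It spaces request starts evenly, which is a simplification of a per-minute quota.

```python
import asyncio
import time


class Throttle:
    """Client-side guard for a per-key RPM limit and a concurrency limit."""

    def __init__(self, rpm, max_concurrent):
        self._interval = 60.0 / rpm  # minimum spacing between request starts
        self._slots = asyncio.Semaphore(max_concurrent)
        self._lock = asyncio.Lock()
        self._next_start = 0.0

    async def run(self, make_request):
        """Await make_request() once a concurrency slot and a time slot are free."""
        async with self._slots:
            async with self._lock:
                # Reserve the next start time so requests are evenly spaced.
                now = time.monotonic()
                wait = max(0.0, self._next_start - now)
                self._next_start = max(now, self._next_start) + self._interval
            if wait:
                await asyncio.sleep(wait)
            return await make_request()


async def summarize_all(items, summarize_one, rpm=600, max_concurrent=8):
    """Run summarize_one(item) for every item under the throttle, keeping order."""
    throttle = Throttle(rpm, max_concurrent)
    tasks = [throttle.run(lambda item=item: summarize_one(item)) for item in items]
    return await asyncio.gather(*tasks)
```

Here `summarize_one` stands for any coroutine that sends one summarization request, for example a thin wrapper around the async OpenAI client.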

05

Text or audio, one API.

Transcribe with whisper-large-v3 and summarize with an LLM behind a single OpenAI-compatible endpoint — calls become summaries in one place.

In production across
  • B2B SaaS
  • Contact center / BPO
  • Insurance
  • Healthcare
  • Pharma & biotech
  • HR & recruiting

// summarization faq

Summarization, answered.

What product and operations teams ask before summarizing their own records.

How long an input can I summarize?

Up to a 1M-token context window with deepseek-v4-flash — whole documents, long threads and entire call histories in a single pass, with no lossy chunking.

Can I summarize audio and calls, not just text?

Yes. Transcribe first with whisper-large-v3 (99+ languages) and summarize the transcript with an LLM — both behind the same OpenAI-compatible API.

Do you store what I summarize?

No. Zero logs — your inputs and the summaries produced are never persisted and never train a model.

Can I control the format of the summary?

Yes. Use structured outputs to get a fixed shape — TL;DR, action items, decisions, sentiment — ready to drop into your product or database.

Can it handle high volume?

Yes. There are no token caps — limits are RPM and concurrency per API key — so you can summarize every call and ticket on flat, predictable pricing.

What about sensitive records?

Run on a dedicated GPU or fully on-premise inside your own datacenter — the same API and code, with data that never leaves your network.
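Because the API and code stay the same, moving between hosted, dedicated and on-premise deployments can come down to configuration. A minimal sketch, assuming the base URL and key are read from environment variables; the variable names and the function name are hypothetical.

```python
import os

HOSTED_BASE_URL = "https://api.helmcode.com/v1"  # from the example on this page


def client_settings(env=None):
    """Resolve endpoint settings from environment variables (names are hypothetical)."""
    env = os.environ if env is None else env
    return {
        # Falls back to the hosted endpoint when no override is set.
        "base_url": env.get("SUMMARIZER_BASE_URL", HOSTED_BASE_URL),
        # None when unset; a real deployment would require a key here.
        "api_key": env.get("SUMMARIZER_API_KEY"),
    }
```

The result can be passed as `OpenAI(**client_settings())`, so pointing the same code at an internal endpoint only requires setting `SUMMARIZER_BASE_URL`.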

// get started

START BURNING TOKENS

Skip the AI infra work. Deploy your first private inference endpoint today.

Flat rate. EU data. OpenAI API compatible.