// use cases · customer support
Deflect and answer tickets and conversations across chat, email and voice, grounded in your help center, with customer data that never leaves the EU.
// how it works
Retrieval, reasoning, tools and voice from a single OpenAI-compatible endpoint — across every channel, and only inside the EU.
step 01 · qwen3-embedding
Ground in your help center. Retrieve from your knowledge base, docs and past tickets so answers are accurate and on-policy: grounded in your reality, not a generic model's.
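Here is a minimal sketch of the retrieval step. It assumes the endpoint serves an OpenAI-style embeddings route under the model name `qwen3-embedding`. The model name comes from this page, but the claim that it is reachable through `client.embeddings.create` is an assumption. The helper names `embed`, `cosine` and `top_k` are illustrative and are not part of any Helmcode SDK.

```python
import math
from typing import Sequence


def embed(client, texts: list[str], model: str = "qwen3-embedding") -> list[list[float]]:
    """Embed texts via an OpenAI-compatible client (assumes embeddings.create is served)."""
    resp = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in resp.data]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], doc_vecs: list[Sequence[float]], k: int = 3) -> list[int]:
    """Indices of the k help-center passages most similar to the query, best first."""
    ranked = sorted(range(len(doc_vecs)), key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)
    return ranked[:k]
```

In practice you would embed help-center articles and past tickets once and store the vectors. At question time, pass `embed(client, [question])[0]` to `top_k`. At real scale, a vector index would replace this linear scan.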
step 02 · deepseek-v4-flash
Resolve tickets and chats with tool calling: look up orders, check accounts and take actions, then escalate cleanly to a human when that's the right call.
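This is one way to wire up the tool-calling step, sketched under a few assumptions. The model's reply is assumed to follow the OpenAI chat-completions shape: `message.tool_calls`, where each call has an `id`, a `function.name` and JSON `function.arguments`. The names `lookup_order` and `escalate_to_human` are hypothetical tools standing in for your own backend. The function runs each requested tool and builds the `role: "tool"` messages to send back. It also flags a handoff when the model asks to escalate.

```python
import json
from typing import Any, Callable

ESCALATE = "escalate_to_human"  # hypothetical tool the model calls to hand off to an agent


def run_tool_calls(message, handlers: dict[str, Callable[..., Any]]) -> tuple[list[dict], bool]:
    """Execute the tool calls in an assistant message.

    Returns the tool-result messages to append to the conversation, and
    whether the model asked to escalate to a human.
    """
    results, escalate = [], False
    for call in message.tool_calls or []:
        name = call.function.name
        args = json.loads(call.function.arguments or "{}")
        if name == ESCALATE:
            escalate = True
            output = {"status": "handed_off"}
        elif name in handlers:
            output = handlers[name](**args)  # e.g. your order or account lookup
        else:
            output = {"error": f"unknown tool {name}"}
        results.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(output)})
    return results, escalate
```

Append the assistant message and these tool results to `messages`, then call `client.chat.completions.create` again so the model can phrase the final answer. When `escalate` is true, route the ticket to your human queue instead.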
step 03 · kokoro
Chat, email or voice: transcribe with whisper-large-v3 and speak with kokoro for voicebots and IVR. One stack for every customer touchpoint.
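Here is a sketch of one voicebot turn, from caller audio to spoken reply. It assumes the endpoint exposes OpenAI-style `audio.transcriptions` and `audio.speech` routes for `whisper-large-v3` and `kokoro`. The `voice` id is left to the caller because this page does not list which ids the API accepts. `voice_turn` and the `answer` callback are illustrative names.

```python
from typing import BinaryIO, Callable


def voice_turn(client, audio: BinaryIO, answer: Callable[[str], str], voice: str) -> tuple[str, bytes]:
    """Transcribe caller audio, produce a text reply, and synthesize it as speech.

    `answer` is your text pipeline (retrieval + chat completion) mapping the
    transcript to a reply; `voice` is a kokoro voice id (assumed, verify it).
    """
    transcript = client.audio.transcriptions.create(model="whisper-large-v3", file=audio).text
    reply_text = answer(transcript)
    speech = client.audio.speech.create(model="kokoro", voice=voice, input=reply_text)
    return reply_text, speech.content  # raw audio bytes to play back to the caller
```

With the real SDK, `client` would be an `OpenAI(base_url=...)` instance and `audio` an open file such as `open("call.wav", "rb")`. The same `answer` function can also serve the chat and email channels.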
// drop-in
Grounded answers with tools, one chat completion. Change the base URL and key and your support bot runs on private EU models.
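```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="https://api.helmcode.com/v1",  # point the SDK at the EU endpoint
)

# Placeholders for your own data; "lookup_order" is an illustrative tool.
kb_context = "..."  # help-center passages retrieved for this conversation
conversation = [{"role": "user", "content": "Where is order 1234?"}]
tools = [{"type": "function", "function": {
    "name": "lookup_order",
    "description": "Look up an order's status by ID.",
    "parameters": {"type": "object", "properties": {"order_id": {"type": "string"}},
                   "required": ["order_id"]},
}}]

# answer from your help center, with tools to take action
reply = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "Answer from the help center.\n\n" + kb_context},
        *conversation,
    ],
    tools=tools,  # look up orders, check accounts
)
```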
// why helmcode
Every ticket and call is full of customer PII. Closed APIs want all of it: the messages and the recordings.
Tickets, chats and recordings are never logged, and never train a model. The personal data in a conversation stays exactly where it should.
Conversations and customer data stay on EU infrastructure — not on US hyperscalers subject to the Cloud Act. GDPR and AI Act native.
Every ticket, message and minute is included. Limits are set per key as requests per minute (RPM) and concurrent requests, never as total tokens, so cost per interaction drops as volume grows.
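Limits are enforced per key as RPM and concurrency, so a high-volume client should pace its requests rather than count tokens. Below is a minimal asyncio sketch. A semaphore caps in-flight calls, and a sliding one-minute window caps how many calls start per minute. The limit values you pass in are your plan's numbers, and `KeyLimiter` is an illustrative name, not a Helmcode client.

```python
import asyncio
import time
from collections import deque


class KeyLimiter:
    """Client-side pacing for a per-key RPM limit and concurrency limit."""

    def __init__(self, rpm: int, concurrency: int):
        self.rpm = rpm
        self.sem = asyncio.Semaphore(concurrency)  # caps simultaneous requests
        self.starts: deque = deque()  # start times of requests in the last 60 s
        self.lock = asyncio.Lock()

    async def _wait_for_rpm_slot(self) -> None:
        async with self.lock:
            while True:
                now = time.monotonic()
                while self.starts and now - self.starts[0] >= 60:
                    self.starts.popleft()  # drop starts older than the window
                if len(self.starts) < self.rpm:
                    self.starts.append(now)
                    return
                await asyncio.sleep(60 - (now - self.starts[0]))  # until the oldest expires

    async def run(self, coro_fn, *args, **kwargs):
        """Await coro_fn(*args, **kwargs) once a concurrency slot and an RPM slot are free."""
        async with self.sem:
            await self._wait_for_rpm_slot()
            return await coro_fn(*args, **kwargs)
```

With an `AsyncOpenAI` client, wrap each call as `await limiter.run(client.chat.completions.create, model=..., messages=...)`. Create the limiter inside your running event loop. Keep retry-with-backoff on HTTP 429 as a safety net.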
Text and voice — STT, LLM and TTS — behind a single OpenAI-compatible endpoint. One stack for tickets, chat and voicebots.
Answers retrieved from your own knowledge base, with citations — fewer hallucinations, fewer wrong answers, fewer escalations.
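One common way to get citations is to number the retrieved passages in the system prompt, ask the model to cite those numbers, and then map the markers back to sources. The sketch below builds a string suitable for the `kb_context` variable in the drop-in sample. The passage fields (`title`, `url`, `text`), the prompt wording and both function names are assumptions, not a Helmcode format.

```python
import re


def build_kb_context(passages: list[dict]) -> str:
    """Number retrieved passages so the model can cite them as [1], [2], ..."""
    blocks = [f"[{i}] {p['title']} ({p['url']})\n{p['text']}" for i, p in enumerate(passages, start=1)]
    instructions = (
        "Use only the sources below. Cite them by number, e.g. [1]. "
        "If the answer is not in the sources, say so and offer a human agent."
    )
    return instructions + "\n\n" + "\n\n".join(blocks)


def cited_sources(answer: str, passages: list[dict]) -> list[str]:
    """Map [n] markers in the model's answer back to source URLs (deduplicated, in order)."""
    urls: list[str] = []
    for n in re.findall(r"\[(\d+)\]", answer):
        i = int(n) - 1
        if 0 <= i < len(passages) and passages[i]["url"] not in urls:
            urls.append(passages[i]["url"])
    return urls
```

Markers that point outside the passage list are ignored. This gives you a cheap check for answers that cite nothing real.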
OpenAI-compatible chat, tools and audio. Change the base URL and key; your helpdesk, CCaaS or bot framework keeps working.
// support faq
What CX and engineering teams ask before automating support on their own data.
Yes. Ground answers with retrieval (qwen3-embedding + rerank) over your knowledge base, docs and past tickets, so responses are accurate and on-policy — with citations.
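The answer above describes two-stage retrieval: embeddings for recall, then a reranker for precision. The sketch below shows that second stage. The reranker is a pluggable `score(query, passage)` function because this page does not say how the rerank model is called, so treat `score` as a hypothetical stand-in for it. `rerank` is an illustrative name.

```python
from typing import Callable, Optional


def rerank(
    query: str,
    candidates: list[str],
    score: Callable[[str, str], float],
    keep: int = 3,
    min_score: Optional[float] = None,
) -> list[str]:
    """Reorder first-stage (embedding) candidates by a query-passage relevance score.

    Passages below `min_score` are dropped so the bot can escalate instead of guessing.
    """
    scored = sorted(((score(query, p), p) for p in candidates), key=lambda sp: sp[0], reverse=True)
    return [p for s, p in scored[:keep] if min_score is None or s >= min_score]
```

A typical setup recalls a generous candidate set by embedding similarity (say 20 passages), reranks it, and passes only the top few to the prompt as `kb_context`. Ties keep their first-stage order because Python's sort is stable.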
Yes. With tool calling it can look up orders, check accounts and trigger workflows, then escalate to a human when needed — the same JSON tools you already use with OpenAI.
Yes. Transcribe calls with whisper-large-v3 and synthesize replies with kokoro (sub-second latency, 67 voices) for voicebots and IVR, all from the same API.
No. Zero logs — conversations, transcripts and recordings are never persisted and never train a model.
Flat-rate pricing with no token caps means every deflected ticket and message is included — so as volume grows, your cost per interaction falls instead of rising.
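The arithmetic behind this is simple. With a flat monthly fee F and N interactions in a month, the cost per interaction is F / N, so it falls as N grows. The fee used below is a made-up number for illustration, not Helmcode pricing.

```python
def cost_per_interaction(flat_monthly_fee: float, interactions: int) -> float:
    """Flat-rate cost spread over a month's interactions."""
    if interactions <= 0:
        raise ValueError("interactions must be positive")
    return flat_monthly_fee / interactions


# Hypothetical fee of 1000.0 per month: each 10x increase in volume cuts the unit cost 10x.
for volume in (10_000, 100_000, 1_000_000):
    print(volume, cost_per_interaction(1000.0, volume))
```

Under per-token billing, the per-interaction cost stays roughly constant instead. Volume on a flat plan is bounded by the key's RPM and concurrency limits rather than by spend.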
Yes. Run on a dedicated GPU or fully on-premise inside your own datacenter — the same API and code, with customer data that never leaves your network.
// get started
Skip the AI infra work. Deploy your first private inference endpoint today.
Flat rate. EU data. OpenAI API compatible.