Self-hosted · Multi-Tenant · Production-Ready

Deploy AI voice agents that take real action.

VOX is the voice intelligence layer between your customers and your tools. It listens, understands, and routes to n8n, MCP servers, your CRM, or any API. Self-hosted at ~$0.015/min, roughly 5× cheaper than Retell or Vapi.

1,200+ MCP integrations
400+ n8n workflows
<800ms End-to-end latency
~$0.015 Cost per minute
5× Cheaper than SaaS

How it works

From idea to live voice agent in minutes.

01

Configure with natural language

Tell the AI assistant what your agent should do. It generates the full tool configuration — parameters, integrations, knowledge base — no forms, no code.

02

Connect to any tool or workflow

Pick from 1,200+ MCP integrations or 400+ n8n workflows. VOX acts as the intelligence layer — it understands context, extracts parameters, and routes to the right backend.

03

Deploy with one line of code

Drop a single embed into your website. Customers click, speak, and get answers — in under 800ms, at a fraction of SaaS cost.

1,200+

MCP integrations

Model Context Protocol

The open standard for AI tool calls. Sub-200ms. Self-hostable. VOX can call any MCP server natively — CRM, ticketing, calendar, database, payment — inside a live voice conversation.
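
For readers curious what a tool call looks like on the wire: MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method with the tool's name and arguments. The sketch below only builds such a message. The tool name `create_ticket` and its arguments are hypothetical placeholders, not part of VOX or of any specific MCP server.

```python
import json
from itertools import count

# JSON-RPC requests carry an id so responses can be matched to calls.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool and arguments, as if extracted from a live conversation.
request = build_tool_call(
    "create_ticket",
    {"subject": "Router offline", "priority": "high"},
)
wire_message = json.dumps(request)
```

How the message reaches the server (stdio, HTTP) depends on the MCP transport in use and is left out here.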

400+

n8n workflow templates

n8n Workflow Engine

Complex multi-step workflows with visual editing and 400+ pre-built integrations. VOX calls your n8n webhooks with structured payloads extracted from the conversation — you build the logic, VOX triggers it.
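
As a rough illustration of "structured payloads extracted from the conversation", the sketch below prepares a JSON POST for an n8n webhook. The URL, the `book-table` path, and every payload field are assumptions made up for this example; the actual payload shape VOX sends is not documented here.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, payload: dict) -> urllib.request.Request:
    """Prepare a JSON POST for an n8n Webhook node (nothing is sent yet)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical parameters extracted from a caller's booking request.
payload = {
    "intent": "book_table",
    "party_size": 4,
    "date": "2025-03-14",
    "time": "19:30",
}

# Placeholder URL; replace with the webhook URL of your own workflow.
req = build_webhook_request("https://n8n.example.com/webhook/book-table", payload)

# To trigger the workflow for real:
# urllib.request.urlopen(req, timeout=5)
```

The workflow on the n8n side receives the JSON body and runs whatever logic you built there.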

Plus direct API calls, PostgreSQL, MongoDB, Qdrant vector search, Typesense geo+structured search — VOX connects to anything.

Platform

Built for production at scale.

Every layer engineered to minimise latency, maximise accuracy, and eliminate SaaS dependency.

Voice Pipeline

Sub-800ms. Indistinguishable from a human.

Parakeet TDT STT + Llama 3.3 70B + CosyVoice2 TTS — fully self-hosted, fully streaming. VAD-based barge-in under 200ms. No SaaS latency taxes.

Tool Intelligence

VOX routes. Your tools execute.

VOX is the intelligence layer — not a workflow engine. It understands what the caller wants, extracts parameters from conversation, and dispatches to n8n, MCP servers, or direct APIs.

Knowledge Base

Three modes of knowing.

RAG for unstructured docs (PDFs, FAQs). Typesense for structured + geo search ("Indian restaurant near me, medium spice"). Live tool calls for real-time data.

Configuration

Natural language tool setup.

Tenants configure tools through an AI chat assistant — not forms or code. Split parameters: agent extracts some from conversation, tenants pre-fill others. Sensitive values encrypted at rest.
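
One way to picture split parameters is a merge step that runs before a tool is called. This is a minimal sketch, assuming the tenant's pre-filled values always win and the agent may only supply fields the tenant marked as conversational. The function name and all field names are hypothetical, not VOX internals.

```python
def merge_parameters(prefilled: dict, extracted: dict, agent_fields: set) -> dict:
    """Combine tenant pre-filled values with values extracted from the call.

    Only fields listed in agent_fields are accepted from the conversation,
    so a caller cannot override tenant-owned values.
    """
    allowed = {k: v for k, v in extracted.items() if k in agent_fields}
    missing = agent_fields - set(allowed)
    if missing:
        # The agent still has to ask the caller for these.
        raise ValueError(f"missing from conversation: {sorted(missing)}")
    # Pre-filled values are applied last so they take precedence.
    return {**allowed, **prefilled}


# Hypothetical booking tool: the tenant fixes the calendar, the caller gives the rest.
prefilled = {"calendar_id": "cal_123", "timezone": "Europe/Paris"}
extracted = {"date": "2025-03-14", "party_size": 4, "calendar_id": "spoofed"}
params = merge_parameters(prefilled, extracted, {"date", "party_size"})
```

Here the `calendar_id` heard in the conversation is dropped, because only `date` and `party_size` are open to the agent.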

Multi-Tenant

One platform. Unlimited businesses.

Isolated agents, API keys, knowledge bases, analytics and billing per tenant. Serve hundreds of customers on one self-hosted deployment.

Self-Hosted

Your data. Your infra. Your margin.

~$0.015/min vs $0.07–0.10/min for Retell or Vapi. No per-minute SaaS fees. Deploy on OVHcloud, AWS, or any GPU node you own.
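
The arithmetic behind the comparison, using only the per-minute figures quoted above: the ratio works out to roughly 4.7× to 6.7×. The monthly call volume is an assumed number for illustration.

```python
VOX_COST_PER_MIN = 0.015        # self-hosted estimate quoted above
SAAS_COST_RANGE = (0.07, 0.10)  # Retell / Vapi range quoted above


def cost_ratio(saas_cost: float, self_hosted: float = VOX_COST_PER_MIN) -> float:
    """How many times more expensive the SaaS rate is per minute."""
    return saas_cost / self_hosted


ratio_low, ratio_high = (cost_ratio(c) for c in SAAS_COST_RANGE)

# Assumed volume, purely for illustration.
monthly_minutes = 10_000
saving_low, saving_high = (
    round((c - VOX_COST_PER_MIN) * monthly_minutes, 2) for c in SAAS_COST_RANGE
)

print(f"{ratio_low:.1f}x to {ratio_high:.1f}x cheaper per minute")
print(f"${saving_low:,.0f} to ${saving_high:,.0f} saved per {monthly_minutes:,} minutes")
```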

Voice pipeline — fully open-source, fully self-hosted

LiveKit WebRTC · Real-time audio
Silero VAD · Voice detection · 85ms
Parakeet TDT 0.6B · Speech-to-text · 150ms
Llama 3.3 70B · LLM · 300ms first token
CosyVoice2 0.5B · Text-to-speech · 150ms

Total target latency: <800ms · GPU-accelerated on OVHcloud or your own node
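
A quick check of the budget, summing the per-stage figures listed above. This assumes the stages run strictly one after another. Streaming overlap could shorten the total, while network transport, which has no figure listed, would add to it.

```python
# Per-stage latencies in milliseconds, as listed above.
STAGE_LATENCY_MS = {
    "Silero VAD": 85,
    "Parakeet TDT 0.6B (speech-to-text)": 150,
    "Llama 3.3 70B (first token)": 300,
    "CosyVoice2 0.5B (text-to-speech)": 150,
}
TARGET_MS = 800

total_ms = sum(STAGE_LATENCY_MS.values())
headroom_ms = TARGET_MS - total_ms

for stage, ms in STAGE_LATENCY_MS.items():
    print(f"{stage:<36} {ms:>4} ms")
print(f"{'Total':<36} {total_ms:>4} ms (headroom: {headroom_ms} ms)")
```

The listed stages add up to 685ms, which leaves 115ms of the 800ms target for transport and everything else.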

FAQ

What VOX does — and what it hands off.

Ready to deploy your first voice agent?

Self-hosted voice AI that routes to any tool, workflow, or knowledge base, at ~$0.015/min. No SaaS lock-in. No per-execution fees.