VOX is the voice intelligence layer between your customers and your tools. It listens, understands, and routes to n8n, MCP servers, your CRM, or any API. Self-hosted at ~$0.015/min, roughly 5× cheaper than Retell or Vapi ($0.07–0.10/min).
How it works
Tell the AI assistant what your agent should do. It generates the full tool configuration — parameters, integrations, knowledge base — no forms, no code.
Pick from 1,200+ MCP integrations or 400+ n8n workflow templates. VOX acts as the intelligence layer: it understands context, extracts parameters, and routes to the right backend.
Drop a single embed into your website. Customers click, speak, and get answers, with a target latency under 800ms and at a fraction of SaaS cost.
1,200+
MCP integrations
The open standard for AI tool calls. Sub-200ms. Self-hostable. VOX can call any MCP server natively — CRM, ticketing, calendar, database, payment — inside a live voice conversation.
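As a rough illustration of what a native MCP call involves: MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. The sketch below only builds such a request. The tool name `create_ticket` and its arguments are hypothetical, and how VOX actually transports the message to a server is not shown here.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC expects a unique id per request.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request in the shape MCP defines."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool and arguments extracted from a live conversation.
request = build_tool_call(
    "create_ticket",
    {"customer_email": "ada@example.com", "summary": "Router keeps rebooting"},
)
print(json.dumps(request, indent=2))
```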
400+
n8n workflow templates
Complex multi-step workflows with visual editing and 400+ pre-built integrations. VOX calls your n8n webhooks with structured payloads extracted from the conversation — you build the logic, VOX triggers it.
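A minimal sketch of what "structured payloads" could look like when posted to an n8n Webhook node. The URL, the payload fields (`intent`, `parameters`, `call_id`) and the helper name are assumptions for illustration, not VOX's documented schema. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical n8n webhook URL; replace with the URL of your own Webhook node.
N8N_WEBHOOK_URL = "https://n8n.example.com/webhook/book-appointment"


def build_webhook_request(url: str, intent: str, parameters: dict, call_id: str) -> urllib.request.Request:
    """Wrap parameters extracted from the conversation in a JSON POST request."""
    body = json.dumps(
        {"intent": intent, "parameters": parameters, "call_id": call_id}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_webhook_request(
    N8N_WEBHOOK_URL,
    intent="book_appointment",
    parameters={"date": "2025-03-14", "time": "15:30", "party_size": 2},
    call_id="call-0001",
)
# To actually trigger the workflow: urllib.request.urlopen(req, timeout=5)
```

The workflow logic stays in n8n; the caller only needs to agree on the payload shape with the workflow's first node.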
Plus direct API calls, PostgreSQL, MongoDB, Qdrant vector search, Typesense geo+structured search — VOX connects to anything.
Platform
Every layer engineered to minimise latency, maximise accuracy, and eliminate SaaS dependency.
Parakeet TDT STT + Llama 3.3 70B + CosyVoice2 TTS — fully self-hosted, fully streaming. VAD-based barge-in under 200ms. No SaaS latency taxes.
VOX is the intelligence layer — not a workflow engine. It understands what the caller wants, extracts parameters from conversation, and dispatches to n8n, MCP servers, or direct APIs.
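The dispatch step described above can be pictured as a lookup from a recognised intent to a backend handler. This is a sketch under stated assumptions: the intents, the `ToolRoute` type and the stub handlers are hypothetical, and the real handlers would call n8n, an MCP server or an HTTP API instead of returning a dict.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolRoute:
    backend: str                      # "n8n", "mcp", or "api"
    handler: Callable[[dict], dict]   # performs the backend call


def _stub(backend: str) -> Callable[[dict], dict]:
    """Return a stand-in handler that just echoes what it would send."""
    def handler(params: dict) -> dict:
        return {"backend": backend, "sent": params}
    return handler


# Hypothetical intent table; in practice this comes from tenant tool config.
ROUTES = {
    "book_table": ToolRoute("n8n", _stub("n8n")),
    "create_ticket": ToolRoute("mcp", _stub("mcp")),
    "order_status": ToolRoute("api", _stub("api")),
}


def dispatch(intent: str, params: dict) -> dict:
    """Send extracted parameters to whichever backend owns the intent."""
    route = ROUTES.get(intent)
    if route is None:
        raise KeyError(f"no tool registered for intent {intent!r}")
    return route.handler(params)


result = dispatch("book_table", {"party_size": 4, "time": "19:00"})
```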
RAG for unstructured docs (PDFs, FAQs). Typesense for structured + geo search ("Indian restaurant near me, medium spice"). Live tool calls for real-time data.
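For the structured + geo case, the example query could translate into Typesense search parameters roughly as follows. The field names (`cuisine`, `spice_level`, `location`), the radius and the coordinates are assumptions about a hypothetical restaurant collection; `filter_by` and `sort_by` with a geopoint field follow Typesense's search syntax.

```python
def build_restaurant_search(cuisine: str, spice_level: str,
                            lat: float, lng: float,
                            radius_km: float = 5.0) -> dict:
    """Turn extracted conversation parameters into Typesense search params."""
    point = f"{lat}, {lng}"
    return {
        "q": "*",  # match all documents; filtering does the narrowing
        "filter_by": (
            f"cuisine:={cuisine} && spice_level:={spice_level} "
            f"&& location:({point}, {radius_km} km)"
        ),
        "sort_by": f"location({point}):asc",  # nearest first
    }


# "Indian restaurant near me, medium spice", with a hypothetical caller location.
params = build_restaurant_search("Indian", "medium", 51.5072, -0.1276)
```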
Tenants configure tools through an AI chat assistant, not forms or code. Parameters are split: the agent extracts some from the conversation, and tenants pre-fill the rest. Sensitive values are encrypted at rest.
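One way to picture the split-parameter idea: each tool parameter is marked as either tenant-supplied or agent-extracted, and the two sources are merged at call time. The schema format, parameter names and function below are hypothetical. Encryption of sensitive values is out of scope for this sketch.

```python
def resolve_arguments(schema: dict, tenant_prefilled: dict, extracted: dict):
    """Merge tenant-configured and conversation-extracted parameters.

    `schema` maps each parameter name to its source: "tenant" or "agent".
    Returns the resolved arguments plus any agent parameters still missing,
    so the agent knows what to ask the caller next.
    """
    args, missing = {}, []
    for name, source in schema.items():
        if source == "tenant":
            # Tenant values always win; the conversation cannot override them.
            args[name] = tenant_prefilled[name]
        elif name in extracted:
            args[name] = extracted[name]
        else:
            missing.append(name)
    return args, missing


schema = {"api_key": "tenant", "location_id": "tenant",
          "date": "agent", "party_size": "agent"}
tenant_prefilled = {"api_key": "tenant-secret", "location_id": "store-12"}
extracted = {"date": "2025-03-14", "api_key": "caller-supplied"}

args, missing = resolve_arguments(schema, tenant_prefilled, extracted)
```

Here `missing` contains `party_size`, so the agent would ask the caller for it before making the tool call.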
Isolated agents, API keys, knowledge bases, analytics and billing per tenant. Serve hundreds of customers on one self-hosted deployment.
~$0.015/min vs $0.07–0.10/min for Retell or Vapi. No per-minute SaaS fees. Deploy on OVHCloud, AWS, or any GPU node you own.
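To make the per-minute rates concrete, here is the arithmetic for a hypothetical volume of 10,000 call minutes per month. Only the rates come from the figures above; the volume is an assumption, and the self-hosted figure does not break out infrastructure cost separately.

```python
VOX_RATE = 0.015                 # USD per minute, self-hosted (approximate)
SAAS_RATE_LOW, SAAS_RATE_HIGH = 0.07, 0.10   # USD per minute, Retell or Vapi


def monthly_cost(minutes: int, rate_per_min: float) -> float:
    """Cost in USD for a month of call minutes at a flat per-minute rate."""
    return round(minutes * rate_per_min, 2)


minutes = 10_000  # hypothetical monthly volume

vox = monthly_cost(minutes, VOX_RATE)
saas_low = monthly_cost(minutes, SAAS_RATE_LOW)
saas_high = monthly_cost(minutes, SAAS_RATE_HIGH)

# Ratio of SaaS rate to self-hosted rate at each end of the range.
ratio_low = round(SAAS_RATE_LOW / VOX_RATE, 1)
ratio_high = round(SAAS_RATE_HIGH / VOX_RATE, 1)

print(f"VOX: ${vox:,.2f}  SaaS: ${saas_low:,.2f} to ${saas_high:,.2f}")
print(f"SaaS is {ratio_low}x to {ratio_high}x the self-hosted rate")
```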
Voice pipeline — fully open-source, fully self-hosted
Total target latency: <800ms · GPU-accelerated on OVHCloud or your own node
FAQ
Self-hosted voice AI that routes to any tool, workflow, or knowledge base at ~$0.015/min. No SaaS lock-in. No per-execution fees.