What channels does VOX support?

Four channels today: Web widget (WebRTC, one-line embed), Phone/AI IVR (SIP via Twilio/Telnyx), WhatsApp Business (text + voice), and Telegram (text + voice notes). All four route to the same AI agent — same personality, same knowledge base, same tools.

Which languages are supported?

17 languages across 3 tiers. Tier A: English, Chinese, Indonesian. Tier B: Hindi, Bengali, Urdu, French, German, Spanish, Portuguese, Russian. Tier C: Telugu, Tamil, Malayalam, Arabic, Japanese, Korean. Each tier uses purpose-built STT and TTS models — not just generic Whisper for everything.

What does VOX actually do vs what does it hand off?

VOX owns voice intelligence: speech recognition, conversation understanding, parameter extraction, context tracking, and tool routing. Everything else is handed off — n8n or MCP for workflow execution, Qdrant for RAG, Typesense for structured search, your CRM/API for data. VOX is the brain; the tools are the hands.

What does self-hosted actually save?

Retell AI charges $0.07/min. Vapi charges up to $0.10/min. Our self-hosted stack runs at ~$0.015/min at scale — roughly 5× cheaper. On 10,000 minutes/month, that's $550 saved per month per tenant.
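The arithmetic behind that figure can be checked with a short sketch. It uses only the per-minute prices quoted above; the function names are illustrative and not part of any VOX API.

```python
# Per-minute prices (USD) as quoted in the answer above.
RETELL_PER_MIN = 0.07
VAPI_MAX_PER_MIN = 0.10
SELF_HOSTED_PER_MIN = 0.015


def monthly_savings(minutes: int, competitor_rate: float,
                    own_rate: float = SELF_HOSTED_PER_MIN) -> float:
    """Monthly saving in USD for a given call volume."""
    return round(minutes * (competitor_rate - own_rate), 2)


def cost_ratio(competitor_rate: float,
               own_rate: float = SELF_HOSTED_PER_MIN) -> float:
    """How many times more expensive the competitor rate is."""
    return round(competitor_rate / own_rate, 1)


if __name__ == "__main__":
    print(monthly_savings(10_000, RETELL_PER_MIN))    # 550.0
    print(cost_ratio(RETELL_PER_MIN))                 # 4.7
    print(monthly_savings(10_000, VAPI_MAX_PER_MIN))  # 850.0
```

Against Retell AI the ratio is about 4.7×, which is where "roughly 5×" comes from.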

How does the IVR / phone channel work?

Tenants buy a virtual phone number (India: Exotel or Tata Tele, US: Twilio or Telnyx). Inbound calls route to LiveKit's SIP bridge, which converts the PSTN call to a WebRTC room. The same agent worker handles it — no code changes needed. Business hours, fallback routing, and DTMF menus are all configurable.

How does WhatsApp integration work?

Tenants connect their WhatsApp Business Account via the Meta Business API. When a customer sends a message or taps the business's number, the AI agent responds, either as text or as a voice message. This is powered by the sd-vox-whatsapp-ivr integration, which is already in production.

How does multi-tenancy work?

Every tenant gets isolated agents, API keys, knowledge bases, analytics, team management (with roles: admin/agent/viewer), and billing. Each tenant can have multiple team members with different permissions. Hundreds of businesses can run on a single VOX deployment.

What is BYOK and which providers do you support?

BYOK (Bring Your Own Key) lets you use your own API keys from OpenAI, Anthropic (Claude), Google (Gemini), Groq, Inworld AI (200+ models with smart routing), or OpenRouter (400+ models, OpenAI-compatible) instead of our platform-managed models. You can also mix providers — e.g. Groq for STT, Claude for reasoning, OpenAI for TTS. Your key, your bill, no VOX markup on API usage.

How good are Indian language voices?

We use purpose-built Indic models: IndicWhisper and Vakyansh for STT, IndicTTS (IIT Madras) for Dravidian languages, and IndicF5 for Hindi/Bengali. These deliver native-quality accents and natural prosody, not robotic English-to-Hindi translations. Hindi, Telugu, Tamil, Bengali, Urdu, and Malayalam are all supported with dedicated model pipelines.

What is the AI Configurator?

A natural language chat panel built into the dashboard. Describe what you need — 'Create a sales agent with a FAQ knowledge base' — and the AI creates agents, knowledge bases, toolboxes, tags, channels, and groups for you. 18 configuration tools, zero forms. Everything executes server-side with your session credentials.

How do Content Tags work?

Tags are color-coded labels you assign to documents, workflows, and MCP tools. During each conversation turn, the agent classifies the user's query and only retrieves content matching relevant tags. This prevents cross-department information leaks — sales questions only get sales docs, billing questions only get billing tools.
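A minimal sketch of per-turn tag filtering, assuming a simple keyword classifier in place of the agent's real query classification. The class, function names, and sample data are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentItem:
    name: str
    tags: frozenset[str]


def classify_query(query: str, keywords: dict[str, set[str]]) -> set[str]:
    """Toy stand-in for the per-turn classifier: match query words to tags."""
    words = set(query.lower().split())
    return {tag for tag, kws in keywords.items() if words & kws}


def retrieve(query: str, items: list[ContentItem],
             keywords: dict[str, set[str]]) -> list[str]:
    """Return only the items whose tags overlap the tags inferred for this turn."""
    relevant = classify_query(query, keywords)
    return [item.name for item in items if item.tags & relevant]


# Hypothetical tag vocabulary and content.
KEYWORDS = {
    "sales": {"pricing", "demo", "plan"},
    "billing": {"invoice", "refund", "payment"},
}
ITEMS = [
    ContentItem("pricing-guide.pdf", frozenset({"sales"})),
    ContentItem("refund-workflow", frozenset({"billing"})),
]

if __name__ == "__main__":
    print(retrieve("I need a refund for my last invoice", ITEMS, KEYWORDS))
```

Because filtering runs before retrieval, a billing question never sees the sales document, whatever its similarity score would have been.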

What is the Smart LLM Router?

A multi-model pool that selects the best LLM per conversation turn. Simple queries go to fast, cheap models (Groq, GPT-4o-mini). Complex reasoning goes to premium models (Claude, GPT-4o). Health tracking and automatic fallback ensure zero downtime if a provider goes down. Typical cost reduction: 40-60%.
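The routing idea can be sketched as follows. The complexity heuristic, the tier names, and the model labels are assumptions made for illustration; the actual router's rules are not described on this page.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    tier: str            # "fast" or "premium"
    healthy: bool = True


def is_complex(query: str) -> bool:
    """Hypothetical heuristic: long or explanation-style queries are complex."""
    markers = ("why", "compare", "explain", "step by step")
    q = query.lower()
    return len(q.split()) > 25 or any(m in q for m in markers)


def pick_model(query: str, pool: list[Model]) -> Model:
    """Pick a healthy model from the preferred tier, else fall back to the other."""
    preferred = "premium" if is_complex(query) else "fast"
    fallback = "fast" if preferred == "premium" else "premium"
    for tier in (preferred, fallback):
        for model in pool:
            if model.tier == tier and model.healthy:
                return model
    raise RuntimeError("no healthy model available")


def make_pool() -> list[Model]:
    # Illustrative labels only, not real model identifiers.
    return [
        Model("groq-llama", "fast"),
        Model("gpt-4o-mini", "fast"),
        Model("claude", "premium"),
        Model("gpt-4o", "premium"),
    ]


if __name__ == "__main__":
    pool = make_pool()
    print(pick_model("What are your opening hours?", pool).name)
    pool[2].healthy = False  # simulate a provider outage
    print(pick_model("Compare the Pro and Enterprise plans", pool).name)
```

Marking a model unhealthy moves traffic to the next model in the same tier, and to the other tier only when the whole preferred tier is down.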

Self-hosted · BYOK · Multi-Tenant · Production-Ready

AI voice agents for every customer channel.

Web widget, phone (IVR), WhatsApp, or Telegram: one AI agent handles them all. Bring your own API keys from OpenAI, Anthropic, Google, Groq, Inworld AI, or OpenRouter, or use our platform-managed GPU models from $0.015/min.

Trusted by businesses in India and the US · No credit card required

Live Demos

See VOX In Action

Explore real capabilities with interactive demos. Pick one and start talking.

4: Channels: Web, Phone, WhatsApp, Telegram
17: Supported languages
<800ms: End-to-end latency
~$0.015: Cost per minute (self-hosted)
7+: AI Providers (BYOK)
15: Platform features

Channels

Meet customers wherever they are.

One AI agent. Four channels. The same intelligent conversation — whether they click a widget, call a number, or message on WhatsApp or Telegram.

Live

Web Widget

One line of code. Instant voice on your site.

Embed a voice button into any website or web app. Customers click, speak, and get answers — no app download, no phone number. WebRTC delivers crystal-clear audio in the browser.

<script src="https://widget.scandeer.ai/widget.js"
  data-api-key="YOUR_KEY"
  data-agent-id="YOUR_AGENT">
</script>

Live

Phone / AI IVR

Your AI agent answers every call.

Give tenants virtual phone numbers. Inbound calls route to your AI agent via SIP bridge — no traditional IVR menus, just natural conversation. Powered by Twilio and Telnyx.

Twilio · Telnyx

Live

WhatsApp

The channel your customers already use.

Connect your WhatsApp Business Account. When customers send a message or tap your number, your AI agent responds — text or voice. Powered by Meta Cloud API with full voice call support.

Meta Cloud API · Voice + Text

Live

Telegram

Reach users on the fastest-growing messenger.

Connect a Telegram bot in seconds. Your AI agent handles text and voice messages with the same intelligence — inline replies, voice notes, and seamless human escalation.

Bot API · Voice Notes

All channels converge on the same LiveKit room and the same agent worker: one codebase, consistent experience everywhere.

Dashboard

Your entire call operation. One screen.

Real-time contact timeline, full history, AI briefs, and live escalation queue — the command center for your support team.

Live product preview

app.scandeer.ai/conversations

Conversations

Search contacts…

Raj Kumar · 2m · SupportBot · Live
Sarah A. · 1h · SalesBot
Mike J. · 3h · SupportBot
Priya L. · 1d · SalesBot

Raj Kumar · 3 calls

Active call: in progress · 2m 14s · SupportBot · Escalated

Call history · 3

Today · 2:41 PM · 4m 12s
SupportBot: order enquiry → escalated (Escalated)

Customer: Hi, I placed an order yesterday but haven't received a confirmation email.

Agent: I'd be happy to help with that! Could you share the email address used for the order?

Customer: It's [email protected]

Agent: I found your order #ORD-2891. The confirmation was sent. Please check your spam folder. Is there anything else I can help with?

Customer: Actually I'd like to speak to a human agent please.

Agent: Of course! Connecting you to a support specialist now. Please hold for a moment.

Yesterday · 11:05 AM · 2m 53s
SupportBot: shipping update (Completed)

Mar 6 · 4:22 PM · 1m 38s
SalesBot: product inquiry (Ended)

How it works

From idea to live agent in minutes.

Describe your business to the AI Configurator

Open the chat panel and tell the AI what your agent should do. It creates the agent, attaches knowledge bases, configures tools, sets up channels — all through natural language. 18 tools, zero forms.

Connect tools and channels

Pick from 1,200+ MCP integrations or 400+ n8n workflows. Enable Web widget, Phone, WhatsApp, or Telegram — each channel routes to the same agent.

Deploy with one line of code

Drop a single embed into your website. Or buy a phone number. Or link your WhatsApp Business account. Your agent responds to customers in under 800ms.

Multilingual

Your agent speaks their language.

17 languages across 3 model tiers. Purpose-built Indic STT/TTS models deliver native-quality Hindi, Telugu, Tamil, and Bengali accents — not robotic translations. Every Indian language gets a dedicated model pipeline, not generic Whisper fallbacks.

STT: Parakeet (EN) · IndicWhisper (Hindi, Bengali, Urdu) · Vakyansh (Telugu, Tamil, Malayalam) · Whisper large-v3 (all others)

LLM: Llama 3.3 70B (most) · Qwen 2.5 72B (CJK) · Or BYOK: GPT-4o, Claude, Gemini, with system prompt language injection

TTS: CosyVoice2 (EN/ZH/JA) · IndicTTS IIT Madras (Dravidian) · IndicF5 (Indic multilingual) · XTTS v2 (European) · MeloTTS (fast multilingual) · Or BYOK: OpenAI TTS

Tier A — Base pricing

🇺🇸 English · 🇨🇳 Chinese · 🇮🇩 Indonesian

Tier B — +20%

🇮🇳 Hindi · 🇧🇩 Bengali · 🇵🇰 Urdu · 🇫🇷 French · 🇩🇪 German · 🇪🇸 Spanish · 🇧🇷 Portuguese · 🇷🇺 Russian

Tier C — +40% (specialist models)

🇮🇳 Telugu · 🇮🇳 Tamil · 🇮🇳 Malayalam · 🇸🇦 Arabic · 🇯🇵 Japanese · 🇰🇷 Korean

🇮🇳 Indian languages use dedicated Indic models, not English models forced into Hindi. Native accent, natural prosody, regional pronunciation.

BYOK providers (OpenAI, Anthropic, Google, Groq, Inworld AI, OpenRouter) use base compute cost for all tiers. Mix providers for best quality per language.
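As a rough illustration of how the tier surcharges combine with a base rate, the sketch below assumes the $0.015/min platform figure is the Tier A base price. That is an assumption, since the page does not state the base explicitly, and the function name is hypothetical.

```python
# Assumption: the $0.015/min platform rate quoted on this page is the Tier A base.
BASE_RATE_USD = 0.015

# Surcharges per tier as listed above.
TIER_SURCHARGE = {"A": 0.00, "B": 0.20, "C": 0.40}

TIER_LANGUAGES = {
    "A": ["English", "Chinese", "Indonesian"],
    "B": ["Hindi", "Bengali", "Urdu", "French", "German",
          "Spanish", "Portuguese", "Russian"],
    "C": ["Telugu", "Tamil", "Malayalam", "Arabic", "Japanese", "Korean"],
}

# Reverse lookup: language name to tier letter.
LANGUAGE_TIER = {
    lang: tier for tier, langs in TIER_LANGUAGES.items() for lang in langs
}


def rate_per_minute(language: str, byok: bool = False,
                    base: float = BASE_RATE_USD) -> float:
    """Per-minute rate for a language; BYOK skips the tier surcharge."""
    surcharge = 0.0 if byok else TIER_SURCHARGE[LANGUAGE_TIER[language]]
    return round(base * (1 + surcharge), 4)


if __name__ == "__main__":
    for lang in ("English", "Hindi", "Telugu"):
        print(lang, rate_per_minute(lang))
    print("Telugu (BYOK)", rate_per_minute("Telugu", byok=True))
```

Under that assumption, Tier B works out to $0.018/min and Tier C to $0.021/min.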

Platform

Built for production at scale.

Every layer engineered to minimise latency, maximise accuracy, and eliminate SaaS dependency.

Voice Pipeline

Sub-800ms. Indistinguishable from human.

Parakeet TDT + Llama 3.3 70B + CosyVoice2 — fully self-hosted, fully streaming. VAD barge-in under 200ms. No SaaS latency taxes.

Smart IVR

Voice IVR that understands intent.

No keypad menus. Caller describes what they need — AI routes to Sales, Support, or any department, or handles it directly. PSTN/SIP bridge turns any phone number into an AI-powered room.

WhatsApp

The world's #1 messaging channel.

Connect your WhatsApp Business Account. Text and voice messages handled by the same AI agent. Dominant channel in India, SE Asia, LATAM.

Workflows

Build a workflow. Assign it to an agent. Done.

Create automations in n8n, assign them per agent, and let VOX extract parameters from conversation to fire them — CRM, calendar, ticketing, payments — mid-call, automatically.

Knowledge Base

Your agent answers only from what you teach it.

Tag documents by agent or department. Contextual Retrieval (49% more accurate than naive RAG) injects only relevant chunks per query — strictly from your content, zero hallucinations outside it.

Multi-Tenant

One platform. Unlimited businesses.

Isolated agents, API keys, knowledge bases, analytics and billing per tenant. Team management with roles (admin/agent/viewer). Serve hundreds on one deployment.

Self-Hosted

Your data. Your infra. Your margin.

~$0.015/min platform-managed GPU vs $0.07–0.10/min for Retell or Vapi. No per-minute SaaS fees. Self-hosted GPU workers with real-time health monitoring.

BYOK

Bring your own AI provider. Pay at cost.

Use OpenAI, Anthropic, Google, Groq, Inworld AI (200+ models), or OpenRouter (400+ models) — or mix STT/LLM/TTS across providers. Your API key, your bill. No markup.

AI Providers

12+ STT, LLM, and TTS models. Mix and match.

Platform-managed GPU models for lowest cost, or BYOK from OpenAI, Groq, Google, Anthropic, Inworld AI, and OpenRouter. Pick STT, LLM, and TTS independently per agent.

Conversations

Every caller. Every call. Full context.

Contact-centric history panel — like WhatsApp Web for voice calls. Live active calls with elapsed timers, inline transcripts, copy-to-clipboard, full-text search. Real-time via WebSocket.

Live Escalation

AI tries first. Your team steps in when it matters.

AI detects when to hand off. Call queues to the right team in real time with full transcript context and ringback tone. One click for your agent to take over — zero hold music, zero cold start.

AI Configurator

Describe your business. We build the agent.

Tell the AI what you need in plain English. It creates agents, knowledge bases, toolboxes, tags, channels, and groups — all through natural language. No forms, no code, no guesswork.

Content Tags

Tag everything. Route intelligently.

Assign color-coded tags to documents, workflows, and tools. Per-turn filtering ensures agents only access content relevant to each conversation — no leaking across departments.

Smart Router

The right model for every turn.

Multi-model pool with heuristic routing. Fast models handle simple queries, powerful models tackle complex ones. Tier-based fallback, health tracking, and tag-driven routing hints — automatic cost optimization.

Observability

See every millisecond of every call.

Visual inference trace shows exactly what happened: STT timing, LLM reasoning, TTS synthesis, tool calls — all in a waterfall timeline. Debug latency, understand routing decisions, and optimize per-turn performance.

Voice pipeline — open-source · self-hosted · GPU-accelerated

LiveKit WebRTC

Real-time audio transport

Silero VAD

Barge-in · 85ms

Parakeet TDT 0.6B

STT · 150ms

Llama 3.3 70B

LLM · 300ms first token

CosyVoice2 0.5B

TTS · 150ms

Total target latency: <800ms · Platform-managed GPU workers · RTX 4090 / A100
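The stage figures above can be added up to see how much of the 800ms target they consume. This sketch treats the four stages as strictly sequential, which is a simplification: in a streaming pipeline the stages overlap, and network transport is not included.

```python
# Stage latencies (ms) as listed in the pipeline above.
STAGE_LATENCY_MS = {
    "Silero VAD (barge-in)": 85,
    "Parakeet TDT 0.6B (STT)": 150,
    "Llama 3.3 70B (LLM, first token)": 300,
    "CosyVoice2 0.5B (TTS)": 150,
}
TARGET_MS = 800


def budget(stages: dict[str, int], target_ms: int) -> tuple[int, int]:
    """Return (total stage latency, remaining headroom) in milliseconds."""
    total = sum(stages.values())
    return total, target_ms - total


if __name__ == "__main__":
    total, headroom = budget(STAGE_LATENCY_MS, TARGET_MS)
    print(f"pipeline: {total} ms, headroom: {headroom} ms")
    # pipeline: 685 ms, headroom: 115 ms
```

The remaining 115ms is the margin left for transport and jitter under this simplified model.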

AI Providers

Pick your models. We wire the pipeline.

Every voice call flows through three AI stages — Speech-to-Text, Language Model, and Text-to-Speech. Choose a provider for each slot independently, or let our platform handle it all.

Platform Managed

Bring Your Own Key

Speech-to-Text

Caller speaks → text

Platform Managed

Parakeet TDT 0.6B

English — 150ms, streaming

IndicWhisper

Hindi, Bengali, Urdu

Vakyansh

Telugu, Tamil, Malayalam, Kannada

Bring Your Own Key

Groq Whisper v3 Turbo

Ultra-fast, free tier

OpenAI Whisper

Industry standard

Google Chirp

125+ languages

Inworld AI STT

200+ models, smart routing

Language Model

Understands → reasons → responds

Platform Managed

Llama 3.3 70B INT8

Default — 300ms first token

Qwen 2.5 72B

Best for CJK languages

Bring Your Own Key

GPT-4o / GPT-4o-mini

OpenAI — versatile

Claude Sonnet / Opus

Anthropic — best reasoning

Gemini Pro / Flash

Google — multimodal

Groq Llama / Mixtral

LPU — fastest inference

OpenRouter 400+ models

Any model, one API

Text-to-Speech

Text → natural voice

Platform Managed

CosyVoice2 0.5B

EN, ZH, JA — 150ms streaming

IndicF5

Hindi, Telugu, Tamil + 7 Indic

IndicTTS (IIT Madras)

Dravidian languages

XTTS v2

European languages

MeloTTS

Fast multilingual fallback

Bring Your Own Key

OpenAI TTS-1 / TTS-1-HD

6 voices, natural

Google Chirp3 HD

8 voices per language

Groq Orpheus

English + Arabic, expressive

Inworld AI TTS

Low latency, multi-voice

Mix freely — e.g. Groq STT + Claude LLM + OpenAI TTS. Each slot is independent per agent.

Bring Your Own Key

Your models. Your choice.

Use our platform-managed GPU models at $0.015/min — or bring your own API keys from OpenAI, Anthropic, Google, Groq, Inworld AI, or OpenRouter. Mix and match STT, LLM, and TTS from different providers. Pay only for what you use.

VOX Platform: Self-hosted open-source models. Lowest cost.

Groq: LPU inference. Ultra-fast Llama & Whisper.

Inworld AI: Top TTS + STT + LLM router. One key.

OpenRouter: 400+ LLM models. OpenAI-compatible.

OpenAI: GPT-4o, Whisper, TTS-1. Industry standard.

Google: Gemini Pro & Flash. Multimodal.

Anthropic: Claude Opus, Sonnet, Haiku. Best reasoning.

Custom Mix: Mix any STT + LLM + TTS provider.

How BYOK works

Add your API key

Paste your OpenAI, Anthropic, Google, Groq, Inworld AI, or OpenRouter API key in agent settings. Keys are encrypted and never leave your tenant.

Pick your models

Choose STT, LLM, and TTS independently — or use a preset card that auto-configures the best combination per provider.

Pay per use, at cost

Your API key, your bill. No VOX markup on BYOK usage. Platform fee covers orchestration, analytics, and channels only.

Custom Mix — full flexibility

Combine Groq Whisper for STT, Claude for reasoning, and OpenAI TTS for voice — or any other combination. Each model slot is independently configurable.
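One way to picture independently configurable slots is a small per-agent structure with a validity check. The field names and the provider sets below are a hypothetical sketch that mirrors the provider listing on this page; this is not the actual VOX configuration schema.

```python
from dataclasses import dataclass

# Providers listed on this page for each slot ("platform" = platform-managed).
SLOT_PROVIDERS = {
    "stt": {"platform", "groq", "openai", "google", "inworld"},
    "llm": {"platform", "openai", "anthropic", "google", "groq", "openrouter"},
    "tts": {"platform", "openai", "google", "groq", "inworld"},
}


@dataclass(frozen=True)
class AgentPipeline:
    stt: str
    llm: str
    tts: str

    def validate(self) -> list[str]:
        """Return one message per slot whose provider is not listed for it."""
        errors = []
        for slot in ("stt", "llm", "tts"):
            provider = getattr(self, slot)
            if provider not in SLOT_PROVIDERS[slot]:
                errors.append(f"{provider} is not listed for the {slot} slot")
        return errors


if __name__ == "__main__":
    # The example mix from the text: Groq STT + Claude LLM + OpenAI TTS.
    print(AgentPipeline(stt="groq", llm="anthropic", tts="openai").validate())
    print(AgentPipeline(stt="anthropic", llm="anthropic", tts="openai").validate())
```

Validating each slot separately keeps a bad choice in one slot from blocking valid choices in the others.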

Cost comparison per minute

Provider | Cost/min | Note
VOX Platform (self-hosted) | ~$0.015/min | Open-source models on your GPU
Groq (BYOK) | ~$0.02/min | LPU-accelerated, pay Groq directly
Inworld AI (BYOK) | ~$0.03/min | 200+ models, smart routing
OpenRouter (BYOK) | ~$0.03/min | 400+ models, OpenAI-compatible
OpenAI (BYOK) | ~$0.04/min | GPT-4o-mini + Whisper + TTS-1
Anthropic + OpenAI (BYOK) | ~$0.05/min | Claude reasoning + OpenAI STT/TTS
Retell AI | $0.07/min | SaaS competitor
Vapi | $0.05–0.10/min | SaaS competitor

BYOK costs are estimates based on typical conversation length (~30s user speech, ~45s agent speech per minute). Actual cost depends on token usage and provider pricing.

Workflow Intelligence

Your agent doesn't just talk. It acts.

Book appointments, process orders, look up accounts, create tickets — your AI agent executes real business actions mid-conversation using n8n workflows and MCP tools. Fully secured within your organization.

How it works — from voice to action

Customer speaks

"I'd like to book an appointment for Friday afternoon"

Agent understands intent

Extracts: action=book, date=Friday, time=afternoon. Asks follow-ups if needed.

Workflow executes

n8n webhook fires → checks calendar → books slot → sends confirmation SMS

Agent confirms

"Done! You're booked for Friday at 2 PM. You'll get a confirmation text shortly."
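The four steps above can be sketched as a parameter check followed by a webhook request. The webhook URL, the required-field list, and the function names are hypothetical; the request is only constructed here, not sent.

```python
import json
from urllib.request import Request

# Hypothetical required parameters for a booking workflow.
REQUIRED = ("action", "date", "time")


def missing_fields(params: dict) -> list[str]:
    """Fields the agent still needs to ask the caller about."""
    return [f for f in REQUIRED if not params.get(f)]


def build_webhook_request(url: str, params: dict) -> Request:
    """Build (but do not send) a JSON POST for an n8n webhook."""
    missing = missing_fields(params)
    if missing:
        raise ValueError(f"ask follow-up questions for: {', '.join(missing)}")
    body = json.dumps(params).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")


if __name__ == "__main__":
    # Parameters extracted from "book an appointment for Friday afternoon".
    extracted = {"action": "book", "date": "Friday", "time": "afternoon"}
    # Placeholder URL, not a real endpoint.
    req = build_webhook_request(
        "https://n8n.example.internal/webhook/book-appointment", extracted)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it; a caller who gave only a date would trigger the follow-up path instead.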

n8n — Your private workflow engine

Every tenant gets a dedicated, sandboxed n8n instance running inside your infrastructure. Build multi-step automations visually — no code, no external dependencies, no data leaving your network.

400+

Pre-built integrations

Per-agent

Workflow assignment

AES-256

Credential encryption

No data leaves your infra

MCP — Direct tool execution

Model Context Protocol gives your agent instant access to 1,200+ tool integrations. Sub-200ms execution. Self-hostable. Call any CRM, database, API, or payment processor natively — inside a live voice conversation.

1,200+

Tool integrations

<200ms

Execution latency

Self-hosted

MCP servers

Open

Standard protocol

What your agents can do

Book appointments

Agent asks for date/time preferences, checks availability via n8n → Google Calendar, books the slot, sends SMS confirmation.

Process orders

Customer orders by voice. Agent extracts items, confirms details, triggers n8n → POS/ERP workflow, sends order confirmation.

Account enquiries

Agent verifies identity, queries CRM/database via MCP, reads back balance, transaction history, or policy details — securely.

Create support tickets

Agent captures the issue, creates a Zendesk/Freshdesk ticket with full transcript via n8n, emails the customer a ticket number.

Process payments

Agent collects payment intent, triggers Stripe/Razorpay workflow via n8n, confirms payment status back to the customer.

Track shipments

Customer asks about delivery. Agent queries logistics API via MCP, provides real-time tracking status and ETA.

All workflows run within your infrastructure. Credentials are encrypted with AES-256-GCM. No customer data leaves your network. No third-party SaaS touches your API keys.

Integrations — connect VOX to any backend

1,200+

MCP integrations

The open standard for AI tool calls. Sub-200ms. Self-hostable. VOX calls any MCP server natively — CRM, ticketing, calendar, database, payments — inside a live voice conversation.

400+

n8n workflow templates

Complex multi-step workflows with visual editing and 400+ pre-built integrations. VOX calls your n8n webhooks with structured payloads extracted from conversation — you build the logic, VOX triggers it.

Plus direct REST API calls, PostgreSQL, MongoDB, Qdrant vector search, Typesense geo+structured search — VOX connects to anything.

FAQ

Common questions answered.

Free to start · No credit card required

Ready to deploy your first voice agent?

Web widget, phone, WhatsApp, or Telegram: all four channels, 17 languages. Platform GPU models from $0.015/min or BYOK with 7+ AI providers including OpenAI, Claude, Gemini, Groq, Inworld AI, and OpenRouter. No SaaS lock-in.
