Your agent can hear them. Now it can grade them.
Chivox MCP turns raw speech into a dense, agent-ready payload — phoneme scores, stress, tone, fluency, audio quality — all in one MCP call, for any LLM. The listening layer under every voice-native agent you’re about to ship.
60+ phonemes scored for accuracy, stress and intonation — not a single opaque number.
{
"overall": 48,
"pron": { "accuracy": 44, "integrity": 90, "fluency": 72, "rhythm": 65 },
"fluency": { "pause": 2, "speed": 118 },
"audio_quality": { "snr": 19.2, "clip": 0 },
"details": [
{
"word": "think",
"score": 48, "dp_type": "mispron",
"start": 2400, "end": 2910,
"liaison": "none",
"phonemes": [
{ "ipa": "θ", "score": 35, "dp_type": "mispron" },
{ "ipa": "ɪ", "score": 88, "dp_type": "normal" }
],
"phoneme_error": { "expected": "/θ/", "actual": "/s/" }
}
]
}

“I noticed you pronounced think as sink. Place your tongue between your teeth for the /θ/ sound. Try: ‘Thirty thirsty thinkers thought…’”
The listening layer, as four MCP tools
Twenty years of pronunciation-assessment R&D, exposed as a structured payload your LLM can reason over. Drop into LangChain, LlamaIndex, the OpenAI Agents SDK or any custom loop — skip the months of DSP work.
Score a learner’s speech
Stream mic audio or post a file. Get overall / accuracy / integrity / fluency / rhythm scores, plus word and phoneme-level diagnostics.
Mandarin & English, natively
Tones, pinyin, neutral tone, erhua, tone sandhi for Chinese. Stress, rhythm, CEFR-aligned scoring for English. One flag switches between them.
Score free-flow dialogue
Open-ended AI-talk evaluation returns 5-dimensional scores on fluency, content, grammar, accuracy and rhythm — ready for the next LLM turn.
Personalize the next practice
Feed the JSON straight to GPT / Claude / Gemini. Use the shipped prompt-skill to generate targeted drills for weak phonemes or tones.
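That loop can be sketched in a few lines of Python. Field names follow the sample payload shown above; the score threshold and the prompt wording are our own illustrative choices, not part of the Chivox API:

```python
def weak_phonemes(payload: dict, threshold: int = 60) -> list[dict]:
    """Collect per-phoneme rows scored below the threshold."""
    return [
        {"word": w["word"], "ipa": p["ipa"], "score": p["score"]}
        for w in payload.get("details", [])
        for p in w.get("phonemes", [])
        if p["score"] < threshold
    ]

def drill_prompt(payload: dict) -> str:
    """Render an LLM prompt asking for drills on the weakest phonemes."""
    rows = [f'/{w["ipa"]}/ in "{w["word"]}" scored {w["score"]}'
            for w in weak_phonemes(payload)]
    return ("The learner's weakest phonemes:\n" + "\n".join(rows)
            + "\nWrite three short practice sentences targeting them.")

# Trimmed version of the payload shown above
sample = {"details": [{"word": "think", "phonemes": [
    {"ipa": "θ", "score": 35}, {"ipa": "ɪ", "score": 88}]}]}
print(drill_prompt(sample))
```

The resulting string goes straight into the next chat turn; the model never has to guess which sound failed.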
Production-ready in 3 steps
Watch it run. Paste config → server connects → your LLM calls a tool and gets structured scores back.
Add one block to your MCP config
Paste the snippet into Cursor, Claude Desktop, or your custom agent — pick a tab on the right.
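The config block has the standard MCP shape. The server key, package name, and env-var name below are placeholders, not the real values — use the exact snippet from the tab that matches your client:

```json
{
  "mcpServers": {
    "chivox": {
      "command": "npx",
      "args": ["-y", "chivox-mcp"],
      "env": { "CHIVOX_API_KEY": "<your-key>" }
    }
  }
}
```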
Call a tool from your LLM
Hand your model the audio. It gets back nested JSON: pron sub-scores, fluency + WPM, audio SNR, and details[] with ms ranges, stress, liaison and per-phoneme rows.
API reference

Mandarin is where the payload proves itself.
If it resolves tonal sandhi, it resolves anything.
English handles the long tail of L2 learners. Mandarin is the pressure test — four tones, sandhi, erhua, retroflex, the acoustic edge cases that kill generic STT. Both ship as pron.* / details[] on the same payload contract, with tone objects and per-phoneme windows for zh, stress and CEFR alignment for en. One integration, two acoustically opposite languages.
你好,今天天气……
Score a heritage speaker mid-sentence as they flip between languages — “I told her 我下周回家 and she was thrilled.” Returns separate en / zh sub-scores plus a blended fluency index. Same payload contract, two languages interleaved.
Built for what developers actually ship
Tutors, coaches, companions, QA tooling — pick the scenario that’s yours and see how the agent loop looks in practice.
The only MCP that feeds LLMs phoneme-level Mandarin
Tone objects, sandhi resolution and per-phoneme windows returned in the same payload shape every other language ships. Your agent reasons over 睡觉 vs. 水饺 at the acoustic layer, not the transcript — signal a Whisper-stack integration simply can’t surface.
Score candidate speech, not just transcripts
Screen English fluency, pronunciation confidence and rhythm at scale. Your LLM reasons over numbers, not vibes — explainable rubrics every HR team will trust.
Agent training & call-script compliance
Evaluate standard-phrase delivery, articulation, pacing and keyword hits for call-center reps. Flag exactly which second drifted off-script and auto-generate coaching drills.
Voice-gated NPCs and pronunciation-powered gameplay
Players unlock spells, dialogues or levels by saying the phrase correctly. Get a pass/fail plus the exact phoneme that missed, at <300 ms p95 — fast enough for real-time game loops.
Speech scoring driven by research
The engine behind Chivox MCP is 20 years of R&D in pronunciation assessment. Here’s how it holds up in production.
Scores align with certified human expert rubrics at 95%+ correlation. Validated by national standardized speaking tests in 100+ cities.
Same payload. Your agent. Your production loop.
Drop Chivox MCP into Cursor, Claude Desktop, or any agent SDK. One npx and you’re reading the same JSON you just saw above.
Starter key free · spend caps · low-balance alerts · zero audio retention
Let’s build your voice agent together.
Tell us what you’re building. We’ll reply within one business day with pilot credits, pricing, or a deployment plan — whichever you need first.
- Enterprise pricing & self-hosted deployments: volume tiers, VPC install, SLAs, and on-prem engines for regulated buyers.
- Missing a language or dialect? We train new acoustic models on request. Send us your target accent.
- Pilot credits for evaluation teams: free benchmark run on your own audio, with a side-by-side report.