The listening layer for voice-native agents

Your agent can hear them. Now it can grade them.

Chivox MCP turns raw speech into a dense, agent-ready payload — phoneme scores, stress, tone, fluency, audio quality — all in one MCP call, for any LLM. The listening layer under every voice-native agent you’re about to ship.

Deep linguistic understanding
Go beyond transcripts.
Enterprise-ready
Secure. Scalable. Reliable.
Real-time intelligence
React in the moment.
60-second install · Copy → paste → your agent hears.
Works with Claude Desktop, Cursor, Cline, and any MCP client.
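For reference, a minimal client config might look like the sketch below. The `npx -y @chivox/mcp` command appears in the quickstart; the server label and the `CHIVOX_API_KEY` env-var name are our assumptions, so check the docs for the exact fields.

```json
{
  "mcpServers": {
    "chivox": {
      "command": "npx",
      "args": ["-y", "@chivox/mcp"],
      "env": { "CHIVOX_API_KEY": "<your-key>" }
    }
  }
}
```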
chivox · phoneme.diagnose
Beyond STT
word · think /θɪŋk/
/θ/ · 35% · heard /s/
/ɪ/ · 92%
/ŋ/ · 54% · weak release
/k/ · dropped
STRESS · syllable 1 · ok
LIAISON · n/a · ok
INTONATION · ↘ falling · ok

60+ phonemes scored for accuracy, stress and intonation — not a single opaque number.

chivox · assess.mandarin
Hardest acoustic signal
pitch trace · F0 contour · locked to tone
水饺 shuǐ jiǎo = dumplings
target tones · T3 + T3 → sandhi · T2 + T3
produced vs. target F0 · 5-level Chao
generic STT heard 睡觉 (shuì jiào · sleep) · ✕ tone miss
Catches tones, erhua, neutral tone & sandhi — the difference between 妈 (mā · mom) and 马 (mǎ · horse).
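The third-tone sandhi the card above resolves can be sketched in a few lines of Python. This is a simplified model of the basic rule only; real engines also handle 一/不 sandhi and longer third-tone runs.

```python
def resolve_third_tone_sandhi(tones):
    """Apply the basic Mandarin third-tone sandhi rule:
    a T3 immediately followed by another T3 surfaces as T2
    (e.g. nǐ hǎo: T3 + T3 -> T2 + T3)."""
    out = list(tones)
    for i in range(len(out) - 1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2
    return out

print(resolve_third_tone_sandhi([3, 3]))  # [2, 3] — 你好
```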
chivox · agent.reasoning
Not a leaderboard cell
full matrix → agent reasoning, not one score
{
  "overall": 48,
  "pron": { "accuracy": 44, "integrity": 90, "fluency": 72, "rhythm": 65 },
  "fluency": { "pause": 2, "speed": 118 },
  "audio_quality": { "snr": 19.2, "clip": 0 },
  "details": [
    {
      "word": "think",
      "score": 48, "dp_type": "mispron",
      "start": 2400, "end": 2910,
      "liaison": "none",
      "phonemes": [
        { "ipa": "θ", "score": 35, "dp_type": "mispron" },
        { "ipa": "ɪ", "score": 88, "dp_type": "normal" }
      ],
      "phoneme_error": { "expected": "/θ/", "actual": "/s/" }
    }
  ]
}
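A sketch of how an agent loop might mine this payload for weak phonemes. The field names mirror the example above; the 60-point threshold is our choice, not a documented cutoff.

```python
# Example payload, abbreviated to the fields we walk below.
payload = {
    "details": [
        {"word": "think",
         "phonemes": [
             {"ipa": "θ", "score": 35, "dp_type": "mispron"},
             {"ipa": "ɪ", "score": 88, "dp_type": "normal"},
         ]}
    ]
}

def weak_phonemes(payload, threshold=60):
    """Collect (word, ipa, score) for every phoneme under threshold."""
    weak = []
    for word in payload.get("details", []):
        for ph in word.get("phonemes", []):
            if ph["score"] < threshold:
                weak.append((word["word"], ph["ipa"], ph["score"]))
    return weak

print(weak_phonemes(payload))  # [('think', 'θ', 35)]
```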
Agent reply · auto-generated
o1 · Sonnet · Gemini

“I noticed you pronounced think as sink. Place your tongue between your teeth for the /θ/ sound. Try: ‘Thirty thirsty thinkers thought…’”

If it resolves tonal sandhi, it resolves anything.
Tone, sandhi, erhua, retroflex — the acoustic edge cases generic STT flatlines on. Same payload shape as every other language.
/what-it-does

The listening layer, as four MCP tools

Twenty years of pronunciation-assessment R&D, exposed as a structured payload your LLM can reason over. Drop into LangChain, LlamaIndex, the OpenAI Agents SDK or any custom loop — skip the months of DSP work.

overall
84
accuracy
78
fluency
88
rhythm
73
/assess

Score a learner’s speech

Stream mic audio or post a file. Get overall / accuracy / integrity / fluency / rhythm scores, plus word and phoneme-level diagnostics.

overall · accuracy · fluency · rhythm · phoneme
zh-CN · en-US · one flag
你好 nǐ hǎo · tones · pinyin
Hello /həˈloʊ/ · stress · CEFR
/languages

Mandarin & English, natively

Tones, pinyin, neutral tone, erhua, tone sandhi for Chinese. Stress, rhythm, CEFR-aligned scoring for English. One flag switches between them.

zh-CN · en-US · pinyin · tones · CEFR
AI · Describe your hometown in three sentences.
user · 00:14
fluency 82 · content 76 · grammar 88 · accuracy 79 · rhythm 81
/converse

Score free-flow dialogue

Open-ended AI-talk evaluation returns 5-dimensional scores on fluency, content, grammar, accuracy and rhythm — ready for the next LLM turn.

AI-talk · open-question · 5-dim · streaming
personalized drill
/θ/ minimal pairs · think · sink · thank · sank
GPT · Claude · Gemini · Qwen
/drill

Personalize the next practice

Feed the JSON straight to GPT / Claude / Gemini. Use the shipped prompt-skill to generate targeted drills for weak phonemes or tones.
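One possible way to wire that up in Python. The phoneme fields mirror the /assess payload shown earlier; the prompt wording itself is illustrative, not the shipped prompt-skill.

```python
# Weak-phoneme rows extracted from the /assess payload (example values).
weak = [{"ipa": "θ", "heard": "s", "score": 35}]

def drill_prompt(weak):
    """Build a drill-generation prompt for any chat LLM from weak phonemes."""
    lines = [f"/{p['ipa']}/ scored {p['score']} (heard /{p['heard']}/)"
             for p in weak]
    return ("The learner's weakest phonemes:\n" + "\n".join(lines) +
            "\nGenerate three minimal-pair drills targeting them.")

print(drill_prompt(weak))
```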

GPT · Claude · Gemini · Qwen · DeepSeek
/quickstart

Production-ready in 3 steps

Watch it run. Paste config → server connects → your LLM calls a tool and gets structured scores back.

01

Grab an API key

Sign up, confirm your email, copy the key. Free trial credits included.

Get a key
02

Add one block to your MCP config

Paste the snippet into Cursor, Claude Desktop, or your custom agent — pick a tab on the right.

03

Call a tool from your LLM

Hand your model the audio. It gets back nested JSON: pron sub-scores, fluency + WPM, audio SNR, and details[] with ms ranges, stress, liaison and per-phoneme rows.
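For instance, a few lines of Python (illustrative; field names mirror the payload shown earlier) turn those ms ranges into replay offsets into the learner's own recording:

```python
# details[] rows from the payload, abbreviated to the fields we use.
details = [{"word": "think", "start": 2400, "end": 2910, "score": 48}]

def replay_ranges(details, threshold=60):
    """Return (word, start_s, end_s) for every word scored below threshold."""
    return [(d["word"], d["start"] / 1000, d["end"] / 1000)
            for d in details
            if d["score"] < threshold]

print(replay_ranges(details))  # [('think', 2.4, 2.91)]
```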

API reference
Live playground · no mic
Run a real Mandarin + English demo
Watch raw JSON → teacher diagnosis → auto-generated drill. No signup, no setup.
Open the playground
$ npx -y @chivox/mcp
/payload-depth

Depth proof

Mandarin is where the payload proves itself.
If it resolves tonal sandhi, it resolves anything.

English handles the long tail of L2 learners. Mandarin is the pressure test — four tones, sandhi, erhua, retroflex, the acoustic edge cases that kill generic STT. Both ship as pron.* / details[] on the same payload contract, with tone objects and per-phoneme windows for zh, stress and CEFR alignment for en. One integration, two acoustically opposite languages.

Mandarin depth
HSK 1-9 · lexical ladder covered
5 tones · + sandhi + erhua resolved
95%+ · agreement with human raters
Same payload shape, hardest signal.
same pron.* / details[] contract
Mandarin · tone accuracy

你好,今天天气……

nǐ hǎo, jīn tiān tiān qì
sentence score · 78/100

nǐ · T3 · 85
hǎo · T3 · 72
jīn · T1 · 88
tiān · T1 · 88
tiān · T1 · 58
qì · T4 · 91

tones · T1 T2 T3 T4
LLM hint · the second tiān collapsed into T4. Keep the pitch high and steady — it’s a T1.
Code-switching · cross-lingual scoring

Score a heritage speaker mid-sentence as they flip between languages — “I told her 我下周回家 (‘I’m going home next week’) and she was thrilled.” Returns separate EN / zh sub-scores plus a blended fluency index. Same payload contract, two languages interleaved.
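As an illustration, one plausible blend is a duration-weighted average of the two sub-scores. This weighting is our assumption, not the documented formula:

```python
def blended_fluency(en_fluency, zh_fluency, en_ms, zh_ms):
    """Blend per-language fluency scores, weighted by speaking time."""
    total = en_ms + zh_ms
    return (en_fluency * en_ms + zh_fluency * zh_ms) / total

# English dominates the utterance, so the blend leans toward its score.
print(blended_fluency(82, 74, en_ms=3200, zh_ms=1800))  # 79.12
```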

Looking to ship a Mandarin coach on top of this? See the build-a-tutor use case
/use-cases

Built for what developers actually ship

Tutors, coaches, companions, QA tooling — pick the scenario that’s yours and see how the agent loop looks in practice.

hǎo T3 · jīn T1
tone score · 88

The only MCP that feeds LLMs phoneme-level Mandarin

Tone objects, sandhi resolution and per-phoneme windows returned in the same payload shape every other language ships. Your agent reasons over 睡觉 vs. 水饺 at the acoustic layer, not the transcript — signal a Whisper-stack integration simply can’t surface.

0:06
Scored · overall 84 · fluency 78
AI · Try it again, focus on /θ/
voice · live

Score candidate speech, not just transcripts

Screen English fluency, pronunciation confidence and rhythm at scale. Your LLM reasons over numbers, not vibes — explainable rubrics every HR team will trust.

AI QA · live

Agent training & call-script compliance

Evaluate standard-phrase delivery, articulation, pacing and keyword hits for call-center reps. Flag exactly which second drifted off-script and auto-generate coaching drills.

chivox · Cursor · Claude · Cline · LangChain · Zed · Dify
MCP · 1 config
+ LlamaIndex · OpenAI Agents SDK

Voice-gated NPCs and pronunciation-powered gameplay

Players unlock spells, dialogues or levels by saying the phrase correctly. Get a pass/fail plus the exact phoneme that missed, at <300 ms p95 — fast enough for real-time game loops.
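A pass/fail gate over the payload might look like this sketch. Thresholds are illustrative, and the fields mirror the /assess shape shown earlier:

```python
def spell_gate(payload, min_overall=70, min_phoneme=50):
    """Pass only if the overall score clears the bar and no phoneme missed."""
    misses = [ph["ipa"]
              for w in payload.get("details", [])
              for ph in w.get("phonemes", [])
              if ph["score"] < min_phoneme]
    return {"pass": payload["overall"] >= min_overall and not misses,
            "missed_phonemes": misses}

result = spell_gate({"overall": 48,
                     "details": [{"word": "think",
                                  "phonemes": [{"ipa": "θ", "score": 35}]}]})
print(result)  # {'pass': False, 'missed_phonemes': ['θ']}
```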

/benchmarks

Speech scoring driven by research

The engine behind Chivox MCP is 20 years of R&D in pronunciation assessment. Here’s how it holds up in production.

95%+
agreement with human experts
r ≈ 0.95

Scores align with certified human expert rubrics at 95%+ correlation. Validated by national standardized speaking tests in 100+ cities.

Validated by national testing centers for standardized speaking exams
14+ granted patents in speech assessment
99.99% uptime SLA for enterprise deployments
GDPR-friendly data handling for EU markets
Ready to wire it up?

Same payload. Your agent. Your production loop.

Drop Chivox MCP into Cursor, Claude Desktop, or any agent SDK. One npx and you’re reading the same JSON you just saw above.

Starter key free · spend caps · low-balance alerts · zero audio retention

/contact

Let’s build your voice agent together.

Tell us what you’re building. We’ll reply within one business day with pilot credits, pricing, or a deployment plan — whichever you need first.

  • Enterprise pricing & self-hosted deployments
    Volume tiers, VPC install, SLAs, and on-prem engines for regulated buyers.
  • Missing a language or dialect?
    We train new acoustic models on request. Send us your target accent.
  • Pilot credits for evaluation teams
    Free benchmark run on your own audio, with a side-by-side report.
/get-in-touch

Tell us what you’re building.

By submitting this form you agree to receive a reply from the Chivox MCP team. We don’t share your email with third parties.

ChivoxMCP
Stay connected with us

Product-update sign-up: enter your work email and tap the arrow — your mail app opens with a short note to BD@chivox.com so we can add you to the list. No spam. Unsubscribe anytime by replying.

Built by speech scientists. Trusted by 10k+ voice-AI builders. © 2026 Chivox Inc. All rights reserved.