A one-minute tour of what your LLM actually gets.
No recording required. Press play on either scenario below to watch the MCP payload, the LLM diagnosis, and the targeted drill your agent can generate, all from the same response. Mandarin comes first: it is the moat most developers underestimate.
Scenario 01 · tonal hard-mode
The learner tries to order dumplings.
They say “shuǐ jiǎo” (水饺, dumplings). A generic speech-to-text (STT) engine mishears it as “shuì jiào” (睡觉, to sleep): same syllables, different tones, completely different meaning. Watch what MCP actually hears.
我想吃水饺
“I want to eat dumplings.”
我想睡觉
“I want to sleep.”
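The claim "same syllables, different tones" can be checked mechanically. Below is a minimal TypeScript sketch. It strips the tone mark from each pinyin syllable and compares the toneless bases and the tone numbers separately. The helper names (parseSyllable, compareTones) are hypothetical and not part of Chivox MCP. The sketch assumes tone-marked pinyin input and treats an unmarked syllable as the neutral tone, numbered 0 here by assumption.

```typescript
// Hypothetical lookup, not part of Chivox MCP: each tone-marked vowel maps
// to its unmarked form and its tone number (1-4).
const TONE_MARKS: Record<string, [string, number]> = {
  "ā": ["a", 1], "á": ["a", 2], "ǎ": ["a", 3], "à": ["a", 4],
  "ē": ["e", 1], "é": ["e", 2], "ě": ["e", 3], "è": ["e", 4],
  "ī": ["i", 1], "í": ["i", 2], "ǐ": ["i", 3], "ì": ["i", 4],
  "ō": ["o", 1], "ó": ["o", 2], "ǒ": ["o", 3], "ò": ["o", 4],
  "ū": ["u", 1], "ú": ["u", 2], "ǔ": ["u", 3], "ù": ["u", 4],
  "ǖ": ["ü", 1], "ǘ": ["ü", 2], "ǚ": ["ü", 3], "ǜ": ["ü", 4],
};

interface SyllableTone {
  base: string; // syllable with its tone mark removed
  tone: number; // 1-4, or 0 when no mark is present (neutral, assumed)
}

function parseSyllable(syllable: string): SyllableTone {
  let base = "";
  let tone = 0;
  // NFC folds decomposed input (vowel + combining mark) into the keys above.
  for (const ch of syllable.normalize("NFC")) {
    const mark = TONE_MARKS[ch];
    if (mark !== undefined) {
      base += mark[0];
      tone = mark[1];
    } else {
      base += ch;
    }
  }
  return { base, tone };
}

// Compare two space-separated pinyin phrases syllable by syllable.
function compareTones(a: string, b: string): { sameSyllables: boolean; toneDiffs: number[] } {
  const sa = a.trim().split(/\s+/).map(parseSyllable);
  const sb = b.trim().split(/\s+/).map(parseSyllable);
  const sameSyllables =
    sa.length === sb.length && sa.every((s, i) => s.base === sb[i].base);
  const toneDiffs: number[] = [];
  sa.forEach((s, i) => {
    if (i < sb.length && s.tone !== sb[i].tone) toneDiffs.push(i);
  });
  return { sameSyllables, toneDiffs };
}

console.log(JSON.stringify(compareTones("shuǐ jiǎo", "shuì jiào")));
// → {"sameSyllables":true,"toneDiffs":[0,1]}
```

Both syllables match on their toneless base ("shui", "jiao"), and both differ in tone. A transcript-only pipeline therefore cannot tell the two phrases apart.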
await mcp.call("assess_speech", {
  language: "zh-CN",
  refText: "我想吃水饺",
  audio: "s3://sessions/mandarin.wav"
});

Response:
{
  "overall": 76,
  "pron": {
    "accuracy": 78,
    "integrity": 100,
    "fluency": 84,
    "rhythm": 72,
    "tone": 68
  },
  "fluency": {
    "pauseCount": 1,
    "speed": 102
  },
  "audio_quality": {
    "snr": 23.4,
    "clip": 0,
    "volume": 2280
  },
  "details": [
    {
      "char": "我",
      "pinyin": "wǒ",
      "score": 92,
      "dp_type": "normal",
      "start": 0,
      "end": 280,
      "tone": { "ref": 3, "detected": 3, "confidence": [2, 4, 6, 82, 6] }
    },
    {
      "char": "想",
      "pinyin": "xiǎng",
      "score": 85,
      "dp_type": "normal",
      "start": 280,
      "end": 700,
      "tone": { "ref": 3, "detected": 3, "confidence": [3, 5, 10, 74, 8] }
    },
    {
      "char": "吃",
      "pinyin": "chī",
      "score": 88,
      "dp_type": "normal",
      "start": 700,
      "end": 1040,
      "tone": { "ref": 1, "detected": 1, "confidence": [2, 80, 8, 6, 4] }
    }
  ]
}

- accuracy: 78
- integrity: 100
- fluency: 84
- rhythm: 72
- tone: 68
- snr: 23.4
- clip: 0
- volume: 2280
- pauses: 1
- speed: 102 chars/min
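An agent has to read the tone fields in details[] before it can reason about them. Below is a minimal TypeScript sketch that summarizes them per character. The interfaces mirror only the fields visible in the sample payload and are not a published Chivox type. toneReport is a hypothetical helper.

The main assumption is how confidence is laid out: index n is treated as tone n, and index 0 as the neutral tone. This is inferred from the sample, where each array sums to 100 and its largest value sits at the index of the character's ref tone. It is not confirmed by any documentation quoted on this page.

```typescript
// Shapes mirror only the fields visible in the demo payload.
interface ToneInfo {
  ref: number;          // reference tone for the character
  detected: number;     // tone the engine reports hearing
  confidence: number[]; // ASSUMPTION: index n = tone n, index 0 = neutral
}

interface CharDetail {
  char: string;
  pinyin: string;
  score: number;
  tone: ToneInfo;
}

interface ToneRow {
  char: string;
  ref: number;
  detected: number;
  argmax: number; // index of the highest confidence value
  margin: number; // highest confidence minus the runner-up
  match: boolean; // detected === ref (ignores tone sandhi)
}

// Hypothetical helper: one summary row per character.
function toneReport(details: CharDetail[]): ToneRow[] {
  return details.map((d) => {
    const conf = d.tone.confidence;
    const argmax = conf.indexOf(Math.max(...conf));
    const sorted = [...conf].sort((x, y) => y - x);
    return {
      char: d.char,
      ref: d.tone.ref,
      detected: d.tone.detected,
      argmax,
      margin: sorted[0] - (sorted.length > 1 ? sorted[1] : 0),
      match: d.tone.detected === d.tone.ref,
    };
  });
}

// The three characters shown in the demo payload.
const demoDetails: CharDetail[] = [
  { char: "我", pinyin: "wǒ", score: 92, tone: { ref: 3, detected: 3, confidence: [2, 4, 6, 82, 6] } },
  { char: "想", pinyin: "xiǎng", score: 85, tone: { ref: 3, detected: 3, confidence: [3, 5, 10, 74, 8] } },
  { char: "吃", pinyin: "chī", score: 88, tone: { ref: 1, detected: 1, confidence: [2, 80, 8, 6, 4] } },
];

for (const row of toneReport(demoDetails)) {
  console.log(`${row.char}: argmax T${row.argmax}, margin ${row.margin}, match ${row.match}`);
}
// → 我: argmax T3, margin 76, match true
// → 想: argmax T3, margin 64, match true
// → 吃: argmax T1, margin 72, match true
```

The margin is a cheap way to rank how sure the engine was. In this sample, 想 has the narrowest lead, even though all three characters match their reference tone.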
水 (shuǐ): detected as T2 (rising). This is the T3+T3 → T2+T3 sandhi rule applied correctly: the citation tone is T3, but natural speech expects T2 before another T3.
饺 (jiǎo): detected as T4 (falling). This is the single error that flips the utterance to 睡觉 “sleep”.
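The two notes above depend on one distinction. A naive "detected differs from citation" check would flag both 水 and 饺. A sandhi-aware check flags only 饺. Below is a minimal TypeScript sketch of that comparison.

It uses a simplified third-tone sandhi rule: any T3 immediately followed by a T3 is expected to surface as T2. Real speech groups longer runs of T3 syllables by prosodic phrase, which this sketch ignores. expectedTones and classify are hypothetical helper names.

```typescript
type Verdict = "ok" | "ok (sandhi)" | "error";

// Simplified third-tone sandhi: every T3 that is directly followed by a T3 in
// the citation sequence is expected as T2. Prosodic grouping is ignored.
function expectedTones(citation: number[]): number[] {
  return citation.map((t, i) => (t === 3 && citation[i + 1] === 3 ? 2 : t));
}

// Compare detected tones against sandhi-adjusted expectations rather than
// raw citation tones, so a correct sandhi is not reported as an error.
function classify(citation: number[], detected: number[]): Verdict[] {
  const expected = expectedTones(citation);
  return detected.map((d, i): Verdict => {
    if (d !== expected[i]) return "error";
    return expected[i] !== citation[i] ? "ok (sandhi)" : "ok";
  });
}

// 水饺: citation T3 + T3; the demo reports 水 as T2 and 饺 as T4.
console.log(JSON.stringify(classify([3, 3], [2, 4])));
// → ["ok (sandhi)","error"]
```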
You are a Mandarin pronunciation coach. Below is the Chivox MCP payload for one utterance. In 3 short bullets, name the single most important issue, the root cause, and one concrete correction.
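One way to hand this prompt to an LLM is to append the MCP payload verbatim, so the model sees exactly the fields the tool returned. Below is a minimal TypeScript sketch. It only builds the prompt string. Sending it is left to whichever LLM client you use, and buildCoachPrompt is a hypothetical name.

```typescript
// Instruction text copied from the demo prompt.
const COACH_INSTRUCTION =
  "You are a Mandarin pronunciation coach. Below is the Chivox MCP payload " +
  "for one utterance. In 3 short bullets, name the single most important " +
  "issue, the root cause, and one concrete correction.";

// Hypothetical helper: instruction first, then the payload as pretty-printed
// JSON, matching the "Below is the payload" wording of the instruction.
function buildCoachPrompt(payload: unknown): string {
  return `${COACH_INSTRUCTION}\n\n${JSON.stringify(payload, null, 2)}`;
}

const prompt = buildCoachPrompt({ overall: 76, pron: { tone: 68 } });
console.log(prompt);
```

Passing the raw JSON rather than a prose summary keeps per-character fields such as tone.detected and tone.confidence available to the model.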
Tone contrast drill
- T3 dip: 饺 vs 觉 (30 s)
jiǎo · jiào · jiǎo · jiào: alternate six times and feel the difference between the dip and the fall.
- T3 + T3 sandhi (45 s)
shuǐ jiǎo → shuí jiǎo: rising first, dipping second. Repeat five times, then use it in the sentence “我 想 / 吃 / 水饺.”
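A contrast line like "jiǎo · jiào · jiǎo · jiào" can be generated directly from a tone error. Take the toneless syllable, mark it with the reference tone and with the tone actually heard, then alternate the two. Below is a minimal TypeScript sketch. It uses the standard pinyin placement rule for tone marks: a or e takes the mark; in "ou" the o does; otherwise the last vowel does. withTone and contrastDrill are hypothetical helpers.

The repeat count is a parameter. Two pairs reproduce the line shown in the drill above.

```typescript
const VOWELS = "aeiouü";

// Tone-marked forms per vowel; index 0 is the unmarked vowel, 1-4 the tones.
const MARKED: Record<string, string[]> = {
  "a": ["a", "ā", "á", "ǎ", "à"],
  "e": ["e", "ē", "é", "ě", "è"],
  "i": ["i", "ī", "í", "ǐ", "ì"],
  "o": ["o", "ō", "ó", "ǒ", "ò"],
  "u": ["u", "ū", "ú", "ǔ", "ù"],
  "ü": ["ü", "ǖ", "ǘ", "ǚ", "ǜ"],
};

// Place a tone mark on a lowercase toneless syllable.
function withTone(base: string, tone: number): string {
  if (tone < 1 || tone > 4) return base; // neutral tone: no mark
  let idx = base.search(/[ae]/);
  if (idx === -1) idx = base.indexOf("ou");
  if (idx === -1) {
    for (let i = base.length - 1; i >= 0; i--) {
      if (VOWELS.includes(base[i])) {
        idx = i;
        break;
      }
    }
  }
  if (idx === -1) return base; // no vowel found
  return base.slice(0, idx) + MARKED[base[idx]][tone] + base.slice(idx + 1);
}

// Hypothetical drill builder: alternate the reference tone with the tone the
// learner produced, `pairs` times.
function contrastDrill(base: string, refTone: number, heardTone: number, pairs: number): string {
  const items: string[] = [];
  for (let i = 0; i < pairs; i++) {
    items.push(withTone(base, refTone), withTone(base, heardTone));
  }
  return items.join(" · ");
}

// 饺 was expected as T3 and heard as T4 in the demo.
console.log(contrastDrill("jiao", 3, 4, 2)); // → jiǎo · jiào · jiǎo · jiào
```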
Meaning check
- Contextual minimal pairs (60 s)
我想吃水饺 (dumplings) ↔ 我想睡觉 (sleep). Record both, re-run the MCP assessment on each, and confirm the tone scores flip for 饺/觉.
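The "confirm the tone scores flip" step can be automated with a pass/fail check over the two payloads. Below is a minimal TypeScript sketch. It assumes that each re-run returns a details[] entry for every character in refText, including 饺 and 觉, even though the demo payload above only shows the first three characters. minimalPairPasses is a hypothetical helper, and the interface covers only the fields it reads.

```typescript
// Only the fields this check reads.
interface ToneDetail {
  char: string;
  tone: { ref: number; detected: number };
}

// Hypothetical pass/fail check for the minimal-pair drill: in each recording,
// the target character must be heard with its own reference tone, and the
// two reference tones must differ (饺 is T3, 觉 is T4).
function minimalPairPasses(
  dumplingsRun: ToneDetail[],
  sleepRun: ToneDetail[],
  charA = "饺",
  charB = "觉",
): boolean {
  const a = dumplingsRun.find((d) => d.char === charA);
  const b = sleepRun.find((d) => d.char === charB);
  if (a === undefined || b === undefined) return false; // target missing
  return (
    a.tone.detected === a.tone.ref &&
    b.tone.detected === b.tone.ref &&
    a.tone.ref !== b.tone.ref
  );
}
```

The check compares detected tones rather than overall scores, because a high overall score can still hide the one tone that changes the meaning.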
Ready to wire it up?
Same payload. Your agent. Your production loop.
Drop Chivox MCP into Cursor, Claude Desktop, or any agent SDK. One npx command and you're reading the same JSON you just saw above.