
/demo · live walkthrough

A one-minute tour of what your LLM actually gets.

No recording required. Press play on either scenario below and watch the MCP payload, the LLM diagnosis, and the targeted drill your agent can generate — all from the same response. Mandarin is first: it's the moat most developers underestimate.

locale: zh-CN · stage 01/04 · auto-advancing

Scenario 01 · tonal hard-mode

The learner tries to order dumplings.

They say “shuǐ jiǎo” (水饺, dumplings). A generic STT mishears it as “shuì jiào” (睡觉, to sleep) — same syllables, different tones, completely different meaning. Watch what MCP actually hears.
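To make the "same syllables, different tones" point concrete, here is a minimal sketch that strips tone marks from both pinyin strings and compares what is left. The helper name `stripTones` is hypothetical and not part of any Chivox API. It relies on Unicode NFD normalization, so it would also flatten `ü` to `u`, which is acceptable for this illustration only.

```typescript
// Strip tone diacritics so only the base syllables remain.
// NFD splits "ǐ" into "i" plus a combining mark; the regex removes the combining marks.
// Caveat: this also removes the diaeresis on "ü", so it is a sketch, not a pinyin library.
function stripTones(pinyin: string): string {
  return pinyin.normalize("NFD").replace(/[\u0300-\u036f]/g, "").normalize("NFC");
}

const intended = "shuǐ jiǎo"; // 水饺, dumplings
const misheard = "shuì jiào"; // 睡觉, to sleep

// With tones removed, the two phrases are indistinguishable.
const sameSyllables = stripTones(intended) === stripTones(misheard);

// With tones kept, they differ. This is the information a tone-blind recognizer loses.
const sameTones = intended.normalize("NFC") === misheard.normalize("NFC");
```

Here `sameSyllables` is `true` and `sameTones` is `false`: the only thing separating dumplings from sleep is the tone contour.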

reference_text

我想吃水饺

wǒ xiǎng chī shuǐ jiǎo

“I want to eat dumplings.”

Generic STT heard

我想睡觉

“I want to sleep.”

Chivox MCP heard

我想吃水饺

…plus pron, details[], and tone confidence, covered in the stages below.

01
Input
Chivox MCP tool call
learner audio · 16 kHz · mono
00:00.00 → 00:02.38 (2380 ms)
what your agent sends to MCP
await mcp.call("assess_speech", {
  language: "zh-CN",
  refText:  "我想吃水饺",
  audio:    "s3://sessions/mandarin.wav"
})
language: zh-CN · coreType: cn.sent.raw · rubric: CEFR-aligned
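An agent may want to check its input before spending a tool call. The sketch below types the three arguments shown in the call above and rejects empty values. The `AssessSpeechArgs` type and the `buildAssessSpeechArgs` helper are hypothetical names introduced for illustration; only the field names `language`, `refText`, and `audio` come from the demo.

```typescript
// Argument shape copied from the demo call; the type name itself is an assumption.
interface AssessSpeechArgs {
  language: string; // e.g. "zh-CN"
  refText: string;  // the sentence the learner was asked to say
  audio: string;    // location of the recording
}

// Hypothetical helper: trim the reference text and refuse obviously unusable input.
function buildAssessSpeechArgs(language: string, refText: string, audio: string): AssessSpeechArgs {
  const trimmed = refText.trim();
  if (trimmed.length === 0) {
    throw new Error("refText must not be empty");
  }
  if (audio.length === 0) {
    throw new Error("audio location must not be empty");
  }
  return { language, refText: trimmed, audio };
}

const assessArgs = buildAssessSpeechArgs("zh-CN", " 我想吃水饺 ", "s3://sessions/mandarin.wav");
// The agent would then send: await mcp.call("assess_speech", assessArgs)
```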
02
MCP response
Phonetic matrix · ~40 fields
chivox · assess_speech · mandarin.json
{
  "overall": 76,
  "pron": {
    "accuracy": 78,
    "integrity": 100,
    "fluency": 84,
    "rhythm": 72,
    "tone": 68
  },
  "fluency": {
    "pauseCount": 1,
    "speed": 102
  },
  "audio_quality": {
    "snr": 23.4,
    "clip": 0,
    "volume": 2280
  },
  "details": [
    {
      "char": "我",
      "pinyin": "wǒ",
      "score": 92,
      "dp_type": "normal",
      "start": 0,
      "end": 280,
      "tone": {
        "ref": 3,
        "detected": 3,
        "confidence": [
          2,
          4,
          6,
          82,
          6
        ]
      }
    },
    {
      "char": "想",
      "pinyin": "xiǎng",
      "score": 85,
      "dp_type": "normal",
      "start": 280,
      "end": 700,
      "tone": {
        "ref": 3,
        "detected": 3,
        "confidence": [
          3,
          5,
          10,
          74,
          8
        ]
      }
    },
    {
      "char": "吃",
      "pinyin": "chī",
      "score": 88,
      "dp_type": "normal",
      "start": 700,
      "end": 1040,
      "tone": {
        "ref": 1,
        "detected": 1,
        "confidence": [
          2,
          80,
          8,
          6,
          4
        ]
      }
    }
  ]
}
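One way an agent might read the `tone.confidence` arrays is sketched below. It assumes that index `i` of the array corresponds to tone `i`, with index 0 as the neutral tone; the demo does not state this, but the three entries above are consistent with it. The type names and the `argmax` and `margin` helpers are hypothetical.

```typescript
interface ToneInfo {
  ref: number;
  detected: number;
  confidence: number[];
}

interface Detail {
  char: string;
  tone: ToneInfo;
}

// Index of the largest value. Assumption: index i means tone i (0 = neutral).
function argmax(values: number[]): number {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}

// Gap between the two highest values: a rough measure of how decisive the tone call was.
function margin(values: number[]): number {
  const sorted = values.slice().sort((a, b) => b - a);
  return sorted.length > 1 ? sorted[0] - sorted[1] : 0;
}

// The three entries shown in the payload above.
const details: Detail[] = [
  { char: "我", tone: { ref: 3, detected: 3, confidence: [2, 4, 6, 82, 6] } },
  { char: "想", tone: { ref: 3, detected: 3, confidence: [3, 5, 10, 74, 8] } },
  { char: "吃", tone: { ref: 1, detected: 1, confidence: [2, 80, 8, 6, 4] } },
];

const toneSummary = details.map((d) => ({
  char: d.char,
  top: argmax(d.tone.confidence),
  margin: margin(d.tone.confidence),
}));
```

For these three characters the top index equals `detected` in every case, and the margins are 76, 64, and 72, so none of the calls is close.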
overall: 76
Weighted composite of accuracy · integrity · fluency · rhythm · tone
pron
  • accuracy: 78
  • integrity: 100
  • fluency: 84
  • rhythm: 72
  • tone: 68
audio_quality
  • snr: 23.4
  • clip: 0
  • volume: 2280
fluency
  • pauses: 1
  • speed: 102 chars/min
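The demo describes `overall` as a weighted composite of the five `pron` subscores but does not publish the weights. The sketch below is a generic weighted mean, not Chivox's formula; `weightedComposite` and the equal weights are assumptions used only to show the arithmetic.

```typescript
type PronScores = {
  accuracy: number;
  integrity: number;
  fluency: number;
  rhythm: number;
  tone: number;
};

// Generic weighted mean over the five subscores. The real weighting is not given in the demo.
function weightedComposite(scores: PronScores, weights: PronScores): number {
  const keys: (keyof PronScores)[] = ["accuracy", "integrity", "fluency", "rhythm", "tone"];
  let total = 0;
  let weightSum = 0;
  for (const k of keys) {
    total += scores[k] * weights[k];
    weightSum += weights[k];
  }
  if (weightSum === 0) {
    throw new Error("weights must not sum to zero");
  }
  return total / weightSum;
}

const pron: PronScores = { accuracy: 78, integrity: 100, fluency: 84, rhythm: 72, tone: 68 };
const equalWeights: PronScores = { accuracy: 1, integrity: 1, fluency: 1, rhythm: 1, tone: 1 };

// (78 + 100 + 84 + 72 + 68) / 5 = 402 / 5 = 80.4
const equalWeighted = weightedComposite(pron, equalWeights);
```

Equal weights give 80.4, not the reported 76, so the actual composite evidently does not weight the five subscores equally. Treat `overall` as an opaque score unless the weights are documented.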
details[]: per-character breakdown
wǒ
92
OK
tone: ref T3 → detected T3 · match
[tone-confidence chart: 0 · T1 · T2 · T3 · T4]
0ms → 280ms
xiǎng
85
OK
tone: ref T3 → detected T3 · match
[tone-confidence chart: 0 · T1 · T2 · T3 · T4]
280ms → 700ms
chī
88
OK
tone: ref T1 → detected T1 · match
[tone-confidence chart: 0 · T1 · T2 · T3 · T4]
700ms → 1040ms
shuǐ
86
OK
tone: ref T3 → detected T2 · sandhi · ok
[tone-confidence chart: 0 · T1 · T2 · T3 · T4]
phonemes[]
ʂ 88 · w 84 · [?] 82

Detected as T2 (rising) — this is the T3+T3 → T2+T3 sandhi rule correctly applied. Citation is T3, but natural speech expects T2 here.

1040ms → 1620ms
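The sandhi verdict can be sketched as a small rule: a detected T2 is acceptable when the reference tone is T3 and the next syllable is also T3. This is a simplified pairwise check written for illustration, not Chivox's scoring logic; `judgeTone` and `ToneVerdict` are hypothetical names, and longer runs of third tones depend on phrasing that this sketch ignores.

```typescript
type ToneVerdict = "match" | "sandhi-ok" | "mismatch";

// Judge one syllable against its reference tone, allowing T3 -> T2 before another T3.
function judgeTone(refTones: number[], index: number, detected: number): ToneVerdict {
  const ref = refTones[index];
  if (detected === ref) {
    return "match";
  }
  const nextIsThird = index + 1 < refTones.length && refTones[index + 1] === 3;
  if (ref === 3 && nextIsThird && detected === 2) {
    return "sandhi-ok";
  }
  return "mismatch";
}

// Citation tones for wǒ xiǎng chī shuǐ jiǎo.
const refTones = [3, 3, 1, 3, 3];

const woVerdict = judgeTone(refTones, 0, 3);   // detected T3 for 我
const shuiVerdict = judgeTone(refTones, 3, 2); // detected T2 for 水
const jiaoVerdict = judgeTone(refTones, 4, 4); // detected T4 for 饺
```

With the detected tones from the walkthrough, the three verdicts come out as `"match"`, `"sandhi-ok"`, and `"mismatch"`, the same labels the breakdown shows.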
jiǎo
40
wrong-tone
tone: ref T3 → detected T4 · mismatch
[tone-confidence chart: 0 · T1 · T2 · T3 · T4]
phonemes[]
[?] 82 · j 86 · au 78

Detected as T4 (falling) — this is the single error that flips the utterance to 睡觉 “sleep”.

1620ms → 2380ms
03
LLM diagnosis
Your model reads the matrix
prompt

You are a Mandarin pronunciation coach. Below is the Chivox MCP payload for one utterance. In 3 short bullets, name the single most important issue, the root cause, and one concrete correction.

model
o1 · claude-3.5-sonnet · gemini-2-pro
input
the MCP payload from stage 02 (5 details · 6 phonemes)
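A minimal sketch of how an agent might assemble the model input: the coach prompt from this stage followed by a compact, line-per-character view of the payload. `buildCoachMessage` and `DetailSummary` are hypothetical names, and the line format is one possible choice rather than a required one. The five rows use the scores and tones shown in the stage 02 breakdown.

```typescript
interface DetailSummary {
  pinyin: string;
  score: number;
  refTone: number;
  detectedTone: number;
}

const COACH_PROMPT =
  "You are a Mandarin pronunciation coach. Below is the Chivox MCP payload for one utterance. " +
  "In 3 short bullets, name the single most important issue, the root cause, and one concrete correction.";

// Join the prompt, a blank line, the overall score, and one line per character.
function buildCoachMessage(overall: number, details: DetailSummary[]): string {
  const lines = details.map(
    (d) => `${d.pinyin}: score ${d.score}, tone ref T${d.refTone} -> detected T${d.detectedTone}`
  );
  return [COACH_PROMPT, "", `overall: ${overall}`, ...lines].join("\n");
}

const coachMessage = buildCoachMessage(76, [
  { pinyin: "wǒ", score: 92, refTone: 3, detectedTone: 3 },
  { pinyin: "xiǎng", score: 85, refTone: 3, detectedTone: 3 },
  { pinyin: "chī", score: 88, refTone: 1, detectedTone: 1 },
  { pinyin: "shuǐ", score: 86, refTone: 3, detectedTone: 2 },
  { pinyin: "jiǎo", score: 40, refTone: 3, detectedTone: 4 },
]);
```

Sending the full JSON instead would also work; the compact form simply keeps the lowest-scoring syllable easy for the model to spot.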
LLM output · teacher-mode · streaming…

04
Auto-generated drill
Agent plans next-session practice

Tone contrast drill

drill
  • T3 dip: 饺 vs 觉 · 30 s

    jiǎo · jiào · jiǎo · jiào — alternate six times. Feel the difference between the dip and the fall.

  • T3 + T3 sandhi · 45 s

    shuǐ jiǎo → shuí jiǎo — rising first, dipping second. Five repeats, then use in: "我 想 / 吃 / 水饺."

Meaning check

drill
  • Contextual minimal pairs · 60 s

    我想吃水饺 (dumplings) ↔ 我想睡觉 (sleep). Record both; re-run MCP; confirm tone scores flip for 饺/觉.
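The drill plan above can be represented as plain data so an agent can schedule it. The sketch below is one possible shape, with hypothetical names (`Drill`, `alternate`, `totalSeconds`); it reads "alternate six times" as six repetitions of the pair, which is an assumption.

```typescript
interface Drill {
  title: string;
  seconds: number;
}

// Build an alternating practice line such as "jiǎo · jiào · jiǎo · jiào".
function alternate(a: string, b: string, repeats: number): string {
  const parts: string[] = [];
  for (let i = 0; i < repeats; i++) {
    parts.push(a, b);
  }
  return parts.join(" · ");
}

// The three drills and durations listed above.
const drills: Drill[] = [
  { title: "T3 dip: 饺 vs 觉", seconds: 30 },
  { title: "T3 + T3 sandhi", seconds: 45 },
  { title: "Contextual minimal pairs", seconds: 60 },
];

// Total practice time for the session: 30 + 45 + 60 = 135 seconds.
const totalSeconds = drills.reduce((sum, d) => sum + d.seconds, 0);

// Six repetitions of the pair, assuming that is what "alternate six times" means.
const toneContrastLine = alternate("jiǎo", "jiào", 6);
```

The whole plan fits in 135 seconds, a little over two minutes of targeted practice for a single flagged syllable.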

Ready to wire it up?

Same payload. Your agent. Your production loop.

Drop Chivox MCP into Cursor, Claude Desktop, or any agent SDK. One npx command and you're reading the same JSON you just saw above.