/reasoning-engine-trigger

It's not just a score.
It's a reasoning engine trigger.

You saw how simple the integration is. Now look at what actually comes back. The MCP response is a wide JSON surface: not only overall and pron.* sub-scores, but fluency (WPM, pauses), audio_quality (SNR, clipping, level), and a details[] array where each word or character carries millisecond windows, dp_type, stress, liaison, and phonemes[] with IPA, plus Mandarin tone objects and confidence distributions. That density is what lets an LLM do secondary diagnosis and tertiary profiling; this is not a one-number API.

Dense metadata · one response

Structured for LLM reasoning, not a leaderboard cell

en + zh code paths
  • Session + audio QA
    overall · refText / session id · audio_quality: snr, clip, volume (UGC & mic checks)
  • pron + fluency blocks
    accuracy, integrity, fluency, rhythm; tone row for Chinese; WPM, pause count, broader fluency
  • details[] entries
per word or Chinese character: start/end ms, dp_type, stress, liaison, char-level tone + confidence[], phonemes[] with IPA & scores
  • Error hooks
    phoneme_error, omissions, affricate quality — the signals agents turn into feedback without custom DSP
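The error hooks above don't need custom DSP to consume; a few lines of traversal surface them. A sketch in TypeScript, using the field names from the sample payload on this page (`weakPhonemes` and the 70-point threshold are illustrative choices, not part of the API):

```typescript
// Walk details[] and collect every phoneme that is flagged as
// mispronounced or scores below a threshold. Field names mirror the
// sample payload on this page; adjust to your actual schema version.
type Phoneme = { ipa: string; score: number; dp_type: string };
type Detail = {
  word?: string;          // English path
  char?: string;          // Chinese path (per-character)
  score: number;
  dp_type: string;
  phonemes?: Phoneme[];
};

function weakPhonemes(
  details: Detail[],
  threshold = 70
): { unit: string; ipa: string; score: number }[] {
  const hits: { unit: string; ipa: string; score: number }[] = [];
  for (const d of details) {
    for (const p of d.phonemes ?? []) {
      if (p.dp_type !== "normal" || p.score < threshold) {
        hits.push({ unit: d.word ?? d.char ?? "?", ipa: p.ipa, score: p.score });
      }
    }
  }
  return hits;
}
```

Run against the `gorgeous` entry shown later on this page, this returns the single weak /ɔː/ phoneme and leaves the healthy /ɡ/ alone.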
01
Pass 1

Assess

Audio in → structured scores out

Chivox MCP
{
  "overall": 84,
  "pron": { "accuracy": 82, "integrity": 95, "fluency": 88, "rhythm": 79 },
  "fluency": { "overall": 85, "pause": 3, "speed": 128 },
  "audio_quality": { "snr": 24.1, "clip": 0, "volume": 2402 },
  "details": [
    {
      "word": "gorgeous",
      "score": 71, "dp_type": "mispron",
      "start": 420, "end": 980,
      "stress": { "ref": 1, "score": 62 },
      "liaison": "none",
      "phonemes": [
        { "ipa": "ɡ", "score": 92, "dp_type": "normal" },
        { "ipa": "ɔː", "score": 64, "dp_type": "mispron" }
      ]
    }
  ]
}
in: audio_file_path → out: scores.json
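Consuming scores.json starts with a typed view of the payload. A minimal sketch that mirrors the sample response above (treat the interface as illustrative, not the full schema) and gates on audio quality before trusting the scores, as the UGC / mic-check row suggests:

```typescript
// Minimal typed view of the Pass 1 payload shown above.
interface Assessment {
  overall: number;
  pron: { accuracy: number; integrity: number; fluency: number; rhythm: number };
  fluency: { overall: number; pause: number; speed: number };
  audio_quality: { snr: number; clip: number; volume: number };
  details: unknown[];
}

const raw = `{"overall":84,
  "pron":{"accuracy":82,"integrity":95,"fluency":88,"rhythm":79},
  "fluency":{"overall":85,"pause":3,"speed":128},
  "audio_quality":{"snr":24.1,"clip":0,"volume":2402},
  "details":[]}`;

const scores: Assessment = JSON.parse(raw);

// Gate on audio quality before reasoning over the scores; the 15 dB
// SNR floor here is an illustrative cutoff, not a documented constant.
const usable = scores.audio_quality.snr > 15 && scores.audio_quality.clip === 0;
```

Only a payload that passes the gate should be forwarded to Pass 2; a clipped or noisy recording is better re-captured than diagnosed.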
02
Pass 2

Diagnose

Scores → teacher-style feedback

Your LLM · pass 1

Fluency (88) is strong — good rhythm and chunking.

The weak spot is /ɔː/ in gorgeous: lips aren't rounded enough, coming out closer to /ɒ/.

Also review the /dʒ/ affricate: the stop-to-fricative transition is too soft.

in: scores.json → out: diagnosis.md
03
Pass 3

Drill

Diagnosis → personalized practice

Your LLM · pass 2
/ɔː/ minimal pairs
caught · cot · bought · pot
Shadow read · 2×
“The gorgeous storm poured all morning.”
3 tasks · ~90 s · targets 2 phonemes
in: diagnosis.md → out: practice.json
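Pass 3 can be a pure shaping step: map the diagnosed phonemes onto drill templates and emit practice.json. A minimal sketch, where `minimalPairs` and the task shape are hand-written illustrations, not Chivox data or a documented format:

```typescript
// Map diagnosed phonemes to drill tasks. The lookup table is a tiny
// illustrative sample; a real agent would let the LLM pick the pairs.
const minimalPairs: Record<string, string[]> = {
  "ɔː": ["caught", "cot", "bought", "pot"],
};

function buildDrill(weak: string[]) {
  return weak.map((ipa) => ({
    phoneme: ipa,
    type: "minimal_pairs",
    items: minimalPairs[ipa] ?? [],   // empty list = no template yet
  }));
}

const practice = buildDrill(["ɔː"]); // serialize this as practice.json
```

A deterministic fallback like this keeps the drill loop running even when the LLM pass is skipped or rate-limited.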
Live demo · Watch pass 2 run. A Mandarin payload in, a textbook-grade diagnosis out, streamed by o1-mini.
How an LLM reasons over a Chivox payload · Chinese: 你好 / 上海
IN · You send: Chivox payload + 1-line prompt · diagnose.ts
// pass 2: feed the phonetic matrix to your LLM
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const diag = await openai.chat.completions.create({
  model: "o1-mini",
  messages: [{
    role: "system",
    content:
      "You are a Mandarin pronunciation coach. " +
      "Given the Chivox MCP assessment payload, identify " +
      "the learner's 3 most impactful issues. Be concrete."
  }, {
    role: "user",
    content: JSON.stringify(assessment)
    // ↓ the payload Chivox MCP just returned (same wide schema as
    // English: pron, fluency, audio_quality, details[]; zh adds tone maps)
    // { "pron":{...,"tone":76}, "details":[
    //     { "char":"上","pinyin":"shang4","tone":{"ref":4,"detected":3,"score":58,"confidence":[...]}}
    // ] }
  }]
});
OUT · LLM writes: diagnosis (o1-mini) · thinking…
The same payload plugs into o1, claude-3.5-sonnet, gemini-2.0-pro, qwen-max, deepseek-v3 — any model that reads JSON.
  • Secondary · Pattern mining
    Agent surfaces session-level regularities: "unvoiced consonants failing 3 sessions in a row." No rule engine — pure LLM reasoning over dense data.
  • Tertiary · Student profiling
    Stack sessions in any vector DB or row store. Your agent plots learning curves and predicts next-exam CEFR / HSK band.
  • Combo · Diagnose + prescribe
    Chain Chivox MCP with o1, claude-3.5-sonnet, or gemini-2.0-pro for a world-class diagnosis-to-prescription loop, out of the box.
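The secondary pattern-mining pass reduces to simple aggregation once sessions are stored. A sketch that flags any phoneme scoring below threshold for N sessions in a row (the `Session` shape and both cutoffs are assumptions for illustration, not the Chivox schema):

```typescript
// Flag phonemes that fail `minStreak` consecutive sessions — the kind
// of regularity ("unvoiced consonants failing 3 sessions in a row")
// an agent would then explain in prose.
type Session = { phonemeScores: Record<string, number> };

function failingStreaks(
  sessions: Session[],
  threshold = 70,
  minStreak = 3
): string[] {
  const streaks = new Map<string, number>();  // current run length per phoneme
  const flagged = new Set<string>();
  for (const s of sessions) {
    for (const [ipa, score] of Object.entries(s.phonemeScores)) {
      const run = score < threshold ? (streaks.get(ipa) ?? 0) + 1 : 0;
      streaks.set(ipa, run);
      if (run >= minStreak) flagged.add(ipa);
    }
  }
  return [...flagged];
}
```

Feeding the flagged list back into the diagnosis prompt is what turns per-utterance scoring into a longitudinal learner profile.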
/contact

Let’s build your voice agent together.

Tell us what you’re building. We’ll reply within one business day with pilot credits, pricing, or a deployment plan — whichever you need first.

  • Enterprise pricing & self-hosted deployments
    Volume tiers, VPC install, SLAs, and on-prem engines for regulated buyers.
  • Missing a language or dialect?
    We train new acoustic models on request. Send us your target accent.
  • Pilot credits for evaluation teams
    Free benchmark run on your own audio, with a side-by-side report.
/get-in-touch

Tell us what you’re building.

By submitting this form you agree to receive a reply from the Chivox MCP team. We don’t share your email with third parties.