You saw how simple the integration is. Now look at what actually comes back. The MCP response is a wide JSON surface: not only overall and pron.* sub-scores, but fluency (WPM, pauses), audio_quality (SNR, clipping, level), and a details[] array where each word or character carries millisecond time windows, dp_type, stress, liaison, and phonemes[] with IPA, plus Mandarin tone objects and confidence distributions. That density is what lets an LLM run secondary diagnosis and tertiary learner profiling; this is not a one-number API.
Dense metadata · one response
Structured for LLM reasoning, not a leaderboard cell
Audio in → structured scores out
{
"overall": 84,
"pron": { "accuracy": 82, "integrity": 95, "fluency": 88, "rhythm": 79 },
"fluency": { "overall": 85, "pause": 3, "speed": 128 },
"audio_quality": { "snr": 24.1, "clip": 0, "volume": 2402 },
"details": [
{
"word": "gorgeous",
"score": 71, "dp_type": "mispron",
"start": 420, "end": 980,
"stress": { "ref": 1, "score": 62 },
"liaison": "none",
"phonemes": [
{ "ipa": "ɡ", "score": 92, "dp_type": "normal" },
{ "ipa": "ɔː", "score": 64, "dp_type": "mispron" }
]
}
]
}
Scores → teacher-style feedback
Fluency (88) is strong — good rhythm and chunking.
The weak spot is /ɔː/ in gorgeous: lips aren't rounded enough, coming out closer to /ɒ/.
Also review the /dʒ/ affricate — the stop-to-fricative transition is too soft.
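Feedback lines like these can be grounded in a cheap deterministic pre-pass over details[] before the payload ever reaches an LLM. A minimal sketch in plain Node, reusing the sample response shown above — the `weakPhonemes` helper and the score threshold are our own illustration, not part of the Chivox schema:

```javascript
// Sample of the Chivox MCP payload shown earlier (trimmed to what this sketch uses).
const assessment = {
  overall: 84,
  pron: { accuracy: 82, integrity: 95, fluency: 88, rhythm: 79 },
  details: [
    {
      word: "gorgeous",
      score: 71,
      dp_type: "mispron",
      phonemes: [
        { ipa: "ɡ", score: 92, dp_type: "normal" },
        { ipa: "ɔː", score: 64, dp_type: "mispron" }
      ]
    }
  ]
};

// Hypothetical helper: collect every phoneme scoring under `threshold`,
// so a prompt (or a rule-based fallback) can name exact targets like /ɔː/.
function weakPhonemes(assessment, threshold = 70) {
  return assessment.details.flatMap(d =>
    (d.phonemes ?? [])
      .filter(p => p.score < threshold)
      .map(p => ({ word: d.word, ipa: p.ipa, score: p.score }))
  );
}

console.log(weakPhonemes(assessment));
// → [ { word: "gorgeous", ipa: "ɔː", score: 64 } ]
```

The same walk works unchanged for the Mandarin payload if you filter on tone.score instead of phoneme scores.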
Diagnosis → personalized practice
// pass 2 — feed the phonetic matrix to your LLM
const diag = await openai.chat.completions.create({
model: "o1-mini",
messages: [{
role: "system",
content:
"You are a Mandarin pronunciation coach. " +
"Given the Chivox MCP assessment payload, identify " +
"the learner's 3 most impactful issues. Be concrete."
}, {
role: "user",
content: JSON.stringify(assessment)
// ↓ the payload Chivox MCP just returned (same wide schema as
// English: pron, fluency, audio_quality, details[]; zh adds tone maps)
// { "pron":{...,"tone":76}, "details":[
// { "char":"上","pinyin":"shang4","tone":{"ref":4,"detected":3,"score":58,"confidence":[...]}}
// ] }
}]
});
o1, claude-3.5-sonnet, gemini-2.0-pro, qwen-max, deepseek-v3 — any model that reads JSON.
Tell us what you’re building. We’ll reply within one business day with pilot credits, pricing, or a deployment plan — whichever you need first.