You saw how simple the integration is. Now look at what actually comes back. The MCP response is a wide JSON surface: not only overall and pron.* sub-scores, but fluency (WPM, pauses), audio_quality (SNR, clipping, level), and a details[] array where each word or character carries millisecond time windows, dp_type, stress, liaison, and phonemes[] with IPA, plus Mandarin tone objects and confidence distributions. That density is what lets an LLM run secondary diagnosis and tertiary learner profiling; this is not a one-number API.
Dense metadata · one response
Structured for LLM reasoning, not a leaderboard cell
Audio in → structured scores out
{
"overall": 84,
"pron": { "accuracy": 82, "integrity": 95, "fluency": 88, "rhythm": 79 },
"fluency": { "overall": 85, "pause": 3, "speed": 128 },
"audio_quality": { "snr": 24.1, "clip": 0, "volume": 2402 },
"details": [
{
"word": "gorgeous",
"score": 71, "dp_type": "mispron",
"start": 420, "end": 980,
"stress": { "ref": 1, "score": 62 },
"liaison": "none",
"phonemes": [
{ "ipa": "ɡ", "score": 92, "dp_type": "normal" },
{ "ipa": "ɔː", "score": 64, "dp_type": "mispron" }
]
}
]
}
Scores → teacher-style feedback
Fluency (88) is strong — good rhythm and chunking.
The weak spot is /ɔː/ in gorgeous: lips aren't rounded enough, coming out closer to /ɒ/.
Also review the /dʒ/ affricate — the stop-to-fricative transition is too soft.
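Feedback lines like these can be grounded in a cheap deterministic pre-pass over details[] before the payload ever reaches an LLM. A minimal sketch in plain Node, reusing the sample response shown above — the `weakPhonemes` helper and the score threshold are our own illustration, not part of the Chivox schema:

```javascript
// Sample of the Chivox MCP payload shown earlier (trimmed to what this sketch uses).
const assessment = {
  overall: 84,
  pron: { accuracy: 82, integrity: 95, fluency: 88, rhythm: 79 },
  details: [
    {
      word: "gorgeous",
      score: 71,
      dp_type: "mispron",
      phonemes: [
        { ipa: "ɡ", score: 92, dp_type: "normal" },
        { ipa: "ɔː", score: 64, dp_type: "mispron" }
      ]
    }
  ]
};

// Hypothetical helper: collect every phoneme scoring under `threshold`,
// so a prompt (or a rule-based fallback) can name exact targets like /ɔː/.
function weakPhonemes(assessment, threshold = 70) {
  return assessment.details.flatMap(d =>
    (d.phonemes ?? [])
      .filter(p => p.score < threshold)
      .map(p => ({ word: d.word, ipa: p.ipa, score: p.score }))
  );
}

console.log(weakPhonemes(assessment));
// → [ { word: "gorgeous", ipa: "ɔː", score: 64 } ]
```

The same walk works unchanged for the Mandarin payload if you filter on tone.score instead of phoneme scores.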
Diagnosis → personalized practice
// pass 2 — feed the phonetic matrix to your LLM
const diag = await openai.chat.completions.create({
model: "o1-mini",
messages: [{
role: "system",
content:
"You are a Mandarin pronunciation coach. " +
"Given the Chivox MCP assessment payload, identify " +
"the learner's 3 most impactful issues. Be concrete."
}, {
role: "user",
content: JSON.stringify(assessment)
// ↓ the payload Chivox MCP just returned (same wide schema as
// English: pron, fluency, audio_quality, details[]; zh adds tone maps)
// { "pron":{...,"tone":76}, "details":[
// { "char":"上","pinyin":"shang4","tone":{"ref":4,"detected":3,"score":58,"confidence":[...]}}
// ] }
}]
});
o1, claude-3.5-sonnet, gemini-2.0-pro, qwen-max, deepseek-v3 — any model that reads JSON.
Tell us what you’re building. We’ll reply within one business day with pilot credits, pricing, or a deployment plan — whichever you need first.