Define how you measure your voice agent
Set your own evaluation criteria — choose what to measure, how to score it, and what success looks like. Evalgent handles the rest.
Tone consistency
Rate how consistently the agent maintains a calm, empathetic tone
What is a voice agent metric?
A metric is an evaluation criterion that measures a specific quality of your voice agent — like tone, response speed, or knowledge accuracy. Metrics are not scenario-specific pass/fail checks. They score every conversation your agent handles, giving you a consistent measure of performance across all scenarios, runs, and versions.
Define
Set what quality to measure and how to score it
Apply
Each metric is evaluated across every conversation
Track
Monitor trends across runs and versions
How do we ensure metric coverage?
Telemetry metrics
Metrics automatically extracted from call metadata — no AI judgment needed. Objective, fast, and always consistent. Measure response latency, call duration, silence ratios, and more.
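Because telemetry metrics come straight from timestamps in call metadata, they can be computed with plain arithmetic and no model in the loop. A minimal sketch of two such metrics (the `Turn` structure and function names are illustrative, not Evalgent's API):

```python
from dataclasses import dataclass

@dataclass
class Turn:
    start: float  # seconds from call start
    end: float

def silence_ratio(turns: list[Turn], call_duration: float) -> float:
    """Fraction of the call where no one is speaking."""
    spoken = sum(t.end - t.start for t in turns)
    return max(0.0, 1.0 - spoken / call_duration)

def avg_response_latency(caller_ends: list[float],
                         agent_starts: list[float]) -> float:
    """Mean gap between a caller finishing a turn and the agent replying."""
    gaps = [a - c for c, a in zip(caller_ends, agent_starts)]
    return sum(gaps) / len(gaps)
```

The same inputs always produce the same score, which is what makes telemetry metrics objective and consistent across runs.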
LLM-based metrics
AI evaluates conversation quality against your defined criteria. Define what to measure in natural language, choose a scoring type, and set a success threshold.
Already measuring internally? Good — bring those same metrics here.
Internal testing today
- You measure metrics on real production calls — after they happen
- Or on scripted test calls that don't reflect real-world conditions
- No consistent simulation environment
- Metrics exist, but the test conditions don't match reality
With Evalgent
- Same metrics, but evaluated inside realistic simulated calls
- Synthetic callers replicate real accents, noise, interruptions, and conversational chaos
- Your agent is tested under production-like conditions — before it reaches production
- Metrics stay in sync between your internal system and your evaluation environment
"The gap isn't in the metrics — it's in the environment you test them in. Most teams measure on clean, scripted calls. Evalgent measures on calls that behave like real ones."
The difference a well-defined metric makes
Poorly defined metric
- "Agent should sound professional" — too vague to score consistently
- No measurable threshold — different evaluators, different results
- Can't compare across versions — no baseline to track
- Results you can't act on
Properly defined metric
- "Rate how consistently the agent maintains a calm, empathetic tone" — specific and scoreable
- Clear scoring type (scale 1–5) with defined success threshold (≥ 4)
- Consistent results across all scenarios and runs
- Track quality trends version over version
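The payoff of a defined scale and threshold is that results become directly comparable: every conversation either clears the bar or it doesn't, so pass rates can be tracked version over version. A minimal sketch, assuming 1-5 scores and a success threshold of 4 (the scores below are made-up illustration data):

```python
def passes(score: int, threshold: int = 4) -> bool:
    """A conversation succeeds when its score meets the threshold."""
    return score >= threshold

def pass_rate(scores: list[int], threshold: int = 4) -> float:
    """Fraction of conversations that clear the threshold."""
    return sum(passes(s, threshold) for s in scores) / len(scores)

# Hypothetical per-conversation scores for two agent versions
v1_scores = [3, 4, 4, 5, 3]
v2_scores = [4, 5, 4, 5, 4]
```

Comparing `pass_rate(v1_scores)` with `pass_rate(v2_scores)` gives the version-over-version trend; a vague criterion like "should sound professional" offers no such baseline.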