Run structured evaluations on your voice agent
Combine scenarios, profiles, and metrics into a single evaluation. Know exactly where your agent passes, where it fails, and why.
Support agent v2.3 — regression test
Full evaluation across all support scenarios
What is a voice agent evaluation?
An evaluation is a structured test run. You select which scenarios to test, which caller profiles to use, and which metrics to measure. Evalgent runs every combination, scores each call, and gives you a clear pass/fail verdict per test.
Evaluations automate what manual QA cannot do at scale: they run hundreds of test conversations in parallel and surface failures before they reach production.
How do you set up a voice agent evaluation campaign?
Define your test matrix
Pick scenarios, profiles, and metrics to include in your evaluation campaign. Each combination becomes a test.
Scenarios
+ 9 more selected
Profiles
Metrics
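To make the test matrix concrete, here is a minimal Python sketch of how the combinations could be enumerated. It is not Evalgent's implementation: the scenario, profile, and metric names and the build_test_matrix helper are hypothetical, and it assumes that each scenario × profile pair is one test that is scored on every selected metric.

```python
from itertools import product

# Hypothetical names for illustration only; not taken from Evalgent.
scenarios = ["refund_request", "password_reset", "order_status"]
profiles = ["calm_caller", "impatient_caller"]
metrics = ["task_completion", "latency"]


def build_test_matrix(scenarios, profiles, metrics):
    """Return one test per scenario x profile pair, each scored on all metrics.

    Assumption: metrics do not multiply the number of tests; they are
    applied to every call in the matrix.
    """
    return [
        {"scenario": s, "profile": p, "metrics": list(metrics)}
        for s, p in product(scenarios, profiles)
    ]


tests = build_test_matrix(scenarios, profiles, metrics)
print(len(tests))  # 3 scenarios x 2 profiles = 6 tests
```

Under the same assumption, a campaign with 12 scenarios and 5 profiles would produce 60 tests.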
Set success criteria
Configure the run count and pass threshold: how many times each test runs and what SSR score counts as a pass.
Runs per test: 3
SSR pass threshold: ≥ 70%
Verdict logic
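The page does not spell out the verdict logic, so the Python sketch below only illustrates how a run count and a threshold could combine into a pass/fail verdict. The verdict function, its rule parameter, and both rules ("mean" and "all") are assumptions for illustration, as is the idea that each run yields an SSR score between 0.0 and 1.0.

```python
from statistics import mean


def verdict(run_scores, threshold=0.70, rule="mean"):
    """Return "pass" or "fail" for one test from its per-run SSR scores.

    rule="mean": the average score across runs must reach the threshold.
    rule="all":  every individual run must reach the threshold.
    Both rules are hypothetical; the real verdict logic may differ.
    """
    if not run_scores:
        raise ValueError("at least one run is required")
    if rule == "mean":
        ok = mean(run_scores) >= threshold
    elif rule == "all":
        ok = all(score >= threshold for score in run_scores)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return "pass" if ok else "fail"


scores = [0.80, 0.75, 0.60]  # three runs of the same test
print(verdict(scores, rule="mean"))  # mean is about 0.717, so: pass
print(verdict(scores, rule="all"))   # 0.60 is below 0.70, so: fail
```

The two rules disagree on the same three runs, which is why the verdict logic is worth choosing deliberately rather than leaving at a default.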
Review & launch
Confirm your configuration and launch the evaluation. Evalgent handles the rest — running every test and collecting results.
Support agent v2.3 — regression test
Full evaluation across all support scenarios
12 Scenarios
5 Profiles
3 Metrics
See exactly where your agent stands
Results matrix
See pass/fail rates across every scenario × profile combination
Evidence
Turn-level proof for every success condition — from transcripts, recordings, and scored outcomes
Recommendations (Beta)
Receive targeted suggestions to improve agent performance based on evaluation results
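As a rough illustration of what a results matrix summarizes, the Python sketch below computes a pass rate for each scenario × profile cell from a list of per-run outcomes. The record format, the names in the sample data, and the results_matrix helper are hypothetical and do not reflect Evalgent's export format.

```python
from collections import defaultdict


def results_matrix(run_records):
    """Map (scenario, profile) to the fraction of runs that passed."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [passed, total]
    for record in run_records:
        cell = counts[(record["scenario"], record["profile"])]
        cell[0] += int(record["passed"])
        cell[1] += 1
    return {key: passed / total for key, (passed, total) in counts.items()}


# Hypothetical sample data: two runs per cell.
run_records = [
    {"scenario": "refund_request", "profile": "calm_caller", "passed": True},
    {"scenario": "refund_request", "profile": "calm_caller", "passed": True},
    {"scenario": "refund_request", "profile": "impatient_caller", "passed": True},
    {"scenario": "refund_request", "profile": "impatient_caller", "passed": False},
]

matrix = results_matrix(run_records)
for (scenario, profile), rate in sorted(matrix.items()):
    print(f"{scenario} x {profile}: {rate:.0%}")
```

A cell with a low pass rate points to the specific scenario and caller profile where the agent struggles, which is where the turn-level evidence is most useful.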
The difference structured evaluations make
Manual testing today
- Manual QA on a handful of calls
- No consistency across test conditions
- No way to compare versions objectively
- Results live in spreadsheets or Slack threads
Structured & automated
- Every scenario × profile combination tested automatically
- Consistent caller simulation with realistic conditions
- Version-over-version comparison with the same test matrix
- Results in one place — verdicts, scores, transcripts, audio