Voice AI Evaluation

Full-duplex voice agents: how simultaneous speech changes voice AI

Deepesh Jayal

June 2026

Most voice agents still talk in strict turns. You speak, they wait, they reply. Real conversations do not work that way. People interrupt, overlap, and murmur agreement mid-sentence. Full-duplex voice agents close that gap by listening and speaking at once. This guide explains what they are, how they differ from half-duplex, and why they are harder to test.

Evalgent cares about this because full-duplex moves the biggest risk into turn-taking. We return to that at the end. First, the fundamentals.

What is a full-duplex voice agent?

Full-duplex voice agent: a voice agent that can listen and speak simultaneously, modelling overlapping speech, interruptions, and backchannels rather than strictly alternating turns.

The term comes from telecommunications, where full-duplex means both ends can transmit at once. A phone call is full-duplex; a walkie-talkie is half-duplex. Applied to voice AI, a full-duplex agent processes incoming audio while it is still speaking, so it can react the instant a caller cuts in.

This is a shift in architecture, not just a feature. The agent has to listen, speak, and decide who holds the floor, all at the same time. Full-duplex voice AI treats the conversation as continuous, not as a sequence of discrete turns.

Full-duplex vs half-duplex voice agents

The contrast is sharp. Half-duplex agents are simpler and more predictable. Full-duplex agents are more natural but far more demanding.

Aspect	Half-duplex	Full-duplex
Speaking and listening	One at a time	Simultaneous
Barge-in	Not supported well	Native
Feel	Walkie-talkie	Natural conversation
Turn-taking	Fixed alternation	Real-time arbitration
Complexity	Lower	Higher
Failure mode	Awkward pauses	Talking over the user

Half-duplex vs full-duplex is the core decision for conversational design. Half-duplex is easier to build and test, but it feels robotic. Full-duplex feels human, at the cost of much harder engineering. For where this sits in the system, see our voice agent stack guide.

How does barge-in work in voice agents?

Barge-in is the headline capability of full-duplex. It lets a caller interrupt the agent mid-sentence, and the agent stops to listen. People do this constantly: they answer early, correct the system, or ask a new question before the agent finishes.

Mechanically, the agent keeps listening while it speaks. When it detects caller speech, it has to decide fast whether this is a real interruption or just a backchannel. If it is an interruption, the agent stops its own output and yields the floor. That decision is turn arbitration, and it has to happen within a couple of hundred milliseconds to feel right. Callers need no instruction here: it is how any natural phone call already works.
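The detection step can be sketched as a small debouncer over voice activity frames. This is a minimal illustration, not a production detector: the frame size, the 160 ms threshold, and the `BargeInDetector` name are all assumptions chosen for the example. It only checks that caller speech is sustained while the agent is talking, so a cough or click is ignored. Telling a backchannel from a real interruption needs more than duration and is left out here.

```python
from dataclasses import dataclass

FRAME_MS = 20        # assumed VAD frame size
MIN_SPEECH_MS = 160  # assumed: sustained caller speech that counts as barge-in


@dataclass
class BargeInDetector:
    """Hypothetical debouncer: counts consecutive voiced frames over agent speech."""
    speech_ms: int = 0

    def update(self, agent_speaking: bool, caller_voiced: bool) -> bool:
        """Feed one VAD frame; return True when the agent should yield the floor."""
        if not (agent_speaking and caller_voiced):
            # Either nothing to interrupt or the caller went quiet: start over.
            self.speech_ms = 0
            return False
        self.speech_ms += FRAME_MS
        return self.speech_ms >= MIN_SPEECH_MS


detector = BargeInDetector()
# Eight consecutive voiced frames (160 ms) while the agent is speaking.
decisions = [detector.update(agent_speaking=True, caller_voiced=True) for _ in range(8)]
```

A lower threshold makes the agent yield faster but stop on more noise. A higher one does the opposite, which is why this value has to be tuned against real audio.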

The four axes of full-duplex interaction

Full-duplex is not one capability but several working together. Research describes four core axes, and an agent has to handle all of them.

Pause handling: distinguishing a thinking pause from the end of a turn.
Turn-taking: deciding when to start speaking without cutting the caller off.
Backchanneling: producing or recognising "mm-hm" and "right" without taking the floor.
Interruption handling: stopping cleanly when the caller barges in.

These overlap in real calls. When a caller pauses, the agent must decide whether to fill the silence. When the caller murmurs agreement, the agent must not treat it as a new turn. Human turn-taking is fast and subtle, and matching it is the hard part of conversational AI.
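Of the four axes, pause handling is the easiest to sketch. The example below is one possible heuristic, not an established algorithm: it waits longer before declaring the end of a turn when the caller's last word suggests an unfinished thought. The word list and both thresholds are assumptions made up for illustration.

```python
# Assumed hints that the caller is mid-thought; a real system would learn these.
TRAILING_HINTS = {"and", "but", "so", "or", "because", "um", "uh"}
BASE_END_OF_TURN_MS = 500       # assumed silence that normally ends a turn
EXTENDED_END_OF_TURN_MS = 1200  # assumed silence allowed after a trailing hint


def end_of_turn(silence_ms: int, partial_transcript: str) -> bool:
    """Return True if the silence should be treated as the end of the caller's turn."""
    words = partial_transcript.lower().split()
    last_word = words[-1].strip(".,!?") if words else ""
    if last_word in TRAILING_HINTS:
        threshold = EXTENDED_END_OF_TURN_MS
    else:
        threshold = BASE_END_OF_TURN_MS
    return silence_ms >= threshold


finished = end_of_turn(600, "I want to book a flight")       # complete thought
thinking = end_of_turn(600, "I want to book a flight and")   # trailing "and"
```

With these values, the same 600 ms of silence ends the turn after a complete sentence but not after a trailing "and".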

Why latency matters more in full-duplex

In a half-duplex agent, latency shows up as an awkward gap before the reply. In a full-duplex agent, latency breaks the interaction model itself. If the agent is slow to detect an interruption, it keeps talking over the caller. If it is slow to yield, the overlap drags.

Full-duplex systems aim to transition between listening and speaking within roughly 100 to 300 milliseconds, matching human conversation. Hitting that requires tight control of VAD, endpointing, and the speech-to-speech path. Low latency stops being a nice-to-have and becomes the thing that makes full-duplex work at all. Real-time voice agents live or die on this number.
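One way to keep that number honest is to measure it per interruption. The sketch below assumes you can log two timestamps for each barge-in: when the caller started speaking and when the agent's audio actually stopped. The function names, the event format, and the sample values are hypothetical, and the 300 ms budget is simply the upper end of the range above.

```python
BUDGET_MS = 300  # upper end of the 100 to 300 ms target


def yield_latencies_ms(events):
    """events: (caller_speech_start_ms, agent_audio_stop_ms) pairs from call logs."""
    return [stop - start for start, stop in events]


def p95(values):
    """Nearest-rank 95th percentile, using integer maths to avoid float rounding."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def over_budget(values, budget_ms=BUDGET_MS):
    """Latencies that exceeded the budget."""
    return [v for v in values if v > budget_ms]


# Made-up sample: three interruptions from one test call.
events = [(1000, 1180), (5000, 5240), (9000, 9410)]
latencies = yield_latencies_ms(events)
slow = over_budget(latencies)
```

Tracking a high percentile rather than the mean matters here, because one slow yield in twenty is still a caller being talked over.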

Why do voice agents talk over users?

Talking over the user is the signature failure of full-duplex. It happens when the agent misreads the floor. It treats a real interruption as a backchannel and keeps going, or it fails to detect the interruption in time.

The root causes are familiar once you look. Aggressive endpointing makes the agent start too early. Weak interruption detection makes it ignore barge-in. High latency makes every decision late. Overlapping speech confuses the recognizer, so the agent cannot tell who is speaking. Our stress-testing guide covers how to surface these under load, where they appear most.

What is backchanneling in voice agents?

Backchanneling is the small feedback people give while someone else talks: "uh-huh", "right", "okay". It signals listening without claiming the turn. For a full-duplex agent, backchannels cut both ways.

Backchanneling: brief listener responses, such as "mm-hm", that acknowledge the speaker without taking the floor.

First, the agent must not mistake a caller's backchannel for an interruption, or it will stop unnecessarily. Second, a well-placed backchannel from the agent makes it feel attentive. Getting this wrong is jarring: an agent that halts every time the caller says "yeah" feels broken, and one that never acknowledges feels cold. Backchanneling is subtle, and it is easy to test badly.
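The first half of that problem can be approximated with a lexical check. This is a deliberately naive sketch: the token set comes from the examples in this guide, the two-word limit is an assumption, and a real agent would also weigh timing and prosody, which text alone cannot capture.

```python
import re

# Tokens taken from the examples in this guide; extend per language and domain.
BACKCHANNEL_TOKENS = {"mm-hm", "uh-huh", "right", "okay", "yeah"}
MAX_BACKCHANNEL_WORDS = 2  # assumed: anything longer is treated as a real turn


def is_backchannel(utterance: str) -> bool:
    """Return True if the utterance looks like a brief acknowledgement."""
    # Keep hyphenated tokens such as "mm-hm" together and drop punctuation.
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", utterance.lower())
    if not words or len(words) > MAX_BACKCHANNEL_WORDS:
        return False
    return all(word in BACKCHANNEL_TOKENS for word in words)


keep_talking = is_backchannel("Mm-hm.")
must_yield = not is_backchannel("Yeah, but I need to change that")
```

The second example shows why a bare keyword match is not enough: "yeah" at the start of a longer utterance is an interruption, not an acknowledgement.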

Where full-duplex makes a difference

Full-duplex pays off where natural conversation matters. Sales and support calls feel better when callers can interrupt and be heard. A turn-taking voice agent that handles barge-in gracefully keeps people from repeating themselves. High-emotion calls benefit most, because frustrated callers interrupt more.

Simultaneous speech also helps where speed counts. In an agent that handles scheduling or account changes, barge-in lets the caller cut to the chase, which shortens calls. Full-duplex conversation is closest to how people already expect phones to work, so the learning curve for callers is effectively zero.

Half-duplex is still useful

Full-duplex is not always the right call. Half-duplex is simpler, cheaper, and easier to test. For IVR-style flows, structured data capture, or noisy lines where overlapping speech is unreliable, strict turns can be safer. Full-duplex audio handling adds cost and complexity that some use cases simply do not need.

The honest framing is a spectrum, not a binary. Many production agents are partially duplex: they support barge-in but otherwise take turns. Choose the level of duplex behaviour your calls actually require, then test that behaviour directly rather than assuming it works.

Building a full-duplex voice agent

Building full-duplex is mostly about the listening loop. The agent must process incoming audio continuously, not in turn-sized chunks. That means streaming speech-to-text, a fast interruption detector, and a speech path that can be cut off cleanly mid-utterance.

Three components carry most of the weight. Voice activity detection decides when the caller is speaking. Turn arbitration decides who holds the floor. The speech-to-speech path has to stop output the moment an interruption is confirmed. Tune these together, because a great recognizer with slow arbitration still talks over people. None of this removes the language model. It wraps the model in a real-time loop that manages who speaks when.
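The "cut off cleanly" requirement can be sketched with cancellable tasks. The example below uses Python's asyncio and replaces real audio with short sleeps, so `speak`, `run_turn`, and the chunk timings are stand-ins invented for illustration. The point is the shape: speech output and interruption detection run concurrently, and whichever finishes first decides what happens to the other.

```python
import asyncio


async def speak(chunks, played):
    """Play audio chunks one at a time; cancellation stops output mid-utterance."""
    for chunk in chunks:
        played.append(chunk)       # stand-in for writing audio to the line
        await asyncio.sleep(0.02)  # stand-in for the chunk's playback time


async def run_turn(chunks, interrupt: asyncio.Event):
    """Speak until the utterance ends or the interruption detector fires."""
    played = []
    speech = asyncio.create_task(speak(chunks, played))
    barge_in = asyncio.create_task(interrupt.wait())
    done, _ = await asyncio.wait(
        {speech, barge_in}, return_when=asyncio.FIRST_COMPLETED
    )
    interrupted = barge_in in done and speech not in done
    for task in (speech, barge_in):
        if not task.done():
            task.cancel()  # stop output, or stop waiting for an interruption
    await asyncio.gather(speech, barge_in, return_exceptions=True)
    return played, interrupted


async def demo():
    interrupt = asyncio.Event()
    chunks = [f"chunk-{i}" for i in range(10)]
    # Simulate the detector confirming a barge-in 50 ms into the utterance.
    asyncio.get_running_loop().call_later(0.05, interrupt.set)
    return await run_turn(chunks, interrupt)


played, interrupted = asyncio.run(demo())
```

In a real agent the chunk size bounds how quickly output can stop, so smaller chunks trade some overhead for a faster yield.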

How to test a full-duplex voice agent

Full-duplex moves the hardest behaviour into turn-taking, and turn-taking is invisible to text-based testing. You cannot catch a talk-over by reading a transcript. You have to test over real audio, with overlapping speech and interruptions built in.

This is where Evalgent fits. Evalgent runs realistic conversations against your agent and exercises the turn-taking surface directly. Scenarios include interruptions, backchannels, and pauses. Profiles vary caller pace, accent, and how aggressively they barge in. Metrics measure interruption recovery and talk-over rate with custom thresholds. Evaluations run these as automated batches of synthetic callers, and Reviews let your team inspect any failed exchange with the audio, so you can hear the overlap, not just read it.
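To make a metric like talk-over rate concrete, here is one possible definition computed from timestamped speech segments. It is an assumption for illustration, not Evalgent's formula: the share of caller speech time during which the agent was also speaking. The segment format, the function names, and the 10% threshold are hypothetical, and segments from the same speaker are assumed not to overlap each other.

```python
def overlap_ms(a, b):
    """Length of the overlap between two (start_ms, end_ms) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def talk_over_rate(agent_segments, caller_segments):
    """Fraction of caller speech time during which the agent was also speaking."""
    caller_total = sum(end - start for start, end in caller_segments)
    if caller_total == 0:
        return 0.0
    overlapped = sum(
        overlap_ms(agent, caller)
        for agent in agent_segments
        for caller in caller_segments
    )
    return overlapped / caller_total


def passes(agent_segments, caller_segments, max_rate=0.10):
    """Apply a custom threshold; 10% is an arbitrary example value."""
    return talk_over_rate(agent_segments, caller_segments) <= max_rate


# Made-up call: the agent overlaps the caller for 500 ms, then for 200 ms.
agent = [(0, 3000), (6000, 8000)]
caller = [(2500, 5000), (7800, 8300)]
rate = talk_over_rate(agent, caller)
ok = passes(agent, caller)
```

Segment timestamps like these only exist when the test runs over audio, which is the practical reason transcripts cannot surface talk-over.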

The result is turn-taking you can measure before real callers test it for you. For the full discipline, see the AI voice agent testing pillar and the synthetic callers guide. Natural conversation is the promise of full-duplex; testing is how you keep it.

Frequently asked questions

What is a full-duplex voice agent?

A full-duplex voice agent can listen and speak at the same time, modelling overlapping speech, interruptions, and backchannels instead of strictly taking turns. The name comes from telecommunications, where full-duplex means both ends transmit at once. This lets the agent react the instant a caller interrupts, making conversations feel natural rather than walkie-talkie style.

Full-duplex vs half-duplex voice agents: what is the difference?

Half-duplex voice agents speak and listen one at a time, like a walkie-talkie, so they handle interruptions poorly and can feel robotic. Full-duplex voice agents do both at once, supporting natural barge-in and backchannels. Full-duplex feels more human but is far harder to build and test, because turn-taking must be arbitrated in real time.

How does barge-in work in voice agents?

Barge-in lets a caller interrupt the agent mid-sentence, and the agent stops to listen. The agent keeps processing incoming audio while it speaks. When it detects caller speech, it decides whether this is a real interruption or a backchannel, and if it is an interruption, it yields the floor. This decision must happen within a couple of hundred milliseconds.

Why do voice agents talk over users?

Voice agents talk over users when they misread the floor. Aggressive endpointing makes the agent start too early, weak interruption detection makes it ignore barge-in, and high latency makes every turn decision late. Overlapping speech can also confuse the recognizer. The result is the agent continuing to speak when it should have stopped and listened.

What latency does full-duplex voice need?

Full-duplex voice agents aim to transition between listening and speaking within roughly 100 to 300 milliseconds, matching human conversation. Beyond that range, interruptions feel laggy and the agent talks over callers. Hitting it requires tight control of voice activity detection, endpointing, and the speech-to-speech path, which is why low latency is essential rather than optional for full-duplex.

How do you test a full-duplex voice agent?

Test a full-duplex voice agent over real audio, with overlapping speech and interruptions built into the scenarios, because turn-taking is invisible to transcript-based testing. Drive calls that interrupt, pause, and backchannel, and measure interruption recovery and talk-over rate. Platform-agnostic testing with synthetic callers, such as Evalgent, exercises this turn-taking surface before real callers do.

Do full-duplex voice agents handle interruptions?

Yes, handling interruptions is the defining capability of full-duplex voice agents. They listen while speaking, so they can detect a barge-in and yield the floor in real time. The quality varies, though. A good agent stops cleanly within a couple of hundred milliseconds, while a weak one talks over the caller or stops at the wrong moments, which is why interruption testing matters.

What is backchanneling in voice agents?

Backchanneling is the brief feedback a listener gives while the other person talks, such as "mm-hm" or "right", without taking the turn. For full-duplex voice agents it works two ways: the agent must not mistake a caller's backchannel for an interruption, and a well-timed backchannel from the agent signals attentiveness. Misreading backchannels makes an agent feel broken.

Conclusion

Full-duplex voice agents make conversations feel human by listening and speaking at once. That capability lives or dies on turn-taking: barge-in, backchannels, pauses, and the low latency that ties them together.

Turn-taking is also where full-duplex fails, and you cannot catch those failures in a transcript. Test over real audio with interruptions built in, because the talk-over you do not test is the one your callers will hear. Decide how much duplex behaviour your calls need, build for it, then prove it holds under real callers.
