When a call event fails, most stacks quietly drop it. A dead letter queue catches the failure and lets you replay it. Here's the build.
TL;DR
- A dead letter queue catches failed call events so your voice AI stack can replay them instead of silently dropping leads.
- Without one, a transient API timeout means the lead is gone. No alert. No retry. Just gone.
- If you're running a production voice agent for any service business, this is non-negotiable infrastructure.
A dead letter queue is where a failed call event lands when your automation can't process it. Instead of the event disappearing, it parks itself somewhere safe so you can inspect it and replay it.
What actually happens when a call event fails?
Most stacks don't fail loudly. They just drop the event and move on.
Your webhook fires. Retell AI sends the call completed payload. N8N receives it, tries to write to GHL, hits a timeout. What happens next depends entirely on whether you've got error handling wired up. If you haven't, the answer is nothing. The event's gone. The lead's gone. Nobody knows.
This isn't a hypothetical. Transient failures happen constantly in production. API rate limits, downstream CRM hiccups, temporary network drops. Any of them can swallow an event.

What does a dead letter queue actually do?
It catches the failed event before it disappears and routes it somewhere you control.
The concept comes from message queue systems, but the principle applies to any async automation. When an event still can't be processed after a set number of retries, you push it to a holding queue instead of discarding it. That queue holds the payload intact. You can inspect what failed, fix the underlying issue, and replay the event against the live system.
For a voice AI pipeline, that payload typically includes the call ID, the contact record, the outcome data, and whatever the agent captured during the conversation. All of that is recoverable. But only if you catch it.
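The retry-then-park flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: the function name process_with_dead_letter, the call_id field, the retry count of 3, and the in-memory list standing in for the queue are all assumptions made for the example.

```python
import time
from typing import Any, Callable


def process_with_dead_letter(
    event: dict[str, Any],
    handler: Callable[[dict[str, Any]], None],
    dead_letters: list[dict[str, Any]],
    max_retries: int = 3,        # assumed default; tune per downstream API
    base_delay_s: float = 0.5,   # assumed backoff base
) -> bool:
    """Try the handler up to max_retries times; park the event if all attempts fail."""
    last_error = ""
    for attempt in range(1, max_retries + 1):
        try:
            handler(event)
            return True
        except Exception as exc:  # a real handler would catch narrower error types
            last_error = f"{type(exc).__name__}: {exc}"
            if attempt < max_retries:
                # Exponential backoff between attempts: 0.5s, 1s, 2s, ...
                time.sleep(base_delay_s * 2 ** (attempt - 1))
    # Every attempt failed: keep the payload intact instead of dropping it.
    dead_letters.append({
        "event_id": event.get("call_id"),  # hypothetical field name
        "payload": event,
        "error_message": last_error,
        "retry_count": max_retries,
        "status": "pending",
    })
    return False


def always_times_out(event: dict[str, Any]) -> None:
    """Stand-in for a CRM write that keeps timing out."""
    raise TimeoutError("CRM write timed out")


parked: list[dict[str, Any]] = []
ok = process_with_dead_letter(
    {"call_id": "call_123", "outcome": "booked"},
    always_times_out,
    parked,
    base_delay_s=0.0,  # no waiting in the demo
)
print(ok, parked[0]["error_message"])
```

The point of the design is that the failure path ends in an append, not a discard. The payload that reaches the holding list is the same object the handler received, so nothing has to be reconstructed later.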

How do you wire a dead letter queue into an N8N voice AI pipeline?
You add an error branch to every webhook handler, then write failures to a Postgres table or a dedicated queue node.
In N8N, the simplest implementation uses the Error Trigger node combined with a Postgres write. The Error Trigger lives in a separate error workflow, which you assign as the error workflow in each pipeline workflow's settings. Once that's set, every failed execution in those workflows fires the trigger. You write the raw payload, the error message, the timestamp, and the workflow ID to a dead letter table. From there you've got options: a manual replay UI, an automated retry on a schedule, or just an alert that fires to Slack so you know something failed.
The Postgres approach pairs well with production state management because your failed events live in the same database as your call state. One place to look when things go wrong.
A minimal dead letter table needs at least these columns:
- event_id (the original call or webhook ID)
- payload (the full JSON, stored as JSONB)
- error_message (what failed and why)
- failed_at (timestamp)
- retry_count (how many attempts were made)
- status (pending, replayed, resolved)
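As a rough sketch, the columns above could translate into a table like the one below. The table name dead_letter_events, the constraints, and the record_failure helper are assumptions for illustration. The example runs against an in-memory SQLite database so it is self-contained; the comment shows what the equivalent Postgres definition might look like.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import Any

# Possible Postgres equivalent (names and constraints are assumptions):
#   CREATE TABLE dead_letter_events (
#       event_id      TEXT PRIMARY KEY,
#       payload       JSONB NOT NULL,
#       error_message TEXT NOT NULL,
#       failed_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
#       retry_count   INTEGER NOT NULL DEFAULT 0,
#       status        TEXT NOT NULL DEFAULT 'pending'
#                     CHECK (status IN ('pending', 'replayed', 'resolved'))
#   );
SCHEMA = """
CREATE TABLE IF NOT EXISTS dead_letter_events (
    event_id      TEXT PRIMARY KEY,
    payload       TEXT NOT NULL,      -- JSON text here; JSONB in Postgres
    error_message TEXT NOT NULL,
    failed_at     TEXT NOT NULL,      -- ISO 8601 UTC timestamp
    retry_count   INTEGER NOT NULL DEFAULT 0,
    status        TEXT NOT NULL DEFAULT 'pending'
                  CHECK (status IN ('pending', 'replayed', 'resolved'))
)
"""


def record_failure(
    conn: sqlite3.Connection,
    event_id: str,
    payload: dict[str, Any],
    error_message: str,
    retry_count: int,
) -> None:
    """Write one failed event to the dead letter table with status 'pending'."""
    failed_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    conn.execute(
        "INSERT INTO dead_letter_events "
        "(event_id, payload, error_message, failed_at, retry_count) "
        "VALUES (?, ?, ?, ?, ?)",
        (event_id, json.dumps(payload), error_message, failed_at, retry_count),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record_failure(
    conn,
    "call_123",
    {"call_id": "call_123", "outcome": "booked"},
    "TimeoutError: CRM write timed out",
    3,
)
```

Making event_id the primary key gives you one row per event, which is a choice rather than a requirement. If the same event can fail more than once, a separate surrogate key or an upsert would be needed instead.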

Where does cost discipline fit into this architecture?
Replaying an event costs a fraction of what it costs to re-run a full voice AI call. The infrastructure is cheap. The lost lead isn't.
A replay reads from your dead letter table, reconstructs the payload, and re-runs the downstream steps. No new Retell AI call. No new TTS. No telephony minutes. You're just re-processing data you already have. The compute cost is negligible.
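A replay along those lines might look like the sketch below. It assumes a table named dead_letter_events with the columns listed earlier, with the payload stored as JSON text, and a downstream callable that stands in for the CRM write and follow-up steps. Both names are hypothetical, and SQLite is used only to keep the example self-contained.

```python
import json
import sqlite3
from typing import Any, Callable


def replay_pending(
    conn: sqlite3.Connection,
    downstream: Callable[[dict[str, Any]], None],
) -> int:
    """Re-run the downstream steps for every pending row; return how many succeeded."""
    rows = conn.execute(
        "SELECT event_id, payload FROM dead_letter_events WHERE status = 'pending'"
    ).fetchall()
    replayed = 0
    for event_id, payload_json in rows:
        try:
            # Only stored data is re-processed; no new call is placed.
            downstream(json.loads(payload_json))
        except Exception as exc:
            # Still failing: leave it pending, record the attempt and the latest error.
            conn.execute(
                "UPDATE dead_letter_events "
                "SET retry_count = retry_count + 1, error_message = ? "
                "WHERE event_id = ?",
                (f"{type(exc).__name__}: {exc}", event_id),
            )
        else:
            conn.execute(
                "UPDATE dead_letter_events "
                "SET status = 'replayed', retry_count = retry_count + 1 "
                "WHERE event_id = ?",
                (event_id,),
            )
            replayed += 1
    conn.commit()
    return replayed
```

Rows that fail again stay pending with an updated error message, so a scheduled run can pick them up later and a human can see the most recent reason for the failure.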
Contrast that with what happens if you don't catch the failure. The lead doesn't get logged in GHL. No follow-up gets triggered. The contact may never hear from you again unless they reach back out themselves. For a finance broker or insurance business, a missed follow-up isn't a minor inconvenience. It's a lost opportunity that cost real money to generate.
The full cost picture of running voice agents is already layered. As covered in the voice agent cost breakdown, you're paying across multiple services simultaneously. Losing a lead on top of that spend stings twice.

Is there a compliance angle to this for Australian businesses?
Yes. Under the Australian Privacy Act, you're responsible for what happens to personal data that passes through your systems, including data in failed events.
A call event payload typically contains the contact's name, phone number, and whatever the agent captured during the conversation. If that payload sits unprotected in a generic error log, or worse, gets dropped entirely with no record, you've got an auditing problem. The OAIC's APP guidelines are clear that entities must take reasonable steps to protect personal information from misuse or loss.
A dead letter queue with proper access controls and retention policies isn't just good engineering. It's defensible practice. You know where every piece of data went. You can show what failed, when, and what you did about it.
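One way a retention policy could be enforced is a scheduled purge of rows that have already been handled. The sketch below is an assumption-heavy illustration: the table name dead_letter_events, the purge_expired helper, and the 30-day window are placeholders, and the window in particular is not a figure taken from the Privacy Act or the APP guidelines. It also assumes failed_at is stored as a UTC ISO 8601 string in a consistent format, so that string comparison matches time order.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional


def purge_expired(
    conn: sqlite3.Connection,
    retention_days: int = 30,          # placeholder window, set per your own policy
    now: Optional[datetime] = None,    # injectable clock, handy for testing
) -> int:
    """Delete handled rows older than the retention window; return the number removed."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=retention_days)).isoformat(timespec="seconds")
    cur = conn.execute(
        "DELETE FROM dead_letter_events "
        "WHERE status IN ('replayed', 'resolved') AND failed_at < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount
```

Pending rows are never purged here, however old they are, because they still represent leads that have not been recovered.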

Key Takeaways
- A dead letter queue is the difference between a recoverable failure and a permanently lost lead.
- In N8N, wire an Error Trigger to a Postgres dead letter table. Store the full payload, the error, and the retry count.
- Replay costs a fraction of re-running the original call. The infrastructure is worth it.
- For Australian service businesses, a dead letter queue also supports Privacy Act compliance by giving you a full audit trail of what happened to every event.

If your voice AI stack doesn't have a dead letter queue, it has an unknown number of leads it's already dropped. You just don't know which ones. If you want to know what else your current build might be quietly losing, DM me AUDIT and I'll send you five questions that'll show you where the gaps are.
Written by Syed Bilgrami
Founder of TheAutomate.io, building AI voice agents for Australian businesses



