The Dead Letter Queue: Where Failed Calls Go to Be Replayed

    Syed Bilgrami · 23 June 2026 · 5 min read

    When a call event fails, most stacks quietly drop it. A dead letter queue catches the failure and lets you replay it. Here's the build.

    TL;DR

    • A dead letter queue catches failed call events so your voice AI stack can replay them instead of silently dropping leads.
    • Without one, a transient API timeout means the lead is gone. No alert. No retry. Just gone.
    • If you're running a production voice agent for any service business, this is non-negotiable infrastructure.

    A dead letter queue is where a failed call event lands when your automation can't process it. Instead of the event disappearing, it parks itself somewhere safe so you can inspect it and replay it.

    What actually happens when a call event fails?

    Most stacks don't fail loudly. They just drop the event and move on.

    Your webhook fires. Retell AI sends its call-completed payload. N8N receives it, tries to write to GoHighLevel (GHL), and hits a timeout. What happens next depends entirely on whether you've got error handling wired up. If you haven't, the answer is nothing. The event's gone. The lead's gone. Nobody knows.

    This isn't a hypothetical. Transient failures happen constantly in production. API rate limits, downstream CRM hiccups, temporary network drops. Any of them can swallow an event.

    Voice AI call event failure flow showing where events drop without a dead letter queue

    What does a dead letter queue actually do?

    It catches the failed event before it disappears and routes it somewhere you control.

    The concept comes from message queue systems, but the principle applies to any async automation. When an event can't be processed successfully after a set number of retries, instead of discarding it, you push it to a holding queue. That queue holds the payload intact. You can inspect what failed, fix the underlying issue, and replay the event against the live system.
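    Here's a minimal sketch of that retry-then-park logic, assuming an in-memory list stands in for the holding queue and `handler` is whatever downstream step you're protecting (a CRM write, say). The names `process_with_dlq`, `DeadLetter` and `write_to_crm` are hypothetical, chosen for illustration.

```python
import time
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass
class DeadLetter:
    """A parked event: the untouched payload plus the reason it failed."""
    event_id: str
    payload: dict[str, Any]
    error_message: str
    retry_count: int  # how many attempts were made before giving up
    status: str = "pending"
    failed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def process_with_dlq(
    event_id: str,
    payload: dict[str, Any],
    handler: Callable[[dict[str, Any]], None],
    dead_letters: list[DeadLetter],
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> bool:
    """Run handler up to max_attempts times; park the event if every attempt fails.

    Returns True if the handler eventually succeeded, False if the event was dead-lettered.
    """
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            handler(payload)
            return True
        except Exception as exc:  # timeouts, rate limits, CRM 5xx responses, etc.
            last_error = f"{type(exc).__name__}: {exc}"
            if attempt < max_attempts:
                time.sleep(backoff_seconds * attempt)  # linear backoff between attempts
    # Retries exhausted: keep the payload instead of dropping it.
    dead_letters.append(DeadLetter(event_id, payload, last_error, retry_count=max_attempts))
    return False


if __name__ == "__main__":
    def write_to_crm(payload: dict[str, Any]) -> None:
        # Hypothetical stand-in for the GHL write; always times out here.
        raise TimeoutError("CRM write timed out")

    parked: list[DeadLetter] = []
    ok = process_with_dlq("call_123", {"call_id": "call_123"}, write_to_crm, parked)
    print(ok, parked[0].error_message, parked[0].retry_count)
    # prints: False TimeoutError: CRM write timed out 3
```

    The point is the last two lines of `process_with_dlq`: a failure after the final retry becomes a record you can inspect, not a silent return.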

    For a voice AI pipeline, that payload typically includes the call ID, the contact record, the outcome data, and whatever the agent captured during the conversation. All of that is recoverable. But only if you catch it.

    Dead letter queue component diagram showing event routing on failure

    How do you wire a dead letter queue into an N8N voice AI pipeline?

    You add an error branch to every webhook handler, then write failures to a Postgres table or a dedicated queue node.

    In N8N, the simplest implementation is a separate error workflow that starts with the Error Trigger node and ends with a Postgres write; you then select it as the error workflow in each webhook workflow's settings. Every failed execution of those workflows fires the error trigger. You write the raw payload, the error message, the timestamp, and the workflow ID to a dead letter table. From there you've got options: a manual replay UI, an automated retry on a schedule, or just an alert that fires to Slack so you know something failed.
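    A sketch of the mapping step, shown in Python; the same logic fits in a Code node or whatever service does the Postgres write. The sample output in n8n's Error Trigger documentation carries an `execution` object (with `id`, `error.message` and `lastNodeExecuted`) and a `workflow` object (with `id` and `name`). That sample doesn't include the original webhook body, so this sketch takes the payload as a separate argument, assuming you've recovered it, for example from the failed execution's saved data. The `call_id` key and the `dead_letter_row` helper are hypothetical; check the field names against what your n8n version actually emits.

```python
import json
from datetime import datetime, timezone
from typing import Any


def dead_letter_row(error_item: dict[str, Any], payload: dict[str, Any]) -> dict[str, Any]:
    """Flatten one Error Trigger item plus the original payload into a dead letter row.

    error_item is assumed to follow the shape of n8n's documented Error Trigger sample.
    Missing fields fall back to defaults, because n8n sends a different shape when the
    failing node is the trigger itself.
    """
    execution = error_item.get("execution") or {}
    workflow = error_item.get("workflow") or {}
    error = execution.get("error") or {}
    return {
        # Prefer the call ID from the payload (hypothetical key) so replays match the call.
        "event_id": str(payload.get("call_id") or execution.get("id", "unknown")),
        "payload": json.dumps(payload),  # serialised for the JSONB column
        "error_message": f"{execution.get('lastNodeExecuted', '?')}: "
                         f"{error.get('message', 'unknown error')}",
        "workflow_id": str(workflow.get("id", "")),
        "failed_at": datetime.now(timezone.utc).isoformat(),
        "status": "pending",
    }
```

    Prefixing the error with the last node executed means the dead letter row tells you which step broke (the GHL write, the Slack post) without opening the execution log.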

    The Postgres approach pairs well with production state management because your failed events live in the same database as your call state. One place to look when things go wrong.

    A minimal dead letter table needs at least these columns:

    • event_id (the original call or webhook ID)
    • payload (the full JSON, stored as JSONB)
    • error_message (what failed and why)
    • failed_at (timestamp)
    • retry_count (how many attempts were made)
    • status (pending, replayed, resolved)
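    One way to express that table, assuming Postgres and a psycopg-style driver (`cursor.execute(query, params)` with `%s` placeholders). The table name `dead_letters`, the surrogate `id` column, the index and the `insert_params` helper are hypothetical additions; the other columns are the ones listed above.

```python
import json
from typing import Any

# Postgres DDL for the dead letter table (run once, e.g. as a migration).
DEAD_LETTER_DDL = """
CREATE TABLE IF NOT EXISTS dead_letters (
    id            BIGSERIAL   PRIMARY KEY,
    event_id      TEXT        NOT NULL,
    payload       JSONB       NOT NULL,
    error_message TEXT        NOT NULL,
    failed_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
    retry_count   INTEGER     NOT NULL DEFAULT 0,
    status        TEXT        NOT NULL DEFAULT 'pending'
                  CHECK (status IN ('pending', 'replayed', 'resolved'))
);
CREATE INDEX IF NOT EXISTS dead_letters_status_idx ON dead_letters (status, failed_at);
"""

# Parameterised insert; failed_at and status fall back to their column defaults.
INSERT_DEAD_LETTER = """
INSERT INTO dead_letters (event_id, payload, error_message, retry_count)
VALUES (%s, %s::jsonb, %s, %s)
RETURNING id;
"""


def insert_params(event_id: str, payload: dict[str, Any], error_message: str,
                  retry_count: int) -> tuple[str, str, str, int]:
    """Build the parameter tuple for INSERT_DEAD_LETTER, serialising the payload to JSON text."""
    if retry_count < 0:
        raise ValueError("retry_count cannot be negative")
    return (event_id, json.dumps(payload), error_message, retry_count)
```

    With psycopg, a failed event would be written with something like `cur.execute(INSERT_DEAD_LETTER, insert_params(...))`. Event IDs aren't unique here on purpose: the same call can fail more than once, and each failure gets its own row.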

    Dead letter queue architecture diagram for N8N voice AI pipeline with Postgres

    Where does cost discipline fit into this architecture?

    Replaying an event costs a fraction of what it costs to re-run a full voice AI call. The infrastructure is cheap. The lost lead isn't.

    A replay reads from your dead letter table, reconstructs the payload, and re-runs the downstream steps. No new Retell AI call. No new TTS. No telephony minutes. You're just re-processing data you already have. The compute cost is negligible.
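    A minimal replay loop, sketched against in-memory rows so it runs without a database. It assumes a successful replay marks a row `replayed`, leaves `resolved` for manual close-outs, and stops retrying a row after a cap so a person looks at it instead. `DeadLetterRow`, `replay_pending` and `max_retries` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DeadLetterRow:
    id: int
    event_id: str
    payload: dict[str, Any]
    error_message: str
    retry_count: int = 0
    status: str = "pending"


def replay_pending(
    rows: list[DeadLetterRow],
    downstream: Callable[[dict[str, Any]], None],
    max_retries: int = 5,
) -> dict[str, int]:
    """Re-run only the downstream steps for each pending row, using the stored payload.

    No new call is placed: the handler receives the payload exactly as it was captured.
    Successes move to 'replayed'; failures stay 'pending' with an updated error and count.
    Rows that have reached max_retries are skipped and left for a human.
    """
    summary = {"replayed": 0, "failed": 0, "skipped": 0}
    for row in rows:
        if row.status != "pending":
            continue  # already replayed or resolved
        if row.retry_count >= max_retries:
            summary["skipped"] += 1
            continue
        try:
            downstream(row.payload)
        except Exception as exc:
            row.retry_count += 1
            row.error_message = f"{type(exc).__name__}: {exc}"
            summary["failed"] += 1
        else:
            row.status = "replayed"
            summary["replayed"] += 1
    return summary
```

    Against the real table, you'd load rows with `WHERE status = 'pending'` and write the new status and count back. If more than one worker replays at once, Postgres's `SELECT ... FOR UPDATE SKIP LOCKED` is a common way to stop two workers grabbing the same row.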

    Contrast that with what happens if you don't catch the failure. The lead doesn't get logged in GHL. No follow-up gets triggered. The contact may never hear from you again unless they reach back out themselves. For a finance broker or insurance business, a missed follow-up isn't a minor inconvenience. It's a lost opportunity that cost real money to generate.

    The full cost picture of running voice agents is already layered. As covered in the voice agent cost breakdown, you're paying across multiple services simultaneously. Losing a lead on top of that spend stings twice.

    Cost comparison showing replay cost versus lost lead cost in voice AI stack

    Is there a compliance angle to this for Australian businesses?

    Yes. Under the Australian Privacy Act 1988, you're responsible for what happens to personal data that passes through your systems, including data in failed events.

    A call event payload typically contains the contact's name, phone number, and whatever the agent captured during the conversation. If that payload sits unprotected in a generic error log, or worse, gets dropped entirely with no record, you've got an auditing problem. APP 11, as explained in the OAIC's APP guidelines, requires entities to take reasonable steps to protect personal information from misuse, interference and loss, and from unauthorised access, modification or disclosure.
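    One practical step is keeping personal details out of the Slack alert mentioned earlier, so the full payload only lives in the access-controlled dead letter table. A sketch, assuming payload keys like `name` and `phone` (hypothetical; match them to your real payload). Masking like this reduces exposure; it isn't by itself a compliance guarantee.

```python
import json
import re
from typing import Any

# Hypothetical field names; adjust to whatever your call payload actually uses.
PHONE_KEYS = {"phone", "phone_number"}
SENSITIVE_KEYS = {"name", "first_name", "last_name", "email", "transcript"}


def mask_phone(number: str) -> str:
    """Keep only the last three digits, e.g. '+61 412 345 678' -> '********678'."""
    digits = re.sub(r"\D", "", number)
    return "*" * max(len(digits) - 3, 0) + digits[-3:]


def redact(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a copy safe for chat alerts: phones masked, other personal fields dropped."""
    safe: dict[str, Any] = {}
    for key, value in payload.items():
        if key in PHONE_KEYS and isinstance(value, str):
            safe[key] = mask_phone(value)
        elif key in SENSITIVE_KEYS:
            safe[key] = "[redacted]"
        elif isinstance(value, dict):
            safe[key] = redact(value)  # nested contact records
        else:
            safe[key] = value
    return safe


def alert_text(event_id: str, error_message: str, payload: dict[str, Any]) -> str:
    """One-line alert that points at the dead letter row without exposing personal data."""
    return f"Dead letter {event_id}: {error_message} | {json.dumps(redact(payload))}"
```

    The alert still carries the event ID, so whoever picks it up can find the full record in the database rather than in a chat channel.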

    A dead letter queue with proper access controls and retention policies isn't just good engineering. It's defensible practice. You know where every piece of data went. You can show what failed, when, and what you did about it.
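    A retention policy can be as simple as a scheduled purge of rows that are already closed out. The sketch below assumes retention is measured from `failed_at` (the table above has no resolution timestamp) and never touches `pending` rows. The 90-day window is a placeholder for your own policy, not a figure taken from the Privacy Act; `ids_to_purge` and `StoredDeadLetter` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Equivalent Postgres statement for a scheduled job (psycopg-style %s for the day count).
PURGE_SQL = """
DELETE FROM dead_letters
WHERE status IN ('replayed', 'resolved')
  AND failed_at < now() - make_interval(days => %s::int);
"""


@dataclass
class StoredDeadLetter:
    id: int
    status: str
    failed_at: datetime


def ids_to_purge(rows: list[StoredDeadLetter], now: datetime,
                 retention_days: int = 90) -> list[int]:
    """IDs of closed-out rows older than the retention window.

    Pending rows are never purged: they still represent a lead nobody has recovered.
    """
    cutoff = now - timedelta(days=retention_days)
    return [r.id for r in rows if r.status in ("replayed", "resolved") and r.failed_at < cutoff]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    rows = [
        StoredDeadLetter(1, "replayed", now - timedelta(days=120)),
        StoredDeadLetter(2, "pending", now - timedelta(days=200)),
    ]
    print(ids_to_purge(rows, now))  # prints: [1]
```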

    Architecture diagram showing dead letter queue with compliance controls for AU voice AI

    Key Takeaways

    • A dead letter queue is the difference between a recoverable failure and a permanently lost lead.
    • In N8N, wire an Error Trigger to a Postgres dead letter table. Store the full payload, the error, and the retry count.
    • Replay costs a fraction of re-running the original call. The infrastructure is worth it.
    • For Australian service businesses, a dead letter queue also supports Privacy Act compliance by giving you a full audit trail of what happened to every event.

    Outcome summary showing dead letter queue protecting lead recovery in production voice AI

    If your voice AI stack doesn't have a dead letter queue, it has an unknown number of leads it's already dropped. You just don't know which ones. If you want to know what else your current build might be quietly losing, DM me AUDIT and I'll send you five questions that'll show you where the gaps are.

    Written by Syed Bilgrami

    Founder of TheAutomate.io, building AI voice agents for Australian businesses
