pattern, javascript, Major
SQS dead letter queues for poison message isolation and debugging
dead letter queue, DLQ, poison message, maxReceiveCount, visibility timeout, message retry, SQS error handling
Problem
A malformed message or a bug in the consumer causes repeated processing failures. Without a DLQ, the message becomes visible again after each visibility timeout and is retried until its retention period expires. This wastes consumer capacity and, in FIFO queues, blocks later messages in the same message group.
Solution
Configure a dead-letter queue (DLQ) on the source queue with a maxReceiveCount (e.g., 3-5 attempts). Once a message has been received more than maxReceiveCount times without being deleted, SQS moves it to the DLQ automatically. Set a CloudWatch alarm on the DLQ's ApproximateNumberOfMessagesVisible metric so that failures are noticed.
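For setups that do not use CDK, the alarm on the DLQ can be sketched as the input to CloudWatch's PutMetricAlarm operation. The helper name `buildDlqAlarmParams`, the queue name `orders-dlq`, and the alarm name are illustrative assumptions; the threshold and period are one reasonable choice, not a requirement.

```javascript
// Builds PutMetricAlarm input that fires as soon as the DLQ holds any message.
// `dlqName` and the derived alarm name are hypothetical placeholders.
function buildDlqAlarmParams(dlqName) {
  return {
    AlarmName: `${dlqName}-has-messages`,
    Namespace: 'AWS/SQS',
    MetricName: 'ApproximateNumberOfMessagesVisible',
    Dimensions: [{ Name: 'QueueName', Value: dlqName }],
    Statistic: 'Maximum',
    Period: 60, // seconds
    EvaluationPeriods: 1,
    Threshold: 1, // alarm when at least one message is visible
    ComparisonOperator: 'GreaterThanOrEqualToThreshold',
    TreatMissingData: 'notBreaching', // an idle DLQ should not raise the alarm
  };
}

console.log(buildDlqAlarmParams('orders-dlq'));
```

The resulting object could be passed to a PutMetricAlarm call; to actually notify someone, an AlarmActions entry (for example an SNS topic ARN) would need to be added. In CDK, `dlq.metricApproximateNumberOfMessagesVisible()` followed by `createAlarm(...)` expresses the same idea.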
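const { Duration } = require('aws-cdk-lib');
const sqs = require('aws-cdk-lib/aws-sqs');

// CDK setup (inside a Stack or Construct, where `this` is the scope)
// Queue that collects messages that keep failing
const dlq = new sqs.Queue(this, 'DLQ', {
  retentionPeriod: Duration.days(14),
});

const queue = new sqs.Queue(this, 'MainQueue', {
  deadLetterQueue: {
    queue: dlq,
    maxReceiveCount: 3, // moved to the DLQ once it has failed 3 receives
  },
  visibilityTimeout: Duration.seconds(30),
});
Why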
The DLQ isolates poison messages from the main queue, allowing healthy messages to continue processing. The DLQ provides a safe place to inspect, replay, or manually process failed messages after fixing the underlying bug.
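The isolation described above can be illustrated with a small in-memory model. This is a sketch only: it does not use SQS or the AWS SDK, and `drainWithRedrive` and the sample payloads are hypothetical names chosen for the example.

```javascript
// Minimal in-memory model of a redrive policy (illustrative, not the AWS SDK).
function drainWithRedrive(messages, processMessage, maxReceiveCount) {
  const queue = messages.map((body) => ({ body, receiveCount: 0 }));
  const processed = [];
  const deadLetters = [];

  while (queue.length > 0) {
    const message = queue.shift();
    // Once the limit is reached, the message is set aside instead of redelivered.
    if (message.receiveCount >= maxReceiveCount) {
      deadLetters.push(message.body);
      continue;
    }
    message.receiveCount += 1;
    try {
      processMessage(message.body);
      processed.push(message.body); // success: the consumer deletes the message
    } catch (err) {
      queue.push(message); // failure: the message becomes visible again later
    }
  }
  return { processed, deadLetters };
}

// Hypothetical consumer that always fails on one malformed payload.
const result = drainWithRedrive(
  ['order-1', 'not-json', 'order-2'],
  (body) => {
    if (body === 'not-json') throw new Error('malformed payload');
  },
  3,
);
console.log(result);
```

In this model the healthy messages are processed once each, while the poison message is attempted three times and then lands in `deadLetters`, where it no longer competes for consumer capacity.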
Gotchas
- The DLQ must be in the same region and account as the source queue
- The DLQ for a FIFO queue must also be a FIFO queue, and the DLQ for a standard queue must be a standard queue
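- Messages in the DLQ are deleted when their retention period expires, and for standard queues that expiry is based on the original enqueue timestamp, so set the DLQ retention period longer than the source queue's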
- Use SQS dead-letter queue redrive (built into the console, and also available through the StartMessageMoveTask API) to replay messages back to the source queue after fixing the bug
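- Visibility timeout on the source queue must be >= the Lambda function timeout, otherwise a message can become visible and be retried while it is still being processed; AWS recommends at least six times the function timeout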
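The timing in the last gotcha can be worked through with a little arithmetic. The helpers below are a sketch: the function names are hypothetical, the six-times rule follows AWS's guidance for Lambda consumers, and the time-to-DLQ figure is only a rough upper bound that assumes every failed attempt holds the message for the full visibility timeout.

```javascript
// Suggested source-queue visibility timeout for a Lambda consumer:
// six times the function timeout, plus any batching window.
function recommendedVisibilityTimeout(functionTimeoutSeconds, batchWindowSeconds = 0) {
  return 6 * functionTimeoutSeconds + batchWindowSeconds;
}

// Rough upper bound on how long a poison message circulates before reaching
// the DLQ, assuming each attempt consumes one full visibility timeout.
function approxSecondsUntilDlq(visibilityTimeoutSeconds, maxReceiveCount) {
  return visibilityTimeoutSeconds * maxReceiveCount;
}

const visibility = recommendedVisibilityTimeout(30); // 30 s function timeout
console.log(visibility); // 180
console.log(approxSecondsUntilDlq(visibility, 3)); // 540
```

With a 30-second function timeout and maxReceiveCount of 3, a poison message could take on the order of nine minutes to reach the DLQ, which is worth keeping in mind when choosing alarm periods.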
Code Snippets
Lambda partial batch failure response to avoid sending successful messages to DLQ
// Lambda SQS handler: report failed records for a partial batch response.
// Requires ReportBatchItemFailures to be enabled on the event source mapping.
export const handler = async (event) => {
  const failures = [];
  for (const record of event.Records) {
    try {
      await processRecord(record); // application-specific processing, defined elsewhere
    } catch (err) {
      console.error('Failed to process record', record.messageId, err);
      failures.push({ itemIdentifier: record.messageId });
    }
  }
  // Only the listed messages become visible again; the rest are deleted.
  return { batchItemFailures: failures };
};
Context
Building reliable message-processing systems with SQS and Lambda or EC2 consumers