Idempotency in Distributed Systems That Actually Works

Stop duplicate side effects

Idempotency in distributed systems is the property that saves you after the network lies, the queue retries, the client panics, and the operator hits replay. In production systems, duplicate delivery is normal. Duplicate side effects are the bug.

HTTP defines an idempotent method as one where multiple identical requests have the same intended effect on the server as one request. That is why PUT, DELETE, and safe methods are idempotent in protocol semantics and can be retried automatically after a communication failure.
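
That protocol property is what lets generic clients retry blindly at the transport level. Here is a minimal sketch with Python's requests and urllib3 that restricts automatic retries to idempotent verbs; the host is a placeholder:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry only verbs whose protocol semantics make repeats safe.
retry = Retry(total=3, backoff_factor=0.5,
              allowed_methods=frozenset({"GET", "HEAD", "PUT", "DELETE"}))
session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retry))

# Safe to repeat: PUT replaces the same resource state each time.
session.put("https://api.example.com/orders/123", json={"status": "confirmed"})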

[Diagram: integration message flow with idempotency]

That definition is useful, but it is not enough. In real architectures, idempotency is not an HTTP trivia answer. It is a business guarantee. If a customer hits “pay” once, you do not get to charge twice because a timeout happened between commit and response. If a worker updates inventory and crashes before acking the message, you do not get to decrement stock twice because the broker redelivered. That is the bar.

The mistake I see over and over is treating idempotency as a transport feature instead of a system property. Queue deduplication, HTTP verbs, and client retries help, but none of them rescue a design that lets the same business intent create a second side effect. If you want the broader framing for how these integration decisions fit service boundaries and persistence trade-offs, start with App Architecture in Production: Integration Patterns, Code Design, and Data Access.

Where duplicates come from in production

Duplicates do not appear because teams are careless. They appear because distributed systems retry, reorder, and replay.

A client can send a create request, the server can commit it, and the response can still disappear on the wire. That is exactly why HTTP distinguishes idempotent methods and why payment APIs such as Stripe and PayPal expose explicit idempotency mechanisms for unsafe methods like POST.
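
This is what the client side of those mechanisms looks like. A sketch with Python's requests, assuming a Stripe-style API that honors an Idempotency-Key header; the endpoint and payload are hypothetical:

import uuid
import requests

# One key per business intent, minted before the first attempt and
# reused verbatim on every retry.
key = str(uuid.uuid4())

for attempt in range(3):
    try:
        resp = requests.post(
            "https://api.example.com/v1/charges",  # hypothetical endpoint
            json={"amount": 5000, "currency": "usd"},
            headers={"Idempotency-Key": key},
            timeout=5,
        )
        break  # got a response; the server replays the stored result on retries
    except requests.exceptions.RequestException:
        continue  # timeout or lost response: retry with the SAME key

The key stays constant across attempts. Minting a fresh key inside the retry loop silently reintroduces the duplicate-charge bug.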

Message brokers make the problem even more obvious. At-least-once delivery means a consumer can be invoked repeatedly for the same message, and a handler can update the database successfully but fail before acknowledgment, causing the broker to deliver the same message again.

Webhooks are no different. GitHub says webhook deliveries can arrive out of order, failed deliveries are not automatically redelivered, and each delivery carries a unique X-GitHub-Delivery GUID that you should use when protecting against replay. For a practical architecture view of chat endpoints as interaction boundaries, see Chat Platforms as System Interfaces in Modern Systems.

Even systems that advertise stronger guarantees still leave you work to do. Kafka's idempotent producer prevents duplicate entries in the log, and its transactions plus read_committed consumers give exactly-once semantics for read-process-write flows that stay inside Kafka. But Kafka's own design docs are clear that outputs to external systems still require coordinating offsets with those outputs. Google Cloud Pub/Sub's exactly-once delivery is limited to pull subscriptions within a cloud region, and it still requires clients to track processing progress until acknowledgment succeeds.

My opinionated summary is simple. Assume the transport will retry. Assume operators will replay. Assume webhooks will arrive late. Design the write path so a repeated intent cannot create a second business effect.

The API contract I actually trust

How do idempotency keys prevent duplicate API requests

The only API contract I trust for mutating operations is caller-supplied intent plus server-side persistence.

AWS recommends a caller-provided request identifier and warns that the service must atomically record the idempotency token together with the mutating work. Stripe stores the first status code and response body for a key, compares later parameters with the original request, and returns the same result for retries. PayPal uses PayPal-Request-Id on supported POST APIs and, for a retry carrying the same header value, returns the latest status of the original request.

That leads to a practical contract:

  1. The client generates an idempotency key for a business operation.
  2. The server scopes that key by tenant and operation name.
  3. The server stores a request hash so the same key cannot be reused for a different payload.
  4. The server records state such as pending, completed, or failed.
  5. Retries with the same key either return the stored outcome or a stable pointer to it.
  6. Retries with the same key and a different payload fail loudly.

There is an IETF Idempotency-Key header draft, but as of 2026-05-09 it is still listed in the IETF Datatracker as an expired Internet-Draft rather than a published RFC. In practice, the header name is still widely useful as a de facto convention, but you should document the contract in your own API instead of pretending the standard is finished.

What should the key represent? Intent. Not an HTTP attempt. Not a TCP connection. Not a retry counter. If the user means “create order 123 once”, every retry for that same command must reuse the same key. If the user means “place a second order”, that must use a different key.

A request ID is for tracing. An idempotency key is for correctness. If you mix those up, your dashboards look tidy while your money moves twice.
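
One way to make "key equals intent" concrete is to mint the key when the command is created and persist it with the command, so every retry reads it back. A minimal sketch, with an in-memory dict standing in for a durable store:

import uuid

_commands = {}  # stand-in for a durable store keyed by business command

def payment_key(order_id: str) -> str:
    # Minted once, when the user expresses the intent; reused thereafter.
    return _commands.setdefault(order_id, f"pay-{order_id}-{uuid.uuid4()}")

# Every retry of the same intent shares one key...
assert payment_key("order-123") == payment_key("order-123")
# ...while a genuinely new intent gets a new one.
assert payment_key("order-123") != payment_key("order-456")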

Why PUT is not enough

No, HTTP PUT is not enough to make an operation idempotent.

Yes, RFC 9110 gives PUT idempotent semantics. But if your PUT handler emits a new downstream event, sends an email on every retry, or charges an external provider again, then your implementation has violated the business contract even if your route name looks respectable.

Verb choice helps clients understand intent. It does not implement intent for you.

Use PUT when the resource model genuinely fits a full replacement or upsert style operation. Use POST when you are creating commands or actions. But for any mutation that might be retried across network boundaries, document an explicit idempotency contract. If your mutating actions are triggered from chat workflows, the same contract applies in Slack Integration Patterns for Alerts and Workflows and Discord Integration Pattern for Alerts and Control Loops. Hidden side effects are where architecture goes to die.

How long should an idempotency key be stored

Longer than your transport team wants.

Stripe prunes keys once they are at least 24 hours old. PayPal's retention is API specific, with documented examples up to 45 days. Amazon SQS FIFO deduplicates only within a 5-minute window. GitHub keeps recent webhook deliveries for 3 days for manual redelivery. Those numbers differ wildly because the right retention period is a business decision, not a protocol default.

If you only keep keys for five minutes because your queue does, you are not designing idempotency. You are copying a transport limitation into your business layer.

Keep idempotency records for at least the maximum of these windows:

  • client retry horizon
  • queue redrive horizon
  • webhook replay horizon
  • operator replay horizon
  • settlement or compensation horizon for money-moving operations

For payments, bookings, and provisioning, that often means hours or days, not minutes.
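
Enforcing the horizon is then a boring background job. A sketch with psycopg2, assuming the api_idempotency table defined below, where every row carries its own expires_at:

import psycopg2

def sweep_expired_keys(dsn: str) -> int:
    # Each row carries its own horizon, so payments can keep keys for
    # days while cheaper operations expire in hours.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute("delete from api_idempotency where expires_at < now()")
        return cur.rowcount  # rows pruned this run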

AWS also calls out two anti-patterns I fully agree with. Do not use timestamps as the key, because clock skew and collisions make them unreliable. Do not blindly store entire request payloads as the dedup record for every request, because that harms performance and scalability. Store a normalized request hash plus the minimum response state you need to replay safely. If you must reproduce the first response byte for byte, store the canonical response body the way Stripe does.
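
A normalized hash is easy to get right if you canonicalize before hashing. A minimal sketch:

import hashlib
import json

def request_hash(payload: dict) -> str:
    # Canonicalize first, so key order and whitespace differences in the
    # wire format cannot defeat the comparison.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Same intent, same hash, regardless of field order on the wire.
assert request_hash({"a": 1, "b": 2}) == request_hash({"b": 2, "a": 1})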

The database patterns that make idempotency real

Idempotency becomes real when the persistence layer can win a race exactly once.

PostgreSQL gives you two critical primitives here. Unique constraints enforce uniqueness on one or more columns, and INSERT ... ON CONFLICT lets you define an alternative action instead of failing on a uniqueness violation. PostgreSQL also documents that ON CONFLICT DO UPDATE guarantees an atomic insert-or-update outcome under concurrency.

That means your idempotency layer should usually start with a table like this:

create table api_idempotency (
    tenant_id text not null,
    operation text not null,
    idempotency_key text not null,
    request_hash text not null,     -- detects same-key, different-payload reuse
    state text not null,            -- 'pending', 'completed', or 'failed'
    status_code integer,
    response_body jsonb,            -- stored outcome to replay on retry
    resource_type text,
    resource_id text,
    created_at timestamptz not null default now(),
    expires_at timestamptz not null, -- business replay horizon, not a queue default
    primary key (tenant_id, operation, idempotency_key)
);

And the handling flow should look like this:

begin transaction

try insert (tenant_id, operation, idempotency_key, request_hash, state='pending')
on conflict do nothing

load row for (tenant_id, operation, idempotency_key) for update

if row.request_hash != incoming_request_hash
    fail with conflict or validation error

if row.state = 'completed'
    return stored response

if row.state = 'pending' and row was created by another live request
    either wait briefly, or fail fast with a retryable response

perform local business mutation

store stable result in idempotency row
set state = 'completed'

commit
return result
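
If you want that flow in concrete form, here is a minimal sketch with psycopg2 against the table above. The mutate callable and the 7-day retention are placeholders for your business logic and your replay horizon:

from psycopg2.extras import Json

def handle(conn, tenant, op, key, payload, req_hash, mutate):
    with conn:  # one transaction: dedup decision and mutation commit together
        with conn.cursor() as cur:
            # Claim the key. The primary key wins the race exactly once.
            cur.execute(
                """insert into api_idempotency
                       (tenant_id, operation, idempotency_key,
                        request_hash, state, expires_at)
                   values (%s, %s, %s, %s, 'pending',
                           now() + interval '7 days')
                   on conflict do nothing""",
                (tenant, op, key, req_hash))
            # FOR UPDATE blocks while a concurrent holder finishes, which
            # is the "wait briefly" branch from the flow above.
            cur.execute(
                """select request_hash, state, status_code, response_body
                     from api_idempotency
                    where tenant_id = %s and operation = %s
                      and idempotency_key = %s
                      for update""",
                (tenant, op, key))
            stored_hash, state, status, body = cur.fetchone()
            if stored_hash != req_hash:
                return 409, {"error": "key reused with a different payload"}
            if state == "completed":
                return status, body  # replay the stored outcome
            # Still pending and we hold the lock: either we created the row,
            # or a previous attempt died mid-flight. Safe to do the work.
            status, body = mutate(cur, payload)  # your business mutation
            cur.execute(
                """update api_idempotency
                      set state = 'completed', status_code = %s,
                          response_body = %s
                    where tenant_id = %s and operation = %s
                      and idempotency_key = %s""",
                (status, Json(body), tenant, op, key))
    return status, body

Note what is absent: there is no separate existence check. The insert is the check.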

The important part is not the syntax. The important part is the atomicity. Recording the key and performing the mutation must succeed or fail together. AWS says this explicitly for API idempotency, and the same rule applies in SQL-backed services.

Do not do a naive check-then-act sequence like “select key; if missing then insert order”. Under concurrency, two requests can pass the check and both create the side effect. A unique constraint is not optional. It is the mechanism that turns your architecture from optimistic folklore into something you can prove under load.

Here is the rule I use in reviews. If the dedup decision is not protected by the same transactional boundary as the mutation, you do not have idempotency. You have hope.

Messages, events, and webhooks need their own boundary

How do consumers handle duplicate events and messages

For message consumers, the classic pattern is still the right one. Record processed message IDs in the same database transaction as the business update. Chris Richardson describes the PROCESSED_MESSAGES table approach directly, using a primary key on subscriber and message ID so duplicates fail cleanly and can be ignored.

Many teams call that explicit processed_messages store an inbox table. The label matters less than the rule. The receiver must persist proof that it already handled the message before a retry can safely do nothing.

A minimal form looks like this:

create table processed_messages (
    subscriber_id text not null,
    message_id text not null,
    processed_at timestamptz not null default now(),
    primary key (subscriber_id, message_id)  -- duplicate inserts fail cleanly here
);

And the consumer flow is just as strict as the HTTP flow:

begin transaction

insert into processed_messages (subscriber_id, message_id)
values (?, ?)
on conflict do nothing

if no row inserted
    rollback
    ack and ignore duplicate

apply business mutation

commit
ack message
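
A minimal sketch of that consumer, assuming a psycopg2-style connection. The message object, apply_business_mutation, and the ack callback are placeholders for your broker client:

def on_message(conn, subscriber_id, message, ack):
    with conn:  # receipt record and business update commit together
        with conn.cursor() as cur:
            cur.execute(
                """insert into processed_messages (subscriber_id, message_id)
                   values (%s, %s)
                   on conflict do nothing""",
                (subscriber_id, message.id))
            if cur.rowcount == 1:  # first delivery: do the real work
                apply_business_mutation(cur, message)
    # Ack only after commit. A crash between commit and ack causes a
    # redelivery, which the insert above turns into a clean no-op.
    ack(message)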

That pattern is boring. Good. Idempotency should be boring.

It is also usually better than trying to lean on broker marketing terms. Kafka’s exactly-once support is excellent when you stay inside Kafka’s own transactional model, but Kafka’s docs still warn that external destinations need cooperation. SQS FIFO reduces duplicate sends only within its 5-minute dedup window. Pub/Sub exactly-once still expects the subscriber to track progress and avoid duplicate work when acknowledgments fail.

Exactly-once is usually a local optimization. Idempotent side effects are the system guarantee.

Pair dedup with the outbox pattern

If your service updates local state and also publishes an event, idempotent consumption alone is not enough. You also need a safe way to get the event out after the local transaction commits.

That is why the transactional outbox pattern matters. Chris Richardson describes the basic idea as writing the event to an outbox table in the same transaction as the business update, and then publishing it asynchronously. Debezium says the outbox pattern avoids inconsistencies between a service’s internal state and the events consumed by other services. NServiceBus goes further and shows how outbox processing deduplicates incoming messages and avoids zombie records and ghost messages.

This is the architecture I recommend for services that own data and publish integration events:

  1. Validate and persist the command under an idempotency key.
  2. Write business state and outbox event in one local transaction.
  3. Let CDC or an outbox dispatcher publish the event.
  4. Make downstream consumers idempotent too.
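
Step 2 above is the piece worth seeing in code. A sketch assuming a psycopg2-style connection and hypothetical orders and outbox tables:

import json
import uuid

def place_order(conn, order):
    with conn:
        with conn.cursor() as cur:
            # ...idempotency claim as in the earlier handler sketch, then:
            cur.execute(
                "insert into orders (id, payload) values (%s, %s)",
                (order["id"], json.dumps(order)))
            # Same local transaction: the event either commits with the
            # state change or vanishes with it. A dispatcher or CDC
            # connector publishes it afterwards.
            cur.execute(
                """insert into outbox (id, aggregate_id, event_type, payload)
                   values (%s, %s, 'OrderPlaced', %s)""",
                (str(uuid.uuid4()), order["id"], json.dumps(order)))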

Outbox does not remove the need for idempotent consumers. It removes the need to pretend that a database commit and a broker publish can be one magical distributed transaction when they usually cannot.

Webhooks are just messages with better branding

Treat inbound webhooks exactly like messages from an untrusted network edge.

GitHub documents that deliveries can arrive out of order, recommends using X-Hub-Signature-256 to verify authenticity, and provides X-GitHub-Delivery as the unique delivery identifier. It also notes that redeliveries reuse the same delivery ID.

So the architecture is straightforward:

  • verify the signature first
  • use the delivery GUID as the dedup key
  • persist receipt before side effects
  • make handlers order-aware rather than assuming arrival order
  • enqueue the heavy work and return fast
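
That ordering, as a minimal sketch. The webhook_receipts table, the enqueue callback, and the framework glue are placeholders; the header names and the sha256= signature prefix are the ones GitHub documents:

import hashlib
import hmac

def handle_webhook(conn, headers, body: bytes, secret: bytes, enqueue):
    # 1. Verify authenticity before touching anything else.
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, headers.get("X-Hub-Signature-256", "")):
        return 401
    # 2. Persist receipt, deduped on the delivery GUID, before side effects.
    with conn, conn.cursor() as cur:
        cur.execute(
            """insert into webhook_receipts (delivery_id, body)
               values (%s, %s) on conflict do nothing""",
            (headers["X-GitHub-Delivery"], body.decode()))
        if cur.rowcount == 0:
            return 200  # replayed delivery: acknowledge, do nothing
    # 3. Heavy work goes to a queue; return fast.
    enqueue(headers["X-GitHub-Delivery"])
    return 202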

If your webhook handler writes directly to business tables before it records receipt, it is not production-ready. It is just faster at making duplicate mistakes.

Sagas and workflow engines still need idempotency

Sagas and durable workflow engines do not delete the problem. They make it visible.

Temporal recommends writing Activities to be idempotent because Activities can be retried after failures or timeouts. Its docs even call out the edge case where a worker completes an external side effect successfully but crashes before reporting completion, which causes the Activity to run again. Temporal also suggests using a combination of Workflow Run ID and Activity ID as a stable idempotency key when calling downstream services. If you are applying this in service orchestration, Go Microservices for AI/ML Orchestration covers the broader workflow trade-offs.
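
Here is what that looks like as a sketch with Temporal's Python SDK, where activity.info() exposes both identifiers; the downstream payment client is hypothetical:

from temporalio import activity

@activity.defn
async def charge_card(amount_cents: int, customer_id: str) -> str:
    info = activity.info()
    # Stable across retries of this Activity within this workflow run, so
    # the provider dedupes even if we crash after the charge succeeds.
    key = f"{info.workflow_run_id}:{info.activity_id}"
    # Hypothetical downstream client; the point is that the SAME key
    # reaches the provider on every retry.
    return await payment_client.charge(amount_cents, customer_id,
                                       idempotency_key=key)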

That is exactly the right mental model. A workflow engine can preserve execution history and coordinate retries. It cannot retroactively uncharge a card or unsend an email unless your application gives it idempotent steps and idempotent compensations.

The same applies to sagas. Temporal’s own saga guidance describes compensating actions that run when a step fails. Those compensations must be idempotent too. If “refund payment” runs twice, you may have solved the original bug by creating a new one.

My rule here is brutal and simple. Every Activity, every command handler, and every compensation that touches the outside world should either be naturally idempotent or carry a real idempotency key to the downstream system.

How to test idempotency before production

Most teams test happy paths and then act surprised when retries happen. That is not enough.

You should have automated tests for at least these cases:

  • the server commits the mutation but the response never reaches the client
  • two identical requests race with the same idempotency key
  • the same key is reused with a different payload
  • a consumer commits its database work and crashes before ack
  • a webhook is replayed with the same delivery ID
  • an outbox dispatcher publishes the same event more than once
  • a workflow Activity completes the external call and crashes before completion is reported
  • an idempotency record expires and a genuine late retry arrives

AWS explicitly recommends comprehensive test suites that include successful requests, failed requests, and duplicate requests. That advice is pedestrian and absolutely correct.
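
The race case is the one teams skip most often. A sketch of that test, reusing the handle and request_hash sketches from earlier; conn_factory is a placeholder that yields one database connection per thread:

import threading

def test_same_key_race(conn_factory):
    effects = []
    payload = {"amount": 5000}

    def attempt():
        handle(conn_factory(), "tenant-a", "create_payment", "pay-123",
               payload, request_hash(payload),
               mutate=lambda cur, p: (effects.append(p) or (201, {"ok": True})))

    threads = [threading.Thread(target=attempt) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    assert len(effects) == 1  # exactly one business side effect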

I would add one more failure drill. Verify that the replayed response is semantically equivalent to the first result. AWS discusses late-arriving retries and argues for responses that preserve the original meaning even after underlying state has changed. That is the difference between “no extra side effect happened” and “the caller still has a consistent contract.”

Opinionated rules that save real systems

Here are the rules I would enforce in an architecture review.

First, idempotency keys belong to business intent, not transport attempts.

Second, scope every key by tenant and operation. Global key spaces are how unrelated requests collide.

Third, persist the dedup decision atomically with the mutation. If that is not true, the design is wrong.

Fourth, reject same-key different-payload retries. Stripe and AWS both do this for good reason.

Fifth, keep keys for the full replay horizon of the business process, not for the shortest queue window.

Sixth, pair producers with an outbox and consumers with message ID tracking. One side without the other is half a design.

Seventh, propagate the same operation identity downstream when the business action is the same. AWS explicitly recommends passing the idempotency token along the processing chain.

Eighth, never assume exactly-once marketing removes the need for idempotent side effects.

If that sounds strict, good. Idempotency is where optimistic architecture meets production reality. You do not need complexity everywhere. But wherever duplicate side effects would hurt money, state, or trust, idempotency should be a first-class part of the contract.
