
Your Payment System Works. That Doesn't Mean It Scales.

By Farnaz Bagheri · 19 min read

A payment integration passes every test. Authorization works. Voids go through. Refunds post correctly. The QA team signs off. The integration goes live at 200 locations. For three weeks, everything is fine.

Then Black Friday happens. Transaction volume triples. The processor's response times spike from 400ms to 2.8 seconds. Timeouts start firing. Retry logic kicks in. By noon, the support queue has 40 merchants reporting double charges. Some transactions settled at the processor but have no matching record in the POS. Two merchants show negative balances that shouldn't exist. The on-call engineer opens the logs and finds 11,000 duplicate authorization requests sent in a 90-minute window.

Nothing in the system was broken. Every component did exactly what it was designed to do. The problem was that the system was designed to work — not to work under pressure.

The "if it works, it scales" fallacy

Payment integrations are deceptive. At low volume, they behave like synchronous request-response systems. You send an authorization. You get a response. You record the result. The happy path is clean, and the sad path is mostly "declined." There aren't many states to manage, and timing doesn't matter because nothing overlaps.

Scale changes every one of those assumptions.

At volume, requests overlap. Responses arrive out of order. Network partitions happen mid-transaction. Processors throttle. Gateways queue. Timeouts trigger retries that arrive at the processor after the original request already succeeded. A batch close races against a late void. A webhook fires before the local database write commits.

These aren't theoretical scenarios. They're the normal operating conditions of a payment system handling more than a few hundred transactions per hour across distributed locations. The question isn't whether they'll happen. It's whether the system was designed to handle them when they do.

Failure mode 1: The duplicate charge

The most common failure at scale is the double charge. It's also the most damaging — because it's immediately visible to the customer, it erodes merchant trust, and it's operationally expensive to fix.

Here's how it happens. The POS sends an authorization request to the processor. The processor receives it, approves the transaction, and sends back a response. But the response is slow — the network is congested, or the processor's gateway is under load. The POS hits its timeout threshold and assumes the request failed. It retries.

The processor now receives a second authorization request for the same amount, same card, same merchant. Most processors treat this as a new transaction. They approve it. The customer is charged twice.

The POS receives the response to the retry, records it as the transaction of record, and moves on. The original approval — which the POS never received — is now an orphan. It will settle, it will be deposited into the merchant's account, and nobody in the POS system knows it exists until the customer calls their bank or the reconciliation numbers don't add up.


This is not a bug in the retry logic. It's the absence of idempotency — the system's inability to recognize that two requests represent the same intent.

Idempotency is not optional

Idempotency means that sending the same request twice produces the same result as sending it once. In payment systems, this is the single most important property for operating at scale, and it's the one most often missing from initial implementations.

The mechanism is straightforward. Before sending an authorization request, the POS generates a unique idempotency key — typically a UUID or a deterministic hash of the transaction's identifying attributes (terminal ID, sequence number, timestamp, amount). This key is sent with the request. If the processor receives a second request with the same key, it returns the result of the first request instead of processing a new transaction.

But the implementation details matter enormously.

The key must be generated before the first attempt, not per-attempt. If the POS generates a new key for each retry, the processor sees each retry as a distinct transaction. This sounds obvious, but it's a mistake that shows up in production systems with alarming frequency — usually because the idempotency key generation was added to the request-building function rather than the transaction-initiation function.

The key must survive process restarts. If the POS application crashes after sending the request but before receiving the response, and the key was only held in memory, the restarted process has no way to check whether the original request succeeded. It will create a new key and send a new request.

The processor must actually support and enforce idempotency. Not all do, or not for all operations. Some processors support idempotency for authorizations but not for voids. Some have a short deduplication window — 60 seconds, five minutes — after which the same key is treated as a new request. The POS must know these constraints and design around them.
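The first two constraints can be made concrete in a short sketch: the key is generated once per transaction and persisted before any network attempt, so retries and post-crash recovery reuse it. The `payment_requests` schema is an assumption for this example, and the in-memory SQLite database stands in for the durable on-disk store a real POS would need.

```python
import sqlite3
import uuid

# Illustrative store. A real POS needs an on-disk database so the key
# survives process restarts, not an in-memory one.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE IF NOT EXISTS payment_requests (
    txn_id TEXT PRIMARY KEY,
    idempotency_key TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
)""")

def get_or_create_key(txn_id: str) -> str:
    """Return the persisted key for this transaction, creating it at most once."""
    row = db.execute(
        "SELECT idempotency_key FROM payment_requests WHERE txn_id = ?",
        (txn_id,),
    ).fetchone()
    if row:
        # A key already exists: this is a retry or post-crash recovery,
        # so the processor can deduplicate against the original request.
        return row[0]
    key = str(uuid.uuid4())
    db.execute(
        "INSERT INTO payment_requests (txn_id, idempotency_key) VALUES (?, ?)",
        (txn_id, key),
    )
    db.commit()  # the key is durable BEFORE any network I/O happens
    return key
```

Calling this at transaction initiation, not inside the request builder, is what keeps every retry carrying the same key.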


Failure mode 2: The lost transaction

The inverse of the double charge is the transaction that the processor approved but the POS never recorded. This happens when the POS sends a request, the processor approves it, but the response is lost — or received after the POS has already moved on.

At low volume, this almost never happens. Timeouts are rare. Networks are reliable. The window between "sent" and "received" is small enough that interruptions are unlikely.

At scale, the window widens. Processor response times increase. The expected number of timeouts, network blips, or process crashes landing inside that window grows linearly with volume. If the POS processes 50 transactions per hour, a 0.1% failure rate means one lost transaction roughly every 20 hours of operation. At 5,000 transactions per hour, it's one every 12 minutes.

The lost transaction is insidious because the POS has no record of it. The customer was charged. The merchant will receive the deposit. But the POS shows no completed transaction for that order. The server doesn't know the table paid. The cashier asks the customer to pay again.

The defense against lost transactions is the same as the defense against duplicates: idempotency keys, persisted before the request is sent, that allow the POS to query the processor for the outcome of a specific request even if the response was never received.

But it also requires a recovery process — a background reconciliation loop that periodically checks for transactions in a "sent but not confirmed" state and queries the processor for their status. This loop must run continuously, not just at end-of-day. A merchant who discovers a missing transaction at batch close has already been operating with incorrect data for hours.
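One pass of that recovery loop might look like the sketch below. The `processor.lookup(idempotency_key)` status query and the `store` interface are hypothetical stand-ins for whatever your processor and database actually expose; the 30-second staleness threshold is likewise an illustrative assumption.

```python
import time

STALE_AFTER_S = 30  # how long a request may sit unconfirmed (illustrative)

def reconcile_once(store, processor, now=None):
    """Resolve every request stuck in a 'sent but not confirmed' state."""
    now = now if now is not None else time.time()
    resolved = []
    for req in store.pending_requests():
        if now - req["sent_at"] < STALE_AFTER_S:
            continue  # still inside the normal response window
        outcome = processor.lookup(req["idempotency_key"])
        if outcome == "not_found":
            # The processor never saw the request: safe to fail and resubmit.
            store.mark(req["txn_id"], "failed")
        else:
            # Money may have moved; adopt the processor's answer as truth.
            store.mark(req["txn_id"], outcome)
        resolved.append((req["txn_id"], outcome))
    return resolved
```

Running this on a short interval, rather than at batch close, is what keeps the window between divergence and detection measured in seconds instead of hours.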

The Uncertainty Window

Both the double charge and the lost transaction share a single root cause. I call it the Uncertainty Window — the period between when a payment request leaves the POS and when the POS has a confirmed, durable record of the outcome. During this window, money may have moved, but the system that initiated the movement doesn't know.

In most distributed systems, this kind of ambiguity is manageable. A failed write to a cache can be retried. A lost message can be replayed from a queue. The operations are either naturally idempotent or the consequences of a duplicate are trivial — an extra log entry, a redundant notification, a cache that gets refreshed one time too many.

In payments, the Uncertainty Window is where every serious failure originates. A duplicate request during the window means a double charge. A crash during the window means a lost transaction. A timeout during the window means the system doesn't know which of those two things happened — and the only safe assumption is that the money moved.

Every defensive mechanism in a scalable payment system — idempotency keys, persisted request state, recovery loops, processor status queries — exists to shrink the Uncertainty Window or to make the system's behavior safe while inside it. The window can never be eliminated entirely. Network latency is physics. But the goal is to ensure that no transaction can be inside the window without the system knowing it's there, tracking how long it's been there, and having a plan for resolving it.

Failure mode 3: The race condition

POS systems are inherently concurrent. Multiple terminals operate simultaneously. Multiple staff members can interact with the same check. A server can close a tab at the same terminal where a manager is applying a void.

At low volume, these operations rarely overlap. At scale, they overlap constantly. And when they overlap on the same transaction, the results are unpredictable unless the system enforces strict ordering.

The classic race condition: a customer's card is authorized for $92.00. The server adds a tip, bringing the total to $108.50. Simultaneously, a manager voids the transaction because the customer complained. The tip adjustment and the void are submitted to the processor within milliseconds of each other. Depending on which arrives first, the processor might void the original $92.00 auth and then receive a tip adjustment for a transaction that no longer exists. Or it might process the adjustment to $108.50 and then void the adjusted amount. Or it might reject the second operation entirely because it conflicts with the first.

The POS, meanwhile, might record the void as successful and the adjustment as failed — or the reverse — depending on the order in which the responses arrive, not the order in which the requests were sent.


The fix is not faster processing. The fix is serialized access to transaction state. Operations on the same transaction must be queued and executed sequentially, with each operation validating the current state before proceeding. A void should check whether a tip adjustment is in flight. A tip adjustment should check whether a void has been submitted. This requires a transaction-level lock or, at minimum, an optimistic concurrency check with version numbers.

This is not complex in concept. But it requires the system to treat each transaction as a stateful entity with controlled transitions, not as a row in a table that any process can update at any time.
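A minimal sketch of the optimistic variant, using version numbers. The in-memory `Transaction` class is illustrative; a production system would perform the same compare-and-set in the database (`UPDATE ... WHERE version = ?`) rather than behind a process-local lock.

```python
import threading

class ConflictError(Exception):
    """Raised when another operation changed the transaction first."""

class Transaction:
    def __init__(self, txn_id, state="authorized"):
        self.txn_id = txn_id
        self.state = state
        self.version = 0
        self._lock = threading.Lock()

    def transition(self, expected_version, new_state):
        """Apply a state change only if nobody else changed the record."""
        with self._lock:  # makes the check-and-update atomic
            if self.version != expected_version:
                raise ConflictError(
                    f"{self.txn_id}: expected v{expected_version}, "
                    f"found v{self.version} ({self.state})"
                )
            self.state = new_state
            self.version += 1
            return self.version
```

In the tip-adjust-versus-void race, both operations read version 0; whichever commits first wins, and the loser gets a `ConflictError` and must re-read the current state instead of blindly overwriting it.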

Retry strategies that don't make things worse

Every payment system has retry logic. Few have retry logic that's actually safe.

The naive approach — if the request times out, try again immediately — is catastrophic at scale. If the processor is slow because it's under load, retrying immediately adds to that load. If 500 terminals all hit a timeout and all retry simultaneously, the processor receives 1,000 requests instead of 500. Response times get worse. More timeouts fire. More retries trigger. The system enters a death spiral where the retry mechanism amplifies the very problem it was designed to recover from.

Safe retry strategies for payment systems have specific characteristics.

Exponential backoff with jitter. Each retry waits longer than the previous one, with a random offset to prevent thundering herd. The first retry after 1 second, the second after 2–4 seconds, the third after 4–8 seconds. The jitter prevents all terminals from retrying at exactly the same moment.

A maximum retry count that is lower than intuition suggests. For payment authorizations, two retries is usually the right limit. Three at most. Beyond that, the probability that the processor is experiencing a systemic issue is high enough that continued retries will cause more harm than good.

Circuit breaking at the terminal or location level. If a terminal has seen three consecutive timeouts, it should stop sending requests for a cooldown period rather than continuing to retry. If an entire location is experiencing failures, the system should degrade — queue transactions for later submission, fall back to offline mode, or alert the operator — rather than hammering a processor that isn't responding.

Retry budgets. Rather than controlling retries per-request, the system tracks the total retry rate across all requests. If retries exceed a percentage of total traffic — say, 10% — additional retries are suppressed system-wide. This prevents localized failures from generating enough retry traffic to cause cascading failures.
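A retry budget can be sketched with two sliding windows, one counting first attempts and one counting retries. The 10% ratio and 60-second window below are the illustrative figures from above, not universal constants.

```python
from collections import deque
import time

class RetryBudget:
    def __init__(self, ratio=0.10, window_s=60.0):
        self.ratio = ratio
        self.window_s = window_s
        self.requests = deque()  # timestamps of first attempts
        self.retries = deque()   # timestamps of granted retries

    def _prune(self, now):
        for q in (self.requests, self.retries):
            while q and now - q[0] > self.window_s:
                q.popleft()

    def record_request(self, now=None):
        self.requests.append(now if now is not None else time.time())

    def allow_retry(self, now=None):
        now = now if now is not None else time.time()
        self._prune(now)
        if len(self.retries) < self.ratio * len(self.requests):
            self.retries.append(now)
            return True
        return False  # budget exhausted: suppress this retry system-wide
```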


The consistency problem you can't avoid

At scale, a POS system is a distributed system whether it was designed as one or not. Multiple terminals write to a shared database. Transactions are in-flight across multiple processor connections simultaneously. State changes arrive from external systems — webhooks, settlement files, chargeback notifications — at unpredictable times.

The fundamental tension is between consistency and availability. A payment system that locks the entire transaction table during every operation will be perfectly consistent but unusably slow at volume. A system that allows concurrent writes without coordination will be fast but will produce inconsistent states.

Most POS systems deal with this by not dealing with it. They use a shared relational database, rely on row-level locks where they remember to add them, and hope that concurrent operations on the same transaction are rare enough that conflicts don't cause visible problems. At low volume, they're right. At scale, they're not.

This is where payment systems diverge sharply from general distributed systems. In a typical web application, an inconsistent read is a stale product listing or a like count that's off by one. The user refreshes and sees the correct state. The system converges, and nobody notices the gap. In a payment system, an inconsistent state is a merchant who thinks a $400 transaction was voided when the processor actually settled it — or a customer whose card was charged for an order the POS shows as unpaid. There is no "refresh." The inconsistency has already moved money, and unwinding it requires manual intervention across multiple systems. The tolerance for inconsistency in payments isn't low. It's zero for any state that involves a financial commitment.

The architectural decision that matters most is defining the consistency boundary. Not every operation needs global consistency. A new authorization at Terminal 3 doesn't need to coordinate with a void at Terminal 7 — they're operating on different transactions. But a tip adjustment and a void on the same transaction absolutely need to coordinate.

The pattern that works is transaction-level consistency with system-level eventual consistency. Each transaction is a consistency boundary. Operations within that boundary are serialized and strongly consistent. But the system as a whole is eventually consistent — the POS's view of a transaction's state may lag behind the processor's view by seconds or minutes, and that's acceptable as long as the system has a mechanism to detect and resolve the divergence.

This is where the reconciliation loop from earlier becomes load-bearing. It's not just recovering lost transactions. It's the mechanism by which the system converges — detecting cases where the POS's state and the processor's state have diverged, and either resolving them automatically or flagging them for human review.

Failure isolation: not all failures are equal

A processor timeout is not a network failure is not a database error is not a business logic bug. At scale, lumping all failures into the same error path is a recipe for cascading outages.

Consider a POS system connected to three payment processors. Processor A starts returning 503 errors. If the system treats this the same as a declined transaction — retrying and eventually failing — every terminal that routes through Processor A will experience degraded performance. If the retry logic is aggressive enough, the increased latency on Processor A transactions will consume connection pool resources, slowing down requests to Processors B and C as well.

Failure isolation means building walls between failure domains. Processor connections should have independent connection pools, independent timeout configurations, and independent circuit breakers. A failure in one processor's integration should not be able to degrade the system's interaction with any other processor.

This extends to the database layer. If the transaction write to the local database fails, the system needs to handle that differently than a processor timeout. A database failure means the POS can't reliably track state and should stop accepting new transactions. A processor timeout means the POS should queue the operation and continue serving other transactions through other processors or tender types.
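The wall between processor failure domains can be sketched as one independent circuit breaker per processor, so a trip on one never blocks traffic to another. The three-failure threshold and 30-second cooldown are illustrative.

```python
import time

class CircuitBreaker:
    def __init__(self, threshold=3, cooldown_s=30.0):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self, now=None):
        now = now if now is not None else time.time()
        if self.opened_at is None:
            return True
        if now - self.opened_at >= self.cooldown_s:
            self.opened_at = None  # half-open: let one probe through
            self.failures = 0
            return True
        return False

    def record(self, success, now=None):
        now = now if now is not None else time.time()
        if success:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now  # trip: stop sending for the cooldown

# One breaker per processor; state is never shared across them.
breakers = {"processor_a": CircuitBreaker(), "processor_b": CircuitBreaker()}
```

The same isolation applies to connection pools and timeout settings: each processor integration owns its own, so resource exhaustion in one cannot starve the others.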


The state machine you didn't know you needed

The root cause of most scaling failures in payment systems is that the transaction doesn't have a well-defined state machine. Instead, it has a status column that gets overwritten by whatever operation touched it last.

At low volume, this works because operations don't overlap and state transitions are sequential. At scale, it breaks because multiple processes attempt to transition the same transaction simultaneously, and the last writer wins — regardless of whether their transition was valid given the current state.

A well-defined transaction state machine enforces three properties. First, each state has a defined set of valid transitions. An authorized transaction can be adjusted, voided, or settled — but not refunded, because refunds only apply to settled transactions. Second, each transition is atomic — it checks the current state, validates the transition, and updates the state in a single operation. Third, transitions are recorded, not just the final state. The transaction maintains a history of every state change, including who initiated it and when.

This state machine is the backbone of a scalable payment system. It prevents race conditions by rejecting invalid transitions. It enables recovery by allowing any process to determine the exact state of any transaction at any point in time. And it makes reconciliation tractable by providing a complete audit trail that can be compared against the processor's records.
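The three properties fit in a small sketch: an explicit transition table, an atomic check-validate-update, and a recorded history. The transition table below is illustrative and deliberately simplified; a real processor's state model has more states and more edges.

```python
import threading
import time

# Illustrative transition table, per the rules above: an authorized
# transaction can be adjusted, voided, or settled; only settled
# transactions can be refunded.
VALID = {
    "authorized": {"adjusted", "voided", "settled"},
    "adjusted":   {"voided", "settled"},
    "settled":    {"refunded"},
    "voided":     set(),
    "refunded":   set(),
}

class InvalidTransition(Exception):
    pass

class PaymentTransaction:
    def __init__(self, txn_id):
        self.txn_id = txn_id
        self.state = "authorized"
        self.history = [("authorized", "system", time.time())]
        self._lock = threading.Lock()

    def transition(self, new_state, actor):
        with self._lock:  # check + update happen atomically
            if new_state not in VALID[self.state]:
                raise InvalidTransition(
                    f"{self.txn_id}: {self.state} -> {new_state}")
            self.state = new_state
            # Record who moved the transaction and when, not just where it is.
            self.history.append((new_state, actor, time.time()))
```

The `history` list is what makes recovery and reconciliation tractable: any process can replay exactly how a transaction reached its current state.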

What separates scalable payment systems from fragile ones

The difference is not technology. It's not faster hardware, better databases, or more sophisticated frameworks. The difference is whether the system was designed around the assumption that things will go wrong — that requests will be duplicated, responses will be lost, operations will overlap, processors will slow down, and every component will eventually behave in a way that wasn't anticipated.

Fragile payment systems are designed around the happy path. They assume requests succeed on the first try. They assume responses arrive promptly. They assume operations on the same transaction don't overlap. They assume the processor is always available. When any of these assumptions break — and at scale, all of them break — the system produces double charges, lost transactions, inconsistent states, and cascading failures.

Scalable payment systems are designed around uncertainty. Every request carries an idempotency key. Every state transition is validated and recorded. Every retry is bounded and backoff-aware. Every processor connection is isolated. Every divergence between local state and processor state is detected and resolved through continuous reconciliation. The happy path still works — it just isn't the only path the system knows how to walk.

This isn't about over-engineering. It's about recognizing that a payment system operating at scale is a distributed system with real money flowing through it, and distributed systems fail in ways that can't be tested on a single terminal in a QA lab. The failure modes are specific and predictable. The defenses are well understood. The only question is whether they're built into the architecture from the start, or bolted on after the first Black Friday teaches the team what "at scale" actually means.

The payment systems that don't break at scale are the ones that were built expecting to.

Here's the insight that takes most teams years to reach: the true product of a scalable payment system is not the transaction — it's the audit trail. Processing a payment is table stakes. Any integration can move money from a card to a merchant account. What separates production-grade systems from prototypes is the ability to answer, for any transaction, at any point in time, exactly what happened, in what order, and why. When a merchant calls about a missing deposit, when a customer disputes a charge, when settlement doesn't match authorization — the system that can reconstruct the full story is the one that survives. The one that can't will spend every scaling milestone relearning the same lesson: moving money is easy, but knowing where it went is the actual engineering problem.


This is part of an ongoing series on payment infrastructure and POS system design. Previous articles in this series cover reconciliation, processor fragmentation, and offline payment handling.