The previous articles in this series have traced two structural problems in POS payment infrastructure: the cost of processor fragmentation, and the difficulty of building reliable offline capability. Both problems have the same architectural solution — a payment orchestration layer that separates the concerns of routing and policy from the concerns of processor-specific integration.
This article describes what that layer looks like, how to design it, and where the difficult design decisions are.
What orchestration is and isn't
Payment orchestration is a term that gets used loosely. In the market, it often describes services that provide access to multiple processors through a single API. That's a useful product, but it's not the same as building orchestration into your own platform.
For a POS SaaS vendor, an orchestration layer is a component in your own system that sits between the business logic of your application and the individual processor integrations. It owns:
- The decision of which processor to use for a given transaction
- The retry and failover behavior when a processor is unavailable or declines
- The offline queuing and sync behavior described in the previous article
- The normalization of processor responses into a unified format
- The state management required for reconciliation
It does not own:
- Processor-specific protocol implementation (that belongs in adapters)
- Business logic about what constitutes a valid transaction (that belongs in the application layer)
- Merchant configuration (that belongs in a configuration service)
Getting these boundaries right is the core design challenge.
[Figure: Orchestration layer system boundaries]
The adapter pattern
Each processor integration should be implemented as an adapter that conforms to a common interface. The interface defines the operations the orchestration layer needs: authorize, capture, void, refund, and status check. Each adapter implements these operations for a specific processor, translating between the orchestration layer's internal representation and the processor's API.
The interface design deserves careful thought. It needs to be expressive enough to capture the semantically important differences between processors, while abstract enough to prevent processor-specific details from leaking into the orchestration layer.
A minimal interface for an authorization operation might look like:
```typescript
interface AuthorizeRequest {
  amount: Money
  paymentMethod: PaymentMethodRef
  merchantId: string
  idempotencyKey: string
  metadata?: Record<string, string>
}

interface AuthorizeResult {
  status: 'approved' | 'declined' | 'error'
  authorizationCode?: string
  declineReason?: DeclineReason
  processorTransactionId: string
  processorResponseCode: string
  riskScore?: number
}

interface ProcessorAdapter {
  authorize(request: AuthorizeRequest): Promise<AuthorizeResult>
  capture(authorizationId: string, amount: Money): Promise<CaptureResult>
  void(authorizationId: string): Promise<VoidResult>
  refund(captureId: string, amount: Money): Promise<RefundResult>
  status(transactionId: string): Promise<TransactionStatus>
}
```
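To make the translation work concrete, here is a sketch of an adapter for a hypothetical processor ("Acme" and its wire format are invented for illustration; the stand-in request and result types are trimmed versions of the interfaces above):

```typescript
// Minimal stand-ins for the article's types, trimmed for the sketch.
interface AuthorizeRequest { amountCents: number; idempotencyKey: string }
interface AuthorizeResult {
  status: 'approved' | 'declined' | 'error';
  authorizationCode?: string;
  processorTransactionId: string;
  processorResponseCode: string;
}

// "Acme" is a hypothetical processor; its response shape is invented
// purely to show the translation an adapter performs.
interface AcmeResponse { result: 'OK' | 'DECLINE' | 'FAIL'; auth?: string; id: string; code: string }

class AcmeAdapter {
  constructor(private transport: (body: object) => Promise<AcmeResponse>) {}

  async authorize(req: AuthorizeRequest): Promise<AuthorizeResult> {
    // Translate the internal representation into the processor's wire format.
    const resp = await this.transport({
      amount: req.amountCents,
      request_id: req.idempotencyKey, // Acme's hypothetical dedup field
    });
    // Translate the processor response back into the normalized result.
    const status =
      resp.result === 'OK' ? 'approved' :
      resp.result === 'DECLINE' ? 'declined' : 'error';
    return {
      status,
      authorizationCode: resp.auth,
      processorTransactionId: resp.id,
      processorResponseCode: resp.code,
    };
  }
}
```

All processor-specific field names stay inside the adapter; the orchestration layer only ever sees the normalized types.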
The DeclineReason type is where the abstraction becomes genuinely difficult. Processors return dozens of decline codes, and mapping them correctly to a unified set of reasons requires understanding the semantics of each code for each processor. A naive mapping loses information; an overly granular one recreates the processor-specific complexity inside the abstraction.
A workable approach is a two-level decline reason structure: a coarse category (issuer declined, insufficient funds, fraud, card problem, network error) plus the raw processor response code. Business logic uses the category; logging and debugging use the raw code.
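One way to express this two-level structure (the category names are the ones listed above; the mapping table below uses a handful of common ISO 8583-style response codes, but a real adapter would own a complete, per-processor table):

```typescript
// Coarse categories used by business logic; the raw code is kept for
// logging and debugging.
type DeclineCategory =
  | 'issuer_declined'
  | 'insufficient_funds'
  | 'fraud'
  | 'card_problem'
  | 'network_error';

interface DeclineReason {
  category: DeclineCategory; // what business logic branches on
  rawCode: string;           // exact processor response code, for logs
  processor: string;         // which processor produced the code
}

// Each adapter owns its own mapping table. These entries are illustrative.
const exampleMapping: Record<string, DeclineCategory> = {
  '51': 'insufficient_funds', // insufficient funds
  '05': 'issuer_declined',    // do not honor
  '59': 'fraud',              // suspected fraud
  '54': 'card_problem',       // expired card
};

function mapDeclineCode(processor: string, rawCode: string): DeclineReason {
  // Unknown codes fall back to the most conservative generic category.
  const category = exampleMapping[rawCode] ?? 'issuer_declined';
  return { category, rawCode, processor };
}
```

Because the raw code travels alongside the category, refining the mapping later is a data change, not an interface change.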
Routing logic
The orchestration layer needs to decide, for each transaction, which processor to route it to. This decision can be simple (always use the configured primary processor) or complex (route based on transaction amount, card type, merchant category, processor uptime, or cost optimization).
Even if you start with simple routing, design the routing component to be replaceable. Routing requirements grow over time. A POS platform that supports a single processor today will likely support multiple processors tomorrow, and the routing logic will need to evolve.
A routing policy object separates the routing decision from its implementation:
```typescript
interface RoutingPolicy {
  selectProcessor(
    request: AuthorizeRequest,
    availableProcessors: ProcessorAdapter[],
    processorHealth: Map<string, ProcessorHealth>
  ): ProcessorAdapter
}
```
The ProcessorHealth type captures real-time information about each processor's availability and performance — error rates, latency percentiles, recent outage events. The routing policy uses this information to avoid sending traffic to degraded processors.
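A sketch of one policy implementing this idea, with simplified stand-in types; the threshold values and the `ProcessorHealth` field names are assumptions, not recommendations:

```typescript
// Simplified stand-ins for the types defined above.
interface ProcessorHealth { errorRate: number; p99LatencyMs: number }
interface ProcessorAdapter { name: string }
interface AuthorizeRequest { amountCents: number }

// Failover policy: prefer processors in configured order, but skip any
// whose recent error rate or latency exceeds a threshold.
class HealthAwareFailoverPolicy {
  constructor(
    private preferredOrder: string[],
    private maxErrorRate = 0.05,
    private maxP99Ms = 3000,
  ) {}

  selectProcessor(
    _request: AuthorizeRequest,
    available: ProcessorAdapter[],
    health: Map<string, ProcessorHealth>,
  ): ProcessorAdapter {
    const byName = new Map(available.map((a) => [a.name, a]));
    for (const name of this.preferredOrder) {
      const adapter = byName.get(name);
      if (!adapter) continue;
      const h = health.get(name);
      // Treat missing health data as healthy rather than unroutable.
      if (h && (h.errorRate > this.maxErrorRate || h.p99LatencyMs > this.maxP99Ms)) {
        continue;
      }
      return adapter;
    }
    // Every processor looks degraded: fall back to the primary rather than
    // refuse to route at all.
    return byName.get(this.preferredOrder[0]) ?? available[0];
  }
}
```

Swapping this class for an amount-based or cost-optimizing policy later touches nothing outside the policy object itself.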
[Figure: Routing and failover decision flow]
Transaction state machine
Every payment transaction passes through a sequence of states. Understanding and explicitly modeling these states is essential for building an orchestration layer that handles failures correctly.
Each state transition needs to be persisted atomically. The classic problem in distributed payment systems is the "did it go through?" question — when a network failure occurs during a transaction, the system may not know whether the processor received and processed the request. Idempotency keys and status check operations address this, but only if the state machine is designed to use them correctly.
The orchestration layer should maintain its own transaction record, separate from the processor's. This record is the source of truth for reconciliation and should be written before the processor call is attempted, then updated atomically when a response is received.
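The write-before-call discipline can be sketched as follows (the state names, `TxStore` interface, and function names are illustrative assumptions):

```typescript
// Illustrative transaction states and persistence interface.
type TxState = 'created' | 'authorizing' | 'approved' | 'declined' | 'unknown';

interface TxRecord { id: string; idempotencyKey: string; state: TxState }

interface TxStore {
  insert(record: TxRecord): Promise<void>;                       // atomic write
  transition(id: string, from: TxState, to: TxState): Promise<void>;
}

async function authorizeWithRecord(
  store: TxStore,
  authorize: (idempotencyKey: string) => Promise<'approved' | 'declined'>,
  tx: { id: string; idempotencyKey: string },
): Promise<TxState> {
  // 1. Persist the record BEFORE calling the processor, so a crash
  //    mid-call leaves evidence that an outcome must be resolved.
  await store.insert({ ...tx, state: 'created' });
  await store.transition(tx.id, 'created', 'authorizing');
  try {
    const outcome = await authorize(tx.idempotencyKey);
    // 2. Update atomically once a definitive response arrives.
    await store.transition(tx.id, 'authorizing', outcome);
    return outcome;
  } catch {
    // Ambiguous failure: we don't know whether the processor saw the
    // request. Mark it unknown; a status check or idempotent retry
    // resolves it later.
    await store.transition(tx.id, 'authorizing', 'unknown');
    return 'unknown';
  }
}
```

The `unknown` state is the key: it makes the "did it go through?" question an explicit, queryable condition rather than a silent gap.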
[Figure: Full transaction lifecycle state machine]
Reconciliation architecture
Reconciliation is the process of verifying that the transactions in your system match the transactions the processor settled. It's unglamorous work, but incorrect reconciliation means merchants receive incorrect payouts, which is a critical failure.
The reconciliation process has several components:
Settlement ingestion. Each processor delivers settlement data in a different format — CSV files via SFTP, JSON webhooks, or API endpoints. The orchestration layer needs an ingestion pipeline for each processor that parses settlement data into a normalized internal format.
Matching. Settlement records need to be matched to orchestration layer transaction records. The primary matching key is usually the processor transaction ID, but this fails for post-authorization adjustments (tip adjustments, partial captures) where the settlement record may not have a direct counterpart in the authorization data.
Discrepancy detection. Matched records need to be checked for amount discrepancies, missing settlements, and unexpected settlements (transactions in the settlement batch with no corresponding authorization in your system). Each category of discrepancy requires a different resolution path.
Reporting. Merchants need a reconciliation report that explains what settled, when, and for how much. The report should surface discrepancies clearly so that merchants can take action.
[Figure: Reconciliation pipeline]
The complexity of reconciliation scales with the number of processors and the diversity of transaction types. Start with a simple matching algorithm and robust discrepancy logging, and build towards more automated resolution over time. Trying to automate resolution for all discrepancy types from the start is a path to subtle bugs that are hard to discover.
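A first pass along those lines, matching on processor transaction ID and logging every mismatch (record shapes and field names are illustrative):

```typescript
// Normalized settlement record and local transaction record.
interface SettlementRecord { processorTransactionId: string; amountCents: number }
interface LocalTx { processorTransactionId: string; amountCents: number }

type Discrepancy =
  | { kind: 'amount_mismatch'; id: string; local: number; settled: number }
  | { kind: 'missing_settlement'; id: string }
  | { kind: 'unexpected_settlement'; id: string };

// Simple first pass: match on processor transaction ID, record everything
// that doesn't line up. Resolution is deliberately left to a human.
function reconcile(local: LocalTx[], settled: SettlementRecord[]): Discrepancy[] {
  const settledById = new Map(settled.map((s) => [s.processorTransactionId, s]));
  const discrepancies: Discrepancy[] = [];
  const seen = new Set<string>();

  for (const tx of local) {
    const s = settledById.get(tx.processorTransactionId);
    if (!s) {
      discrepancies.push({ kind: 'missing_settlement', id: tx.processorTransactionId });
      continue;
    }
    seen.add(tx.processorTransactionId);
    if (s.amountCents !== tx.amountCents) {
      discrepancies.push({
        kind: 'amount_mismatch',
        id: tx.processorTransactionId,
        local: tx.amountCents,
        settled: s.amountCents,
      });
    }
  }
  // Settlements with no corresponding local transaction.
  for (const s of settled) {
    if (!seen.has(s.processorTransactionId)) {
      discrepancies.push({ kind: 'unexpected_settlement', id: s.processorTransactionId });
    }
  }
  return discrepancies;
}
```

The discriminated-union `Discrepancy` type keeps each category on its own resolution path, which is exactly where automated handling can be added incrementally later.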
Operational observability
A payment orchestration layer handles money. When something goes wrong, it needs to be visible immediately.
The minimum observability requirements are:
Transaction-level logging. Every state transition for every transaction should be logged with a timestamp, the previous state, the new state, and enough context to understand why the transition occurred. This log is essential for debugging payment issues in production.
Processor health metrics. Authorization success rates, latency distributions, and error rates per processor should be tracked and exposed for alerting. A sudden drop in the authorization success rate for a specific processor is a signal that something is wrong — either with the processor or with a recent code change.
Queue depth monitoring. For offline queuing, the depth of the pending queue should be monitored. An unexpectedly large queue indicates either an extended connectivity issue or a sync failure that needs investigation.
Reconciliation metrics. Discrepancy rates and unmatched settlement counts should be tracked over time. A rising unmatched rate often indicates a change in how a processor is reporting transactions — an API update that changed field names or identifiers.
The design decisions that matter most
After working through these components, a few design decisions emerge as particularly consequential:
The idempotency model. How you handle idempotency at the orchestration level — what keys you use, how long they're valid, what happens when a duplicate is detected — determines whether your retry logic is safe. Get this wrong and you'll either create duplicate charges or fail to retry transactions that should be retried.
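The shape of that decision can be sketched with an in-memory guard (a real implementation needs durable storage and key TTLs; the class and method names here are hypothetical):

```typescript
// Illustrative guard: replay a stored result for a duplicate key, and
// reject concurrent duplicates while the original is still in flight.
// Persistence and TTL expiry are omitted for brevity.
class IdempotencyGuard<T> {
  private results = new Map<string, T>();
  private inFlight = new Set<string>();

  async run(key: string, op: () => Promise<T>): Promise<T> {
    const cached = this.results.get(key);
    if (cached !== undefined) return cached; // duplicate: replay stored result
    if (this.inFlight.has(key)) {
      throw new Error('duplicate request while original is in flight');
    }
    this.inFlight.add(key);
    try {
      const result = await op();
      this.results.set(key, result); // only completed results are memoized here
      return result;
    } finally {
      this.inFlight.delete(key);
    }
  }
}
```

The consequential choices live at the edges of this sketch: how long results stay replayable, and whether an ambiguous (crashed mid-flight) key is retried or escalated.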
The state persistence model. Whether transaction state is stored in the orchestration layer's database or derived from processor APIs determines how resilient your system is to processor unavailability. Storing state locally is more complex but more resilient.
The failure semantics. When a processor call fails with an ambiguous error — a timeout, a network error, an unexpected response — should the orchestration layer treat this as a decline or as a retriable error? The answer depends on the operation: an authorization timeout might be safe to retry; a capture timeout might not be.
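That per-operation reasoning can be made explicit in a small decision function; the rules encoded below illustrate the trade-off described above and are assumptions, not a universal table:

```typescript
type Operation = 'authorize' | 'capture' | 'void' | 'refund';
type FailureKind = 'timeout' | 'network_error' | 'unexpected_response';

function isRetriable(
  op: Operation,
  failure: FailureKind,
  hasProcessorIdempotency: boolean,
): boolean {
  // An unexpected response body means the processor DID answer; inspect it
  // rather than blindly retrying.
  if (failure === 'unexpected_response') return false;
  // If the processor honors our idempotency key, a retry cannot create a
  // duplicate charge, so ambiguous transport failures become retriable.
  if (hasProcessorIdempotency) return true;
  switch (op) {
    case 'authorize':
    case 'void':
      return true;  // a duplicate authorization can be voided; voids are idempotent in effect
    case 'capture':
    case 'refund':
      return false; // a duplicate moves money: resolve via a status check instead
  }
}
```

Centralizing this logic keeps the "is this safe to retry?" judgment out of individual adapters, where it would inevitably drift.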
The adapter contract. How strictly does the orchestration layer enforce its adapter interface? If adapters are allowed to return processor-specific data in the response, that data will eventually be used by application code, and the abstraction will erode. Enforcing a strict interface prevents this but may require building richer normalized types.
These are the decisions that shape how the system behaves under pressure, when the happy path assumptions break down. They're worth thinking through carefully before writing the first line of implementation code.
This concludes the four-part series on payment infrastructure for POS systems. The patterns described here are reflected in the design of PayBridge, an architecture concept for a modular payment orchestration layer — you can find that under Projects.