
Migrating Payment Processors Without Losing a Single Transaction

By Farnaz Bagheri · 13 min read

A processor migration is rare, painful, and not widely discussed. It's also the largest piece of operational work a payment platform does, and it has more failure modes than the rest of the platform combined. I've been part of three of them, watched two more, and read post-mortems on a handful of public ones. Every time, the same thing: the team underestimates the complexity, overestimates the throughput, and discovers the long tail of edge cases only after committing.

This article is what I wish someone had handed me before the first migration.

Why migrations happen

Teams migrate processors for predictable reasons:

Cost. The current processor's fees are eating margin and a competitor offers better rates. This usually matters for medium-to-large merchants, where 30 basis points compounds into real money.

Reliability. The current processor has too much downtime, too many declines, or too many incidents. The data shows it, the merchants are complaining, and the BD team has been negotiating credits for six months.

Features. The new processor supports a payment method, geography, currency, or risk model the old one doesn't. Often this is a partial migration — keep the old processor for what it's good at, add the new one for what's missing.

Risk. The current processor is being acquired, deprecating an API your business depends on, or has had a public security incident. You don't want to be there when the music stops.

The migration's character changes based on which of these is the driver. Cost-driven migrations move all volume. Reliability-driven migrations may run dual until trust is established. Feature-driven migrations are partial by nature. Risk-driven migrations are usually fast and ugly.

What you're actually moving

The naive view is that you're moving payment volume from processor A to processor B. The actual list is longer:

  • Customer payment methods (saved cards, bank accounts)
  • Subscription / recurring payment configurations
  • Active authorizations that haven't yet captured
  • Refund eligibility windows on past transactions
  • Disputes in flight
  • Historical settlement reporting context
  • Webhooks pointing at the old processor's URLs
  • Merchant accounts and underwriting state
  • Fee structures negotiated per merchant
  • Risk profiles built up over time
  • API keys and credentials baked into client code

Each of these has its own migration plan. The naive migration only thinks about the first one and ships when that's done. The merchant calls support a week later because their recurring billing didn't move. Then again because the in-flight disputes got lost. Then again because their custom fee structure didn't carry over.

I've found it useful to write down each migrating component as its own pipeline, with its own readiness criteria. Some can move before the cutover. Some have to move during the cutover. Some only finish moving after the cutover. The dependencies between them matter.
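
One lightweight way to do this is an explicit inventory, with each component's phase and readiness criterion written down as data. A minimal sketch; the entries and criteria below are illustrative, not exhaustive:

```python
# Illustrative migration inventory: each component is its own pipeline
# with its own phase relative to the cutover and its own readiness check.
MIGRATION_COMPONENTS = {
    "saved_payment_methods": {"phase": "pre-cutover",  "ready_when": "token mapping verified"},
    "recurring_configs":     {"phase": "pre-cutover",  "ready_when": "dry-run billing matches old processor"},
    "in_flight_auths":       {"phase": "during",       "ready_when": "lifecycle-stage routing live"},
    "disputes_in_flight":    {"phase": "during",       "ready_when": "dispute handoff confirmed per case"},
    "legacy_refunds":        {"phase": "post-cutover", "ready_when": "refund window expired on old processor"},
}

def not_ready(components: dict) -> list[str]:
    """Components still blocking their phase (readiness tracked elsewhere)."""
    return [name for name, c in components.items() if "done" not in c]
```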

The dual-running pattern

The single most important pattern for a safe migration is dual running. New transactions can route to either processor based on a routing policy. The orchestration layer makes the decision per transaction. Both processors are active simultaneously.

Dual running achieves three things:

  1. Risk reduction. A small percentage of traffic goes to the new processor first. If something is wrong, the blast radius is small.
  2. Comparison. With both processors active, you can measure authorization rates, latency, decline reasons, and cost side by side using your real traffic. Sandbox numbers lie; production numbers don't.
  3. Rollback. If the new processor has a problem, you can shift traffic back to the old one immediately. Without dual running, rollback means a redeploy and a window of downtime.

The routing policy is the migration's central control. It needs to be configurable without redeploys, scoped per merchant or per transaction characteristic, and instrumented so you know exactly what percentage went where. Hard-coding the percentage in source is how you end up at three in the morning trying to roll back via emergency deploy.
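
To make that concrete, here's a minimal sketch of a routing policy as data. Everything here is illustrative (the config shape, the processor names, the route_transaction helper); the point is that the split lives in configuration, changes without a deploy, and is deterministic per transaction.

```python
import hashlib

# Hypothetical routing policy, loaded from a config store at runtime
# so it can change without a redeploy.
ROUTING_POLICY = {
    "default_percent_to_new": 5,        # % of traffic routed to processor B
    "merchant_overrides": {
        "merchant_123": "processor_a",  # pinned: not ready to migrate
        "merchant_456": "processor_b",  # pinned: fully migrated
    },
}

def route_transaction(merchant_id: str, transaction_id: str) -> str:
    """Pick a processor for one transaction, deterministically."""
    override = ROUTING_POLICY["merchant_overrides"].get(merchant_id)
    if override:
        return override
    # Hash the transaction id so the same transaction always routes the
    # same way and the realized split matches the configured percentage.
    bucket = int(hashlib.sha256(transaction_id.encode()).hexdigest(), 16) % 100
    if bucket < ROUTING_POLICY["default_percent_to_new"]:
        return "processor_b"
    return "processor_a"
```

Record every routing decision as it's made; that's what lets you say with confidence what percentage went where.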

Token portability

If your customers have saved cards in the old processor, you have a token portability problem. Tokens issued by Processor A do not work with Processor B. To migrate the customer's payment method, you have to do one of:

  • Re-tokenize at the network level (Visa Token Service, Mastercard MDES) so the same network token works across both processors. Requires both processors to support network tokens. Best option when available.
  • Bulk-export raw card data from Processor A and re-tokenize with Processor B. Requires a temporary expansion of PCI scope, the contractual right to export from A, and re-tokenization tooling. Painful but feasible.
  • Re-collect cards from customers. Send "please update your payment method" emails. Loses 10–30% of customers depending on how the prompt is framed. Unacceptable for most platforms.

The first option is the right one if you can get it. It requires planning, because both processors must be configured for network tokenization before the migration starts. If your existing tokens are processor-specific (no network tokenization), this is a major reason to fix that before you need to migrate, not during.

For the second option, the export-import process needs careful operational design. The export file is the most sensitive artifact your company will handle during the migration. It lives encrypted, is access-controlled to a small group, has a strict retention policy, and is destroyed after import is verified. Audits of who accessed it are mandatory.
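
For the bulk path, the central artifact besides the export file itself is the old-token-to-new-token mapping, which is what lets you rewrite stored payment methods after the import. A minimal sketch, with the processor-specific pieces stubbed out (decrypt_export_row and tokenize_with_b are hypothetical stand-ins, not real APIs):

```python
import csv
import json

def decrypt_export_row(row: dict) -> dict:
    """Hypothetical stand-in for PCI-scoped decryption of one export row."""
    raise NotImplementedError("depends on Processor A's export format")

def tokenize_with_b(card: dict) -> str:
    """Hypothetical stand-in for Processor B's tokenization API."""
    raise NotImplementedError("depends on Processor B's API")

def retokenize(export_path: str, mapping_path: str) -> None:
    """Build the old-token -> new-token mapping from the export file."""
    mapping = {}
    with open(export_path, newline="") as f:
        for row in csv.DictReader(f):
            card = decrypt_export_row(row)
            mapping[row["old_token"]] = tokenize_with_b(card)
    with open(mapping_path, "w") as f:
        json.dump(mapping, f)
    # Destroy the export file (with an audited deletion) only after the
    # mapping has been verified against the expected row count.
```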

In-flight transactions

A transaction that's been authorized on Processor A but not yet captured can't simply be captured on Processor B. The authorization is held by A. If you cut over mid-lifecycle and try to capture on B, the capture fails: the authorization doesn't exist there.

Three approaches:

Drain before cutover. Stop new authorizations on A several days before the cutover. Capture or void everything that's outstanding. Then cut over with no in-flight state. Works for platforms with short auth-to-capture windows (most card-not-present). Doesn't work for hospitality, lodging, or anywhere with multi-day holds.

Dual-route by lifecycle stage. Route new authorizations to B, but continue to send captures and refunds for existing authorizations to A. The orchestration layer routes each operation based on which processor handled the original authorization. Requires the orchestration layer to track this, which it should be doing anyway.

Maintain A indefinitely for legacy transactions. Keep A active for refunds against old transactions until the refund window expires (typically 6–18 months). New volume goes entirely to B. After the refund window, decommission A.

The second and third are usually combined. New authorizations go to B; legacy operations against A's transactions stay on A; eventually A is fully drained.
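
In code, the lifecycle-stage rule is small, but it depends on the orchestration layer remembering which processor authorized each transaction. A sketch under that assumption (all names illustrative):

```python
# Illustrative: the orchestration layer's record of which processor
# holds each original authorization.
AUTH_PROCESSOR: dict[str, str] = {}

def route_operation(op: str, transaction_id: str) -> str:
    """Route one payment operation during dual running."""
    if op == "authorize":
        processor = "processor_b"  # new auths follow the routing policy
        AUTH_PROCESSOR[transaction_id] = processor
        return processor
    # capture / void / refund must go to the processor that holds the
    # original authorization; it does not exist on the other one.
    if transaction_id not in AUTH_PROCESSOR:
        raise LookupError(f"no auth record for {transaction_id}; cannot route {op}")
    return AUTH_PROCESSOR[transaction_id]
```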

Reconciliation across processors

During dual running, every reconciliation pipeline has to handle both processors. Settlement files arrive from both, in different formats, on different schedules. Your internal records cover both. The reconciliation system has to match each settlement entry to its corresponding internal record, regardless of which processor handled it.

This is where pre-migration architecture investments pay off enormously. If the orchestration layer was designed with multiple processors in mind from the start, reconciliation is just another adapter. If the orchestration layer was processor-specific, reconciliation has to be rebuilt during the migration, which is the worst possible time.

The most common bug I've seen during migrations: reconciliation runs cleanly for processor A traffic and processor B traffic separately, but a transaction that started on A and was refunded on B (rare but happens) doesn't reconcile against either. The refund record is on B, the authorization on A, and the matching logic doesn't cross processors. These transactions sit in an unmatched state forever unless someone explicitly reconciles them manually.

Plan for this. Build a cross-processor reconciliation report that finds transactions whose records are split. Run it weekly during dual running. Have someone responsible for resolving each unmatched entry.
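
A sketch of that report, assuming internal records carry a transaction id and a processor for every lifecycle event (the event shape is hypothetical):

```python
from collections import defaultdict

def split_record_report(events: list[dict]) -> list[str]:
    """Find transactions whose lifecycle events span processors.

    Each event is assumed to look like
    {"transaction_id": ..., "processor": ..., "type": "auth" | "refund" | ...}.
    """
    by_txn = defaultdict(set)
    for e in events:
        by_txn[e["transaction_id"]].add(e["processor"])
    # Any transaction touched by more than one processor needs the
    # cross-processor matching path, not the per-processor one.
    return [txn for txn, procs in by_txn.items() if len(procs) > 1]
```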

The certification dance, again

Every new processor requires its own certification. You've been here before — the test cases, the artificial scenarios, the back-and-forth with the processor's certification team. Budget six to twelve weeks for it, with the engineering team's bandwidth half-consumed during the cycle.

Don't try to compress this. Certification timelines are mostly driven by the processor's queue, not your readiness. Submitting a cleaner first attempt doesn't change the calendar much; it just changes how many revisions you go through.

What you can do: certify before you migrate. Get the new processor through certification entirely on its own merits, in your test environment, before you start routing real traffic. The certification environment is also where you discover the long tail of behaviors you'll need to handle in production. Better to find them in cert than in canary.

Risk and fraud profile transfer

Processors build risk and fraud profiles over time. The old processor knows your merchant patterns, has tuned its rules, and is making decisions based on years of data. The new processor sees you as a stranger.

Authorization rates almost always drop when you migrate. The new processor is more conservative: with no history on you, it declines more marginal transactions. This is normal. Expect a 1–3 percentage point drop in authorization rate during the first month, recovering over three to six months as the processor builds your profile.

If you don't measure this carefully, you'll attribute the drop to "the migration is broken" when really it's "the migration worked, but the new processor needs time." If you communicate it to your merchants without context, they'll panic. The migration plan should include explicit communication about expected near-term changes in approval rate.

You can shorten this by sharing risk data with the new processor up front: known good customers, fraud-flagged ones, average ticket sizes, transaction patterns. Some processors accept this; some don't. Where they do, it's worth the effort.
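
To separate "the migration is broken" from "the processor is warming up", the dashboard needs authorization rate broken down by processor over the same window. A minimal sketch, assuming each auth attempt is logged as a dict with a processor and an approved flag:

```python
def auth_rate_by_processor(attempts: list[dict]) -> dict[str, float]:
    """Authorization rate per processor over a window of auth attempts.

    Each attempt is assumed to look like
    {"processor": "processor_a", "approved": True}.
    """
    totals: dict[str, tuple[int, int]] = {}
    for a in attempts:
        ok, n = totals.get(a["processor"], (0, 0))
        totals[a["processor"]] = (ok + int(a["approved"]), n + 1)
    return {p: ok / n for p, (ok, n) in totals.items()}
```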

The cutover itself

The actual cutover — the moment when traffic shifts from A to B as the primary processor — should be a non-event. By the time it happens, you've been dual running with most traffic on B for weeks. The cutover is just flipping the routing percentage from 95% to 100%.

Things that go wrong if you skip dual running and try a hard cutover:

  • Authorization rate comes in lower than expected and surfaces only after merchants complain.
  • Settlement reporting arrives in a format you didn't anticipate and breaks the reconciliation pipeline.
  • A processor-specific edge case fires for the first time in production, and no playbook exists.
  • Webhook timing differs, and downstream consumers fall behind.
  • Customer support tickets spike, and no one knows whether that's expected.

A staged cutover with measurement at each stage prevents most of these. The migration goes from a high-stakes event to a series of small risk-managed adjustments.
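
One way to make "measurement at each stage" mechanical rather than a judgment call at three in the morning: encode the ramp and its gate as data. The stages and the 3-point tolerance below are illustrative, not recommendations:

```python
# Illustrative ramp: each stage raises processor B's share only if the
# previous stage's measurements held.
RAMP = [1, 5, 25, 50, 95, 100]  # percent of traffic to processor B

def next_stage(current_percent: int, a_auth_rate: float, b_auth_rate: float,
               max_auth_rate_gap: float = 0.03) -> int:
    """Advance the ramp only if B's auth rate is within tolerance of A's."""
    if a_auth_rate - b_auth_rate > max_auth_rate_gap:
        return current_percent  # hold (or roll back) and investigate
    higher = [p for p in RAMP if p > current_percent]
    return higher[0] if higher else current_percent
```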

Communicate, more than you think necessary

Merchants need to know what's happening, when, and what they need to do. The communication plan has phases:

  • Pre-announcement (weeks before): "We're moving to a new payment processor for these reasons. Your service continues uninterrupted. Here's what to expect."
  • Migration window: "We're in the middle of moving. You may see slight differences in [decline rates, settlement timing, statement formatting]. Reach out if anything looks wrong."
  • Post-migration verification: "Migration complete. Please verify your reports match what you expected. Here's a comparison checklist."

The temptation is to skip this and migrate quietly, hoping nobody notices. Some merchants don't notice — they have low volume or aren't watching closely. Others notice everything: changed identifiers in settlement reports, slightly different timing on funding, a new line item in fees. They call support; support doesn't know about the migration; support gives wrong answers; merchants lose trust.

Tell merchants what's happening. Tell them what to verify. Acknowledge that they may see anomalies. Make support aware of the migration so they can answer questions correctly.

After the migration

The migration isn't done at cutover. It's done six months later, when the old processor has been fully decommissioned. The interim period has its own work:

  • Maintain the old processor for refund processing.
  • Resolve cross-processor reconciliation exceptions.
  • Decommission credentials, webhook endpoints, and infrastructure tied to the old processor.
  • Update documentation and runbooks.
  • Audit that no internal code paths still assume the old processor.
  • Conduct a retrospective.

The retrospective is the most important. Migrations are rare enough that institutional memory of them is fragile. Writing down what went well, what went poorly, and what you'd do differently is what makes the next migration cheaper. Without this, the next migration is the first migration again, with different people repeating the same mistakes.

What I'd build next time

If I were starting a new payment platform today, I'd build for migration on day one. That means:

  • All processor integrations behind a uniform abstraction layer with clean adapter boundaries (a sketch follows this list).
  • Network tokenization for stored payment methods, never processor-specific tokens.
  • Routing policy as configuration, not code, with per-merchant and per-transaction granularity.
  • Reconciliation that handles multiple processors symmetrically.
  • Settlement file ingestion that's processor-agnostic at the storage layer.
  • Webhook handling that survives processor swaps.
  • Monitoring that breaks down by processor for every metric that matters.
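
Here's the kind of adapter boundary the first item means. The method set is illustrative and far from a complete payments API; what matters is that nothing outside an adapter knows processor-specific details:

```python
from abc import ABC, abstractmethod

class ProcessorAdapter(ABC):
    """Uniform boundary the orchestration layer talks to. One adapter
    per processor; processor details never leak past it."""

    @abstractmethod
    def authorize(self, payment: dict) -> dict: ...

    @abstractmethod
    def capture(self, auth_id: str, amount: int) -> dict: ...

    @abstractmethod
    def refund(self, transaction_id: str, amount: int) -> dict: ...

    @abstractmethod
    def parse_settlement_file(self, raw: bytes) -> list[dict]: ...
```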

Each of these adds upfront cost. None of them pays back until you migrate. And then they pay back enormously, because the migration that would otherwise consume a quarter of engineering capacity for six months becomes a series of routine operational changes.

The platforms that get this wrong are the ones that built for a single processor, and now every migration is a rebuild. The platforms that get it right have processor diversity as a default capability.

The hardest part

The hardest part of a processor migration isn't the technical work. It's the political work. The current processor doesn't want you to leave; the new processor wants the volume immediately; the BD team wants the deal closed; the engineering team wants more time; the merchants want zero disruption; the executives want a clean story.

These pressures pull in different directions, and the migration plan has to navigate all of them. The technical plan can be perfect, and the migration can still go badly because someone made a commitment that didn't account for engineering reality, or someone delayed a decision that needed to be made early.

What I've learned: have one person who owns the migration end-to-end, with the authority to push back on commitments and the technical depth to make architectural calls. Without this, the migration becomes a committee, and committees do not do well at high-stakes operational work with strict sequencing.

A migration is a project with a definite end. Treat it as such. Staff it intentionally, run it like an incident-prevention exercise rather than a feature, and accept that it will absorb more time than you planned. It always does. The teams that finish their migrations on schedule with zero data loss are the ones who treated the schedule as a hypothesis, not a commitment.


This is part of a series on payment systems architecture. See also tokenization beyond PCI and the real cost of payment integration nobody talks about.