Resolving an Invoice Number Collision in an E-Invoice Integration
Two workers produced the same invoice number in production. A log of how I resolved the dilemma between a race condition and a legally required gap-free series in a real project.
A few months back, while working on an e-invoice integration, I hit a short-lived but irritating error in production: the official integrator was returning “Duplicate / Already Exists.” The logs made the picture clear. Two separate workers had generated the same invoice number in the same second, both had submitted it, and one had been rejected. Invoice issuance for that series was blocked, and every job queued behind it was waiting.
The error surfaced at the very end of an architectural migration. Over several months, I had moved invoicing from a synchronous setup, where the user clicks a button and waits on screen for the result, to an asynchronous one: different projects and frameworks publish to a shared RabbitMQ queue, and a dedicated consumer/worker microservice continuously listens to and processes that queue. The migration went as planned, and scalability and resilience were solid. The collision was a small wrinkle that was easy to miss. In the synchronous model, the waiting user had been accidentally serializing the work. Once consumers ran concurrently, that accidental serialization disappeared, and a race condition that had been hiding in plain sight for years finally showed itself.
This post is a log of that day and the decisions that followed. I wrote up the architectural deep dive separately (linked below). Here I want to focus on what it felt like in a real project and which traps I stepped into.
The problem was actually two problems
At first glance it looked like a classic race condition. The invoice number was generated at submission time with SELECT MAX(invoice_no) + 1. This approach had worked flawlessly with a single worker for years. It breaks, however, as soon as parallel workers process the same series. Read-then-write is not atomic, so two workers can both read the same MAX() before either has written its row, and both go to the external API with the same number.
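Here is a minimal sketch of that interleaving. It assumes a hypothetical `invoice` table in an in-memory SQLite database. Instead of using real threads, it replays the unlucky ordering deterministically: both workers read `MAX()` before either has written its row.

```python
import sqlite3

# In-memory stand-in for the invoice table (schema is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY, series TEXT, invoice_no INTEGER)")
conn.execute("INSERT INTO invoice (series, invoice_no) VALUES ('A', 41)")

def next_number(series):
    # The check-then-act step: read the current maximum and add one.
    row = conn.execute(
        "SELECT COALESCE(MAX(invoice_no), 0) + 1 FROM invoice WHERE series = ?",
        (series,),
    ).fetchone()
    return row[0]

def save(series, number):
    conn.execute("INSERT INTO invoice (series, invoice_no) VALUES (?, ?)", (series, number))

# Replay the interleaving: both workers read MAX() before either writes.
worker_1 = next_number("A")
worker_2 = next_number("A")
save("A", worker_1)
save("A", worker_2)

print(worker_1, worker_2)  # 42 42: both workers got the same number
```

With a single worker, the read and the write always happened back to back, which is why the bug stayed hidden for years.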
My first instinct was the same as most developers’: “generate the number upfront, when the message is queued, and the collision goes away.” As I started implementing it, I realized this solves the race while creating a worse problem. In e-invoicing, a number must not only be free of duplicates — it must also be free of gaps. The legal requirement is that no number can be skipped in a series. If I generate the number at the start, and an invoice sitting in the queue later fails at the integrator or gets cancelled by the user, I’m left with an unaccounted-for gap in the series.
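To make the gap concrete, here is a small sketch. It assumes numbers are handed out when messages are queued, and that some queued invoices later fail at the integrator or are cancelled. `find_gaps` is a hypothetical helper that lists the numbers missing from the issued series.

```python
def find_gaps(issued):
    """Return the numbers missing between the lowest and highest issued number."""
    if not issued:
        return []
    present = set(issued)
    return [n for n in range(min(issued), max(issued) + 1) if n not in present]

# Hypothetical upfront allocation: each queued message already carries its number.
queued = {
    101: "ok",
    102: "rejected by integrator",
    103: "ok",
    104: "cancelled by user",
    105: "ok",
}

# Only invoices that actually went through end up in the legal series.
issued = [n for n, outcome in queued.items() if outcome == "ok"]

print(issued)             # [101, 103, 105]
print(find_gaps(issued))  # [102, 104]
```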
So I was caught between two failure modes: generate too late and you get a race; generate too early and you get a gap. The challenge wasn’t swapping one approach for the other. It was satisfying both constraints at once.
The essence of the solution: reserve the number just in time
The core idea was to generate and persist the number neither too early nor too late, but immediately before the external API call. I added a reserved_no column to the invoice. Once a number is reserved, it is never deleted, even if the external call fails. It stays attached to the invoice, and any retry uses that same number. The race side is handled by a central, atomic counter.
Holding the lock only for the reservation and committing immediately was also the key to performance. This kept the external call outside the transaction. Had I held the lock while the external call was in flight, every worker on that series would have queued behind a single network round trip, and the system would have become a bottleneck.
I won’t go into the details of this pattern here (the lock window, early commit, hybrid triggering, the dual-write problem, and orphan recovery). I covered those with code and diagrams in a separate post:
Race Condition and Gap in Sequential Number Generation: JIT Reservation → (sade.dev)
Three things I noted in the log
The takeaways from this that I’ll carry into the next similar problem are mostly on the decision side, not the code side:
A legal constraint is an architectural constraint. “Gap-free numbering” initially looked like an accounting detail; it turned out to be the dominant force shaping the entire system design. There are dozens of approaches that solve a race condition on their own — but the moment you add “and no gaps either,” the solution space narrows drastically. Before filing a requirement under “business rule” and moving on, it’s worth asking how it constrains the architecture.
Every step that touches the outside world can fail. Scenarios like the integrator call succeeding while the network dies before the local write completes are not a question of “if” but of “when.” Building the design on the assumption that “requests can be repeated, and calls can be interrupted mid-flight” is another face of the same discipline I described when writing about idempotency in APIs. Sealing the number before the call was precisely what made it possible to later ask “did this invoice actually go through?”
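Here is a sketch of why the sealed number matters on retry. It assumes the integrator offers some way to look up an invoice by its number. A hypothetical `exists` method on a fake client models that lookup, and the `"unknown"` status is likewise an illustrative assumption. Suppose a call times out after the integrator may already have accepted it. The worker records the outcome as unknown. On the next attempt, it asks the integrator before resubmitting and never draws a new number.

```python
class FakeIntegrator:
    """Hypothetical stand-in for the official integrator, keyed by invoice number."""

    def __init__(self):
        self.accepted = {}
        self.drop_next_response = False

    def submit(self, number, payload):
        if number in self.accepted:
            raise ValueError("Duplicate / Already Exists")
        self.accepted[number] = payload
        if self.drop_next_response:
            # Simulate: accepted remotely, but the response never reaches us.
            self.drop_next_response = False
            raise TimeoutError("response lost after acceptance")

    def exists(self, number):
        return number in self.accepted


def send_with_reconciliation(integrator, invoice):
    """Retry-safe send: the number is already sealed on the invoice, so ask first."""
    number = invoice["reserved_no"]
    if invoice["status"] == "unknown" and integrator.exists(number):
        invoice["status"] = "sent"  # it did go through; record that locally
        return "reconciled"
    try:
        integrator.submit(number, invoice["payload"])
    except TimeoutError:
        invoice["status"] = "unknown"  # outcome unknown: keep the same number
        raise
    invoice["status"] = "sent"
    return "sent"


integrator = FakeIntegrator()
integrator.drop_next_response = True
invoice = {"reserved_no": 42, "payload": {"total": 100}, "status": "pending"}

try:
    send_with_reconciliation(integrator, invoice)
except TimeoutError:
    pass
result = send_with_reconciliation(integrator, invoice)
print(result, invoice["status"])  # reconciled sent
```

Without the reconciliation check, the retry would resubmit number 42 and run straight into the same “Duplicate / Already Exists” error.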
Measure before you assume. The shortcut “numbers are colliding, so we need locking” was only half right. The real question was where and for how long to hold the lock. Had I designed that without reading the logs and measuring the actual behavior, I would have built a system that worked but choked under load.
In the end, generating sequential, gap-free numbers turned out to be less of a locking problem and more of a timing problem: sealing the state exactly one moment before the outside world sees it. Everything else fell into place around that single decision.