2.78 Crore Tickets in 13 Days. Here's What Happens to Your App Under That Load.
One year before Dhurandhar 2, Coldplay crashed BookMyShow. Same platform, same kind of demand, completely different outcome. The difference wasn't budget. It was five architecture decisions made before the spike, not after it.

Dhurandhar 2 released on March 19, 2026.
In 13 days, 2.78 crore tickets were sold. At peak, BookMyShow was processing around 40,000+ bookings per hour. Across Day 0 and Day 1 combined, roughly 50 lakh people walked through cinema doors across India.
That is not a traffic spike. That is a sustained, multi-week assault on infrastructure that most apps are not built to survive.
Now forget Dhurandhar 2 for a moment. Think about your product.
Maybe you're building a booking platform. A D2C brand with flash sales. A marketplace with a limited-quantity drop. An event app. A food ordering platform during IPL finals. The specific context changes; the underlying engineering problem doesn't.
What happens to your app when everyone shows up at once?
This post is the honest answer to that question. Not theory: the actual architectural decisions that determine whether your app handles the surge or becomes another screenshot in a frustrated customer's Twitter thread.
The thundering herd and why it's not just a traffic problem
Before we get into solutions, it's worth understanding exactly what breaks.
The Coldplay concert ticket sale in September 2024 is the best-documented case study for Indian apps under extreme load. BookMyShow had 13 million simultaneous users the moment tickets went live. The platform crashed in minutes. Users were logged out mid-session, OTPs never arrived, queues showed "6 lakh people ahead of you" and froze there.
This is called the thundering herd problem: a massive number of users all waiting for the same trigger simultaneously hammer your infrastructure the instant that trigger fires. Your load balancer gets hit from all directions at once. Your database receives thousands of concurrent read and write requests for the same rows. Your authentication service, which handles maybe a few hundred OTP requests per minute on a normal day, suddenly needs to process tens of thousands per minute.
No single component fails in isolation. They cascade. The database slows down under concurrent writes, which causes the API to time out, which causes the load balancer to retry, which adds more load to the already-struggling database. Within minutes the whole system is down.
The instinct is to say "just add more servers." That's part of the answer but only if your architecture is designed to use them. A poorly designed system with ten servers crashes just as completely as a poorly designed system with one. The servers just cost more.
What actually determines whether you survive a traffic spike is architecture: the decisions about how your data flows, how your components communicate, and what happens when things get slow.
The five engineering decisions that separate apps that survive from apps that crash
1. How you handle seat locking: the double-booking problem
In any booking system, the most dangerous moment is the millisecond between "user selects a seat" and "payment confirmed." During that window, multiple users might be looking at the same seat simultaneously, all seeing it as available.
Without an explicit locking mechanism, two users complete payment for the same seat at the same time. Both get a confirmation. One of them is wrong. You now have a customer service disaster.
The standard solution is Redis-based temporary locking. When a user selects a seat, your system immediately places a lock on that record in Redis with a TTL (time-to-live), typically 5–10 minutes. During that window, the seat appears unavailable to all other users. If the first user completes payment, the lock converts to a permanent booking. If the user abandons checkout or the TTL expires, the lock releases and the seat goes back into the pool.
This happens in-memory, in microseconds, without touching your primary database. Under high concurrency, this matters enormously: you're doing millions of availability checks against an in-memory store rather than hammering your relational database with read queries every time a user views the seat map.
The failure mode without this: race conditions and double bookings. The failure mode with a poorly configured TTL: seats stay locked by users who abandoned checkout, visible inventory drops, and bookings are lost. Both are expensive problems.
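To make the pattern concrete, here's a minimal in-memory sketch of TTL-based seat locking. In production this state lives in Redis (the atomic `SET key value NX PX ttl` pattern) so every app server sees the same locks; the `Map` below merely stands in for the Redis keyspace, and the class and method names are illustrative, not a real library API.

```typescript
// In-memory sketch of the Redis seat-lock pattern. The Map stands in for
// the Redis keyspace; in Redis, tryLock is one atomic command:
//   SET seat:{seatId} {userId} NX PX {ttlMs}

type SeatLock = { userId: string; expiresAt: number };

class SeatLockManager {
  private locks = new Map<string, SeatLock>();
  constructor(private ttlMs: number = 5 * 60 * 1000) {}

  // Returns true if this user now holds the seat; false if someone else does.
  tryLock(seatId: string, userId: string, now: number = Date.now()): boolean {
    const existing = this.locks.get(seatId);
    if (existing && existing.expiresAt > now) return false; // seat held, not yet expired
    this.locks.set(seatId, { userId, expiresAt: now + this.ttlMs });
    return true;
  }

  // On successful payment: validate the lock, then convert it to a permanent
  // booking (the write to the primary database is elided here).
  confirm(seatId: string, userId: string, now: number = Date.now()): boolean {
    const lock = this.locks.get(seatId);
    if (!lock || lock.userId !== userId || lock.expiresAt <= now) return false;
    this.locks.delete(seatId);
    return true;
  }

  // On abandoned checkout: release early instead of waiting for the TTL.
  release(seatId: string, userId: string): void {
    const lock = this.locks.get(seatId);
    if (lock && lock.userId === userId) this.locks.delete(seatId);
  }
}
```

The property that matters is that the lock check and the lock write happen as a single step. In Redis, `SET ... NX PX` gives you that atomicity across all your servers; a naive read-then-write against your database never can, which is exactly where race conditions come from.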
2. Read replicas: because reads and writes should never compete
During a traffic spike, the load on your database is not symmetrical. The overwhelming majority of requests are reads: show me available seats, what's the price, is this show sold out. Write requests, the actual bookings, are a much smaller fraction.
If all of this flows through a single database instance, your writes (which require locks and are slow) compete with your reads (which are fast but numerous). Under high concurrency, everything slows down together.
The solution is read replicas: separate database instances that receive a continuously updated copy of your primary database and serve all read traffic. Your primary handles writes only; reads go to replicas.
During a Dhurandhar-level event, roughly 95% of traffic is "is this show available / show me the seat map / what's the price": all read requests. That traffic never touches your primary database at all. The writes, the actual bookings, are a small fraction of that, handled by the primary in relative calm.
The nuance: replicas introduce replication lag, typically milliseconds but occasionally longer under extreme write load. For availability checks, stale data by 100ms is acceptable. For payment confirmation, you always read from primary. Knowing which queries can tolerate eventual consistency and which cannot is the architectural decision that matters.
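That routing decision can be sketched at the application layer. `Db` below is a placeholder for whatever driver or connection pool you actually use; real apps often push this split into the ORM or a proxy layer, but the logic is the same and is worth being explicit about.

```typescript
// Sketch of read/write splitting with round-robin replicas.
// `Db` is a stand-in for your driver's connection pool.

interface Db { name: string }

class ReplicaRouter {
  private next = 0;
  constructor(private primary: Db, private replicas: Db[]) {}

  // Writes always hit the primary.
  forWrite(): Db { return this.primary; }

  // Strongly consistent reads (payment confirmation, final booking state)
  // also read from primary: replication lag is not acceptable here.
  forStrongRead(): Db { return this.primary; }

  // Availability checks, seat maps, prices: eventual consistency is fine,
  // so spread them across replicas round-robin.
  forRead(): Db {
    if (this.replicas.length === 0) return this.primary;
    const db = this.replicas[this.next % this.replicas.length];
    this.next++;
    return db;
  }
}
```

The design choice worth noting: the caller, not the router, declares consistency needs. Tagging each query as "can tolerate lag" or "must be fresh" at the call site is what makes the replica split safe.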
3. Queue architecture: the virtual waiting room
One of the smarter things modern ticketing platforms do under extreme load is not let all users hit the booking system simultaneously at all. Instead, users who arrive during peak demand enter a virtual queue. They see their position, an estimated wait time, and a page they can leave open while the system works through the line.
This sounds simple. The engineering underneath is not.
A well-built queue system needs to be stateful: if a user closes the tab and reopens it, they should return to their same position, not start over. It needs to be fair, strict FIFO with no one jumping the line. It needs to update in real time as positions change. And it needs to handle the case where a user at position 50,000 is told they'll wait 40 minutes and simply closes the tab: their slot needs to expire and the queue needs to compact correctly.
The practical implementation typically uses Redis or a purpose-built queue like BullMQ. Each queued user gets a token with a timestamp. A worker processes tokens in order, granting booking access in batches. When access is granted, the user's token unlocks the booking flow with a time window, typically 10 minutes, before it expires.
The failure mode without a queue: all users hit the booking endpoint simultaneously, the database collapses under concurrent writes, and nobody can book. The failure mode with a queue: the system feels slow, but it works. Users wait their turn, complete booking, and move on. Your infrastructure processes load at a pace it can actually handle.
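Here's a minimal in-memory sketch of the waiting-room mechanics described above. In production the queue state would live in Redis (for example, a sorted set keyed by arrival time) or a library like BullMQ so it survives restarts and is shared across servers; the array and `Map` here only illustrate the behaviour, and all names are made up for this example.

```typescript
// In-memory sketch of a virtual waiting room: stateful join, strict FIFO,
// batch admission, and a time-limited booking window after admission.

interface QueueEntry { token: string; joinedAt: number }

class WaitingRoom {
  private queue: QueueEntry[] = [];
  private admitted = new Map<string, number>(); // token -> access expiry (ms)

  constructor(private accessWindowMs: number = 10 * 60 * 1000) {}

  // Stateful join: re-joining with the same token keeps your position.
  join(token: string, now: number = Date.now()): number {
    if (!this.queue.some(e => e.token === token)) {
      this.queue.push({ token, joinedAt: now });
    }
    return this.position(token);
  }

  // 1-based position in line; 0 if not queued.
  position(token: string): number {
    return this.queue.findIndex(e => e.token === token) + 1;
  }

  // A worker calls this on a timer, granting booking access in batches.
  // Removing admitted entries compacts the queue, so positions move up.
  admitBatch(n: number, now: number = Date.now()): string[] {
    const batch = this.queue.splice(0, n);
    for (const e of batch) this.admitted.set(e.token, now + this.accessWindowMs);
    return batch.map(e => e.token);
  }

  // The booking endpoint checks this before allowing seat selection.
  canBook(token: string, now: number = Date.now()): boolean {
    const expiry = this.admitted.get(token);
    return expiry !== undefined && expiry > now;
  }
}
```

Batch admission is the crucial knob: the worker admits only as many users per tick as your booking flow can actually process, which is how the queue converts a spike into a steady, survivable rate.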
4. Auto-scaling: but configured before the spike, not during it
Cloud platforms like AWS and GCP, and even managed providers like Railway or Render, all support auto-scaling: the ability to spin up additional compute instances automatically when load increases beyond a threshold.
This sounds like the obvious solution to traffic spikes. The problem is response time. Spinning up a new server instance takes 2–5 minutes. A traffic spike from Dhurandhar 2 bookings goes from normal to 40,000+ transactions per hour in seconds. By the time your auto-scaler has provisioned new instances, your existing infrastructure has already been under maximum load for several minutes.
The correct approach: pre-scaling. For predictable high-traffic events like a movie release, a sale launch, an IPL final, you scale up before the event, not in response to it. If you know bookings open at 10am, you have your target server count running by 9:45am. Auto-scaling then acts as a safety net for traffic beyond your projections, not your primary response mechanism.
For unpredictable spikes, auto-scaling helps, but it requires your architecture to be horizontally scalable in the first place: your app servers need to be stateless (no local session storage, no in-memory state that can't be replicated), and your database layer needs to handle the connection load from multiple instances without falling over.
5. Graceful degradation: deciding what breaks first
The question most founders never ask: if your app is overwhelmed, which parts of it are allowed to degrade?
Under high load, you have a choice: the whole system slows down uniformly, or you deliberately disable less critical features to protect the core flow. The second option is graceful degradation, and it's the difference between "checkout was slow but it worked" and "the entire site was down."
For a booking app, the core flow is: search → seat selection → payment → confirmation. Everything else is secondary. Recommendations, reviews, social features, loyalty point calculations, marketing banners: all of these should be the first things turned off under extreme load, via feature flags, so the core flow stays fast.
At the infrastructure level, this means rate limiting on non-critical API endpoints. A user trying to load their booking history while 50,000 other users are trying to complete checkout should be deprioritised. That request can wait. The checkout cannot.
The failure mode without this thinking: a slow recommendation engine holds up a database connection that was needed for a booking transaction. A marketing banner call to a third-party API times out and causes the page render to hang. The booking flow fails not because of its own load but because of something tangential to it.
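One way to make "decide what breaks first" operational is priority-based load shedding at the edge of your API. The sketch below uses in-flight request count as the load signal and hard-coded thresholds; both are illustrative assumptions, and real systems might key off event-loop lag, queue depth, or upstream latency instead.

```typescript
// Sketch of priority-based load shedding. Each request declares a priority;
// lower-priority classes are rejected first as in-flight load climbs.

type Priority = "critical" | "normal" | "background";

class LoadShedder {
  private inFlight = 0;

  // Above these in-flight counts, the given priority class is rejected.
  // Thresholds here are illustrative; tune them from load tests.
  constructor(
    private limits: Record<Priority, number> = {
      critical: 10_000,  // checkout, payment: shed only as a last resort
      normal: 2_000,     // browsing, seat maps
      background: 500,   // recommendations, booking history, banners
    }
  ) {}

  // Call at the start of a request; on false, return 503 with Retry-After.
  admit(priority: Priority): boolean {
    if (this.inFlight >= this.limits[priority]) return false;
    this.inFlight++;
    return true;
  }

  // Call when the request completes (success or failure).
  done(): void { this.inFlight = Math.max(0, this.inFlight - 1); }
}
```

The ordering of the thresholds is the whole point: as load rises, background endpoints stop responding first, browsing degrades next, and checkout keeps its headroom until the very end.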
What this means for your app specifically
Most products built on modern stacks such as Next.js, Convex, Supabase, or Firebase start with infrastructure that handles normal load comfortably. The gap appears when load is abnormal.
A few honest observations:
Firebase's Firestore is excellent for real-time data and handles moderate concurrency well. Under extreme write load, when thousands of simultaneous bookings are happening, unoptimised queries can cause costs to spike dramatically and performance to degrade. One documented case: a team went from $50 to $5,000/month in Firebase costs because of missing composite indexes that only became obvious under real traffic. Firestore can work for high-traffic booking systems, but only if the data model is designed for it from the start.
Supabase/Postgres handles high concurrency well when properly configured with connection pooling (PgBouncer), read replicas, and appropriate indexing. Without those, a Postgres instance will reach its connection limit and start refusing requests under the kind of concurrency a ticket booking surge produces.
Vercel serverless functions have execution time limits and connection pool constraints that make them poorly suited for the database-heavy operations in a booking flow under high concurrency. Functions that work perfectly at 10 concurrent users start failing at 1,000. Long-running operations like payment processing and complex availability queries belong in a persistent backend, not a serverless function.
Redis is not optional for any serious booking system. Seat locking, queue state, session management, rate limiting: all of it belongs in Redis, not in your primary database. The good news: Redis is cheap and easy to add. The common mistake: adding it as an afterthought, after the double-booking problems have already reached production.
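Of the Redis jobs listed above, rate limiting is the simplest to sketch. The classic Redis pattern is a fixed window: `INCR` a per-user counter and set an `EXPIRE` on first hit. The in-memory version below shows the same logic; the `Map` stands in for the Redis keyspace, and the limits are made-up numbers.

```typescript
// In-memory sketch of the Redis fixed-window rate limiter
// (INCR key; EXPIRE key window on the first hit in each window).

class FixedWindowLimiter {
  private counts = new Map<string, { n: number; windowStart: number }>();

  constructor(private limit: number, private windowMs: number) {}

  // Returns true if the request is allowed, false if the caller should
  // get a 429 for the rest of the current window.
  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.counts.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counts.set(key, { n: 1, windowStart: now }); // new window
      return true;
    }
    entry.n++;
    return entry.n <= this.limit;
  }
}
```

The reason this lives in Redis rather than your database is the same as with seat locks: it's one cheap atomic in-memory operation per request, shared across every app server, and it never competes with booking writes for database capacity.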
The real question for founders
If your product has any of these characteristics, the architecture questions in this post are not theoretical:
- Users competing for limited inventory (seats, slots, limited-edition products)
- Predictable high-demand moments (launches, sales, IPL bookings, flash deals)
- Payments under concurrent load
- Real-time availability that multiple users see simultaneously
The difference between building this correctly from the start and retrofitting it after a production incident is significant, not just in cost, but in the reputation damage that happens when your app crashes at the exact moment your users care most.
BookMyShow survived Dhurandhar 2 at scale. They had learned from Coldplay. The lesson isn't that you need BookMyShow's infrastructure budget; it's that the architectural decisions that made it work are available to any product built with the right foundation.
Where BuildOrbit fits
We build backends designed for production load from day one, not after the first incident.
If you're building a booking platform, marketplace, or any product with high-concurrency requirements, the conversation we have before writing a single line of code covers exactly what this post is about: how does your data model handle concurrent writes, where does seat locking live, what degrades gracefully under load, and how do you scale before the spike, not after it.
If your current architecture has any of the gaps described above and you're approaching a launch or a high-demand moment, it's worth an honest review before the event, not after.
Our Tech Stack Recommender gives you a directional read on whether your current setup is appropriate for your expected load. It's free and takes two minutes.
→ Try the Tech Stack Recommender
For a more detailed conversation: rahul@habitize.app

Rahul Shitole
Founder
Rahul Shitole is the founder of BuildOrbit Studio and the co-founder of Habitize, an AI-powered emotional wellness platform. With 8+ years building production software across mental health, healthcare, agri-tech, and B2B SaaS and two startups shipped from zero, he knows what it actually takes to go from idea to live product. He started BuildOrbit to give other founders access to the kind of engineering partner he always wished he'd had. He writes about what he's learned the hard way.