Scaling Cloud Deployments

Sajawal Khan Sadozai

Most cloud architecture advice is written for companies with dedicated DevOps teams, unlimited AWS credits, and the luxury of rebuilding infrastructure before it breaks. Our clients don't have that. They're startups and growing businesses that need infrastructure that works from day one, scales without emergency intervention at 3am, and doesn't cost more than the product earns. Here is what we've actually learned across dozens of cloud deployments — including the expensive mistakes we made early on.

The Problem With "We'll Scale Later"

The most dangerous sentence in early-stage software development is "we'll fix the infrastructure when we need to." We've inherited projects where this philosophy was applied, and the cost of fixing it mid-growth is almost always higher than building it right the first time.

A product that works fine at 500 users can collapse at 5,000 — not because the code is bad, but because the infrastructure was never designed to handle concurrent load, database connections, or burst traffic. We've seen this happen with a FinTech client whose transaction processing ground to a halt when they ran a promotion and traffic spiked 40x in an hour. The product worked perfectly in testing. The infrastructure just wasn't built for that moment.

The right time to design for scale is before you need it — not because you need to over-engineer from day one, but because the foundational decisions you make early determine how expensive scaling becomes later.

The Foundation: What We Always Set Up First

Before writing any application infrastructure, we establish a baseline that every project gets regardless of size. These are non-negotiables:

Separate environments from day one — Development, staging, and production are three separate AWS accounts or projects, not three folders in the same account. Cross-environment contamination is one of the most common causes of production incidents we've seen.
Infrastructure as Code from the start — We use Terraform for all cloud resources. If it can't be reproduced from code, it doesn't exist in our infrastructure. This means any environment can be torn down and rebuilt in under 20 minutes, and there are no mystery resources that someone created manually 18 months ago.
Centralised logging before anything goes live — CloudWatch Logs structured with consistent JSON formatting, feeding into a log aggregation layer. When something breaks in production — and something always breaks — you need logs from before the incident, not just after.
Alerting on business metrics, not just system metrics — CPU at 80% is interesting. Transaction failure rate above 0.5% is a crisis. We set up alarms on the metrics that actually map to user impact from the start.
Database backups with tested restores — Automated daily backups are table stakes. What most teams skip is testing the restore. We run quarterly restore drills. A backup you've never restored is a backup you can't trust.
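
As an illustration of alerting on a business metric rather than a system metric, the check below evaluates transaction failure rate for one time window. It is a minimal sketch, not a real alarm configuration: only the 0.5% threshold comes from the text, while `WindowStats`, `should_alert`, and the `min_samples` guard are hypothetical names and assumptions introduced for the example.

```python
from dataclasses import dataclass

# Threshold taken from the text: a failure rate above 0.5% is a crisis.
FAILURE_RATE_THRESHOLD = 0.005


@dataclass
class WindowStats:
    """Transaction counts for one evaluation window (hypothetical shape)."""
    total: int
    failed: int


def failure_rate(stats: WindowStats) -> float:
    """Return failed / total, or 0.0 for an empty window."""
    if stats.total == 0:
        return 0.0
    return stats.failed / stats.total


def should_alert(stats: WindowStats, min_samples: int = 100) -> bool:
    """Alert when the failure rate exceeds the threshold.

    min_samples is an assumed guard so that one failure in a nearly
    empty window does not page anyone.
    """
    if stats.total < min_samples:
        return False
    return failure_rate(stats) > FAILURE_RATE_THRESHOLD


if __name__ == "__main__":
    print(should_alert(WindowStats(total=10_000, failed=30)))  # 0.3% failure rate
    print(should_alert(WindowStats(total=10_000, failed=80)))  # 0.8% failure rate
```

In a real deployment the same comparison would normally live in the alarm definition of the monitoring system rather than in application code; the sketch only shows the shape of the rule.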

The Architecture That Survives Growth

Across our most successful scaling projects, a consistent architecture pattern has emerged. It's not the most elegant distributed system you'll find in a textbook, but it's practical, cost-effective, and handles the 0-to-1M journey without a full rebuild:

Application layer on ECS Fargate or Lambda — Containerised services on Fargate for workloads that need consistent compute, Lambda for event-driven and sporadic workloads. Both autoscale without managing EC2 instances. We stopped managing EC2 fleets for application servers three years ago and haven't looked back.
RDS PostgreSQL with read replicas — A single primary database for writes, with read replicas serving the read operations that can tolerate a small amount of replication lag. Reads that must see a just-committed write still go to the primary. This single change has resolved database bottlenecks on more projects than any other optimisation we've made.
ElastiCache Redis for session and hot data — Anything that's read frequently but changes infrequently lives in Redis. User sessions, feature flags, rate limit counters, cached API responses. Database load drops dramatically once you identify and cache the right data.
S3 + CloudFront for all static assets — No application server should ever serve an image, a CSS file, or a JavaScript bundle. S3 storage costs are negligible. CloudFront delivers cached assets from edge locations close to the user, often with sub-50ms latency. This is one of the cheapest and highest-impact performance improvements available.
SQS queues for async work — Any operation that doesn't need to complete synchronously — sending emails, processing images, generating reports, triggering webhooks — goes into a queue. The application responds immediately to the user. The work happens in the background. This is the difference between a 200ms response and a 4-second timeout when traffic spikes.
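
The caching item above follows the cache-aside pattern: read from the cache, fall back to the database on a miss, then populate the cache. The sketch below shows that flow with an in-memory dictionary standing in for Redis so it runs without any external service. `TTLCache`, `get_with_cache`, and the loader function are hypothetical names used only for illustration.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for Redis: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self._clock() + self._ttl, value)


def get_with_cache(cache: TTLCache, key: str, load: Callable[[], Any]) -> Any:
    """Cache-aside read: try the cache, else call the loader and store the result.

    A loader result of None is never cached in this sketch, so it is
    re-fetched on every call.
    """
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load()
    cache.set(key, value)
    return value


if __name__ == "__main__":
    db_calls = []

    def load_feature_flags() -> dict:
        db_calls.append(1)  # stands in for a database query
        return {"new_checkout": True}

    cache = TTLCache(ttl_seconds=60)
    for _ in range(3):
        get_with_cache(cache, "feature_flags", load_feature_flags)
    print(len(db_calls))  # number of times the "database" was hit
```

The time-to-live is the main design choice: a short TTL bounds how stale feature flags or cached responses can get, at the cost of more database reads.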

The Database Mistakes That Killed Performance

The database is where most scaling failures actually originate. Here are the specific mistakes we've found and fixed most often:

No connection pooling — PostgreSQL caps concurrent connections at its configured max_connections value, and every open connection costs memory on the server. At high traffic, applications without connection pooling exhaust this limit and new connections are refused. We now always put PgBouncer or RDS Proxy in front of every production database.
Missing indexes on foreign keys and filter columns — A query that takes 2ms at 10,000 rows can take 4 seconds at 10 million rows without proper indexing, because the database falls back to scanning the whole table. We run EXPLAIN ANALYZE on every query that touches more than one table before it goes to production.
N+1 queries in ORMs — The most common performance killer in applications built with ORMs like Prisma, Sequelize, or Django ORM. Loading a list of 100 orders and then fetching the customer for each one generates 101 database queries instead of 1. We audit ORM query patterns on every project before launch.
Long-running transactions blocking writes — A database transaction that holds locks while waiting for an external API call can block all writes to that table. We keep transactions as short as possible and never make external calls inside a transaction.
No query timeout configuration — Without statement timeouts, a slow query can hold a database connection indefinitely, cascading into connection exhaustion. We set statement_timeout at the application level for every connection.
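
The N+1 pattern from the list above is easy to reproduce. The sketch below uses an in-memory SQLite database in place of PostgreSQL and plain SQL in place of an ORM, so the schema and the function names are assumptions for illustration. It loads 100 orders twice: once with one extra query per order, and once with a single JOIN.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[int, str]


def build_db(order_count: int = 100) -> sqlite3.Connection:
    """Create a throwaway database with 10 customers and order_count orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customers(id))"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, f"customer-{i}") for i in range(1, 11)],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(i, (i % 10) + 1) for i in range(1, order_count + 1)],
    )
    conn.commit()
    return conn


def orders_n_plus_one(conn: sqlite3.Connection) -> Tuple[List[Row], int]:
    """One query for the orders, then one more per order for its customer."""
    orders = conn.execute(
        "SELECT id, customer_id FROM orders ORDER BY id"
    ).fetchall()
    queries = 1
    rows: List[Row] = []
    for order_id, customer_id in orders:
        name = conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()[0]
        queries += 1  # counted by hand for the demonstration
        rows.append((order_id, name))
    return rows, queries


def orders_joined(conn: sqlite3.Connection) -> Tuple[List[Row], int]:
    """The same result from a single JOIN."""
    rows = conn.execute(
        "SELECT o.id, c.name FROM orders o "
        "JOIN customers c ON c.id = o.customer_id ORDER BY o.id"
    ).fetchall()
    return rows, 1


if __name__ == "__main__":
    conn = build_db()
    slow_rows, slow_queries = orders_n_plus_one(conn)
    fast_rows, fast_queries = orders_joined(conn)
    print(slow_queries, fast_queries, slow_rows == fast_rows)
```

Most ORMs offer an eager-loading option that produces the joined form; the point of auditing is to find the loops where that option was not used.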

Handling Traffic Spikes Without Panic

Every product eventually has an unplanned traffic spike — a viral social post, a press mention, a marketing campaign that performs better than expected. The difference between a spike being a celebration and a crisis is whether your infrastructure was designed for it.

Our approach to spike resilience:

Autoscaling with pre-warming — ECS services and Lambda both add capacity as load rises, but cold starts and scaling lag can cause brief degradation at the very start of a spike. We configure scheduled scaling rules that pre-warm capacity before known high-traffic events such as launches, campaigns, and scheduled reports.
Rate limiting at the API gateway level — Before a request reaches your application, Amazon API Gateway or an Application Load Balancer with AWS WAF rules can enforce rate limits per IP, per user, and per endpoint. Abusive traffic patterns are blocked before they touch your application servers.
Circuit breakers on external dependencies — If a third-party payment gateway or SMS provider slows down under load, your application shouldn't cascade that slowness to every user. Circuit breakers detect failing dependencies and fail fast instead of timing out slowly.
Graceful degradation — Critical paths (checkout, login, core functionality) continue working even when non-critical services (recommendations, analytics, notifications) are degraded. We explicitly design which features can be disabled under load and build kill switches for them.
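
A circuit breaker of the kind described above can be sketched in a few lines. This version is deliberately minimal and single-threaded: the class name, the default thresholds, and the injected `clock` parameter are assumptions for illustration, and a production system would more likely rely on an established resilience library than on hand-rolled code.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    """Fail fast after repeated failures; retry after a cool-down.

    Not thread-safe: a sketch of the state machine only.
    """

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._failure_threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means closed

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("dependency unavailable, failing fast")
            # Cool-down elapsed: half-open, let this one trial call through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            half_open_failure = self._opened_at is not None
            if half_open_failure or self._failures >= self._failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


if __name__ == "__main__":
    breaker = CircuitBreaker(failure_threshold=2, reset_timeout=30.0)

    def flaky_payment_gateway() -> str:  # hypothetical dependency
        raise TimeoutError("gateway timed out")

    for attempt in range(3):
        try:
            breaker.call(flaky_payment_gateway)
        except CircuitOpenError as exc:
            print("skipped:", exc)
        except TimeoutError as exc:
            print("failed:", exc)
```

Once the breaker is open, callers get an immediate error they can turn into a degraded response instead of holding a request thread for the full timeout.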

Cost Optimisation Without Sacrificing Reliability

Cloud costs have surprised more than one of our clients. AWS bills can grow faster than revenue if you're not actively managing them. The patterns that have saved our clients the most money:

Right-size before you reserve — We see teams buy Reserved Instances before they understand their actual usage patterns. Run on On-Demand for 30 days, profile actual utilisation, then purchase reservations for the compute you actually use. Reservations on the wrong instance type save nothing.
Spot Instances for background processing — Any work that can tolerate interruption (batch jobs, report generation, image processing, offline ML inference) should run on Spot Instances, which typically cost 60–80% less than On-Demand.
S3 lifecycle policies — Objects move to S3 Standard-Infrequent Access 30 days after creation and to Glacier after 90 days. Lifecycle rules act on object age rather than on last access, so this suits data that is rarely read once it is a month old. Most applications generate far more data than they actively use, and storing all of it in standard S3 is unnecessarily expensive.
CloudFront caching aggressively — Every cache hit at the CDN layer is a request that never reaches your origin servers. Properly configured cache headers for static assets and semi-static API responses have reduced origin traffic by 60–70% on several of our projects.
Tag everything, budget everything — AWS Cost Explorer is only useful if resources are tagged consistently. We tag every resource with project, environment, and team from day one, and set billing alarms at 80% of budget thresholds. No surprises at month end.
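
The 30-day and 90-day lifecycle tiers described above map onto a short configuration document. The sketch below only builds that document as a Python dictionary, in the shape that boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument; it does not call AWS. The rule ID and the function name are hypothetical, and the transitions are keyed on object age.

```python
import json


def lifecycle_configuration(ia_after_days: int = 30,
                            glacier_after_days: int = 90) -> dict:
    """Build an S3 lifecycle configuration that tiers objects down by age.

    Days are counted from object creation. S3 requires objects to spend
    at least 30 days in Standard before moving to Standard-IA.
    """
    if ia_after_days < 30:
        raise ValueError("Standard-IA transition must be at least 30 days")
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the IA transition")
    return {
        "Rules": [
            {
                "ID": "tier-down-aging-objects",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: every object in the bucket
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(lifecycle_configuration(), indent=2))
```

In a Terraform-managed setup the same rule would normally be declared alongside the bucket resource; the dictionary form is shown here only because it is easy to read and to validate.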

The Monitoring Stack We Use

You cannot improve what you cannot measure. Our standard observability setup:

CloudWatch for infrastructure metrics — CPU, memory, network, database connections, Lambda duration and error rates. Custom metrics for application-level events are published through the AWS SDK (the CloudWatch PutMetricData API).
Structured JSON logging — Every log line is a JSON object with consistent fields: timestamp, level, service, request ID, user ID (hashed), and message. This makes log queries in CloudWatch Insights fast and reliable.
Distributed tracing with X-Ray — For microservices and multi-step request flows, X-Ray traces show exactly where latency comes from. For a request that takes 800ms end-to-end, a trace might show that 600ms of it is a single downstream API call you didn't realise was slow.
Uptime monitoring from outside your network — Internal health checks tell you if the service is running. External uptime monitoring from services like Betterstack or Checkly tells you if users can actually reach it. These are different things and both matter.
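
The structured logging format described above (timestamp, level, service, request ID, hashed user ID, message) can be sketched with Python's standard `logging` module. `JsonFormatter`, `build_logger`, `hash_user_id`, and the JSON key names are illustrative assumptions rather than a prescribed schema.

```python
import hashlib
import json
import logging
from datetime import datetime, timezone


def hash_user_id(user_id: str) -> str:
    """Return a short SHA-256 digest so raw user IDs stay out of the logs.

    A plain hash of a guessable ID is weak protection; a keyed hash
    may be preferable in practice.
    """
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:16]


class JsonFormatter(logging.Formatter):
    """Render every log record as one JSON object with fixed fields."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self._service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "service": self._service,
            # These two are supplied per call through logging's `extra` argument.
            "request_id": getattr(record, "request_id", None),
            "user_id": getattr(record, "user_id_hash", None),
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(service: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to stream (stderr by default)."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("checkout")  # hypothetical service name
    log.info(
        "payment accepted",
        extra={"request_id": "req-123", "user_id_hash": hash_user_id("user-42")},
    )
```

Because every line has the same keys, a query that filters on `request_id` or `level` works across all services without per-service parsing rules.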

What 1 Million Users Actually Looks Like

One million users sounds like an enormous number. In terms of infrastructure load, it's more manageable than most people expect, provided you've built correctly. A product with 1M registered users typically has 2–5% daily active users, which is 20,000–50,000 people per day, with their sessions spread across roughly 18 hours of usage rather than all arriving at once. That translates to roughly 300–700 requests per second at peak for a typical consumer app.

This is well within the capability of a properly configured Fargate cluster, a read-replica PostgreSQL setup, and a Redis cache layer. The cost at this scale on AWS is typically $1,500–$4,000 per month depending on data storage and egress — not the six-figure infrastructure bill that the phrase "1 million users" might imply.
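
The request-rate estimate above can be reproduced as back-of-the-envelope arithmetic. The 1M registered users, the 2–5% daily active ratio, and the 18-hour usage window come from the text. The requests per user per day (200) and the peak-to-average factor (4.5) are assumptions chosen for illustration, and the result is sensitive to both.

```python
def peak_requests_per_second(registered_users: int,
                             dau_ratio: float,
                             requests_per_user_per_day: float,
                             active_hours: float,
                             peak_factor: float) -> float:
    """Estimate peak request rate from daily active users.

    average rate = daily requests / seconds in the active window
    peak rate    = average rate * peak_factor
    """
    daily_active_users = registered_users * dau_ratio
    daily_requests = daily_active_users * requests_per_user_per_day
    average_rps = daily_requests / (active_hours * 3600)
    return average_rps * peak_factor


if __name__ == "__main__":
    for ratio in (0.02, 0.05):
        rps = peak_requests_per_second(
            registered_users=1_000_000,
            dau_ratio=ratio,
            requests_per_user_per_day=200,  # assumption, not from the text
            active_hours=18,
            peak_factor=4.5,                # assumption, not from the text
        )
        print(f"{ratio:.0%} daily active -> about {rps:.0f} requests/second at peak")
```

Under these assumptions the estimate lands at roughly 280 and 690 requests per second, in line with the 300–700 range quoted above. An API-heavy app with more requests per user would shift the numbers accordingly.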

The teams that struggle to scale to this number are rarely limited by cloud capacity. They're limited by database queries that were never optimised, application code that makes blocking calls where it should be async, or infrastructure that was never designed to run as more than a single instance.

Three Things We'd Do Differently

Looking back across our cloud work, three decisions consistently show up as things we wish we had done earlier:

Implement observability from day one, not week six. We've spent more time debugging production incidents on projects where logging was an afterthought than on anything else. Good logs and metrics are cheaper to build before launch than after an incident.
Use managed services more aggressively. Early in our practice, we operated our own Redis clusters, our own RabbitMQ instances, our own SMTP servers. Every one of those is now a managed service. The time spent operating infrastructure that AWS or a specialist SaaS can run better than we can is time we should be spending on client products.
Run load tests before every major release, not just before launch. A feature that works fine at current scale can destroy performance at 2x scale if it introduces an N+1 query or a missing index. Load testing is a regression test for infrastructure, not just a launch checklist item.
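
Treating load testing as a regression test means comparing a latency percentile against a budget on every release. The sketch below shows that idea with the standard library only: it calls a handler concurrently, records latencies, and checks the 95th percentile. The function names, the p95 budget, and the stand-in handler are all assumptions, and a real load test would drive a deployed environment with a dedicated tool rather than an in-process function.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def timed_call(handler: Callable[[], None]) -> float:
    """Run the handler once and return its duration in seconds."""
    start = time.perf_counter()
    handler()
    return time.perf_counter() - start


def run_load(handler: Callable[[], None], total_requests: int,
             concurrency: int) -> List[float]:
    """Call the handler total_requests times across `concurrency` threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda _: timed_call(handler), range(total_requests)))


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def within_budget(latencies: Sequence[float], p95_budget_seconds: float) -> bool:
    """True when the 95th percentile latency is inside the budget."""
    return percentile(latencies, 95) <= p95_budget_seconds


if __name__ == "__main__":
    def fake_endpoint() -> None:
        time.sleep(0.002)  # stands in for a real request

    latencies = run_load(fake_endpoint, total_requests=200, concurrency=20)
    print("p95 seconds:", percentile(latencies, 95))
    print("within budget:", within_budget(latencies, p95_budget_seconds=0.2))
```

Running the same check at current load and at 2x load before each release is what turns a one-off launch exercise into a regression test.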

Dealing with infrastructure that wasn't built to scale? Or starting fresh and want to get it right the first time? Our cloud and DevOps team would be happy to take a look.

Scaling From 0 to 1M Users: Lessons From Our Cloud Deployments