Why Is My AWS Bill So High? Hidden Costs Explained

You've looked at your top services. EC2, RDS, EKS, maybe Bedrock. The numbers there make sense. But the total is still higher than it should be, and you're left thinking, "Why is my AWS bill so high?"

This is a common situation. AWS environments accumulate cost in dozens of places that don't show up prominently in the bill. Some of it is configuration drift. Some is services nobody thought to revisit. Some is pricing behavior that's easy to misunderstand until you've seen it up close.

Below is a rundown of the categories where we find hidden spend most consistently when we do a cloud audit on a customer environment.


Storage that multiplied when nobody was watching

S3 is the most common place for quiet storage growth, and the culprits usually aren't large individual files.

Abandoned multipart uploads are one of the most overlooked. When a large file upload fails or is interrupted, the uploaded parts stay in S3 and accumulate storage charges indefinitely. There's no automatic cleanup unless you've configured a lifecycle policy to handle it. In environments without that policy, it's not unusual to find hundreds of gigabytes of orphaned upload data.
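The cleanup policy mentioned above can be expressed as a single S3 lifecycle rule. A minimal sketch follows; the 7-day window and the bucket name in the commented-out apply step are illustrative assumptions, not values from any particular environment.

```python
# Sketch of an S3 lifecycle rule that aborts incomplete multipart uploads
# after 7 days, so orphaned parts stop accruing storage charges.
# The 7-day window is an illustrative choice.

lifecycle_config = {
    "Rules": [
        {
            "ID": "abort-incomplete-multipart-uploads",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # apply to the whole bucket
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }
    ]
}

# To apply it (requires AWS credentials; bucket name is hypothetical):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-example-bucket", LifecycleConfiguration=lifecycle_config
# )
```

Because the rule uses an empty prefix filter, it covers every key in the bucket; scope the filter down if some prefixes intentionally hold long-running uploads.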

Versioning is another. Enabling versioning on a bucket means every overwritten or deleted object creates a new version rather than replacing the old one. A 1 GB log file overwritten daily quietly becomes a 30 GB problem by end of month. The original file looks fine; the version history is the issue.
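The growth in that example is linear and easy to model. A quick sketch of the arithmetic, using the figures from the paragraph above:

```python
# Back-of-envelope growth of a versioned S3 object: every overwrite keeps
# the prior version, so stored bytes grow linearly with overwrite count.

def versioned_storage_gb(object_size_gb: float, overwrites: int) -> float:
    """Total GB stored: the live object plus one noncurrent version per overwrite."""
    return object_size_gb * (overwrites + 1)

# A 1 GB log overwritten daily: 29 overwrites later, 30 GB is stored.
print(versioned_storage_gb(1.0, 29))  # 30.0
```

A lifecycle rule expiring noncurrent versions after a fixed number of days caps this growth without touching the live object.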

Cross-region replication doubles your storage cost for every bucket you replicate, plus adds data transfer charges. That's sometimes the right tradeoff for compliance or disaster recovery, but it's often configured once and never reviewed.

Glacier retrieval costs catch teams off guard when backup software or misconfigured lifecycle policies trigger unexpected restores. Expedited retrieval from Glacier is priced at $0.03/GB plus a per-request fee. If something is triggering retrievals you didn't plan for, it shows up fast.




Data transfer: the line item that surprises everyone

Data transfer is the most common source of unexplained spend we see, and it compounds in ways that aren't obvious from the pricing page.

Inter-AZ traffic isn't free. Traffic between availability zones is charged at $0.01/GB in each direction, meaning a round trip costs $0.02/GB. In architectures where a load balancer in one AZ is routing to EC2 instances or RDS in a different AZ, or where EKS pods communicate across zones, that meter runs continuously. We've seen environments where inter-AZ fees exceeded compute costs for the same workload.
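The round-trip billing is what makes this add up. A sketch of the math at the $0.01/GB-per-direction rate cited above; the 500 GB/day traffic volume is an illustrative assumption:

```python
# Rough inter-AZ transfer math: both legs of a cross-AZ round trip are
# billed, so the effective rate is $0.02/GB of request/response traffic.

INTER_AZ_PER_GB_EACH_WAY = 0.01  # USD, rate cited in the text

def inter_az_monthly_cost(gb_each_way_per_day: float, days: int = 30) -> float:
    return gb_each_way_per_day * days * INTER_AZ_PER_GB_EACH_WAY * 2

# 500 GB/day each way between a load balancer and backends in another AZ:
print(round(inter_az_monthly_cost(500), 2))  # 300.0
```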

NAT Gateway is the other major one. In us-east-1, a NAT Gateway costs $0.045/GB processed plus $0.045/hour just for being provisioned, roughly $33/month before any traffic. Data going out to the internet adds another $0.09/GB on top of that, bringing the all-in cost to around $0.135/GB for internet-bound traffic. A high-availability setup with one gateway per AZ triples the hourly cost to roughly $99/month before data charges. Routing S3 and DynamoDB traffic through free Gateway VPC Endpoints instead of NAT is one of the faster wins available, since that traffic bypasses the per-GB processing charge entirely.
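The NAT figures above combine into a simple model. This sketch assumes a 730-hour month and the us-east-1 rates quoted in the paragraph; the 1 TB traffic volume is illustrative:

```python
# NAT Gateway cost model: hourly charge per gateway, plus per-GB processing,
# plus $0.09/GB data-transfer-out for internet-bound traffic.

NAT_HOURLY = 0.045       # USD/hour per gateway (us-east-1)
NAT_PER_GB = 0.045       # USD/GB processed
DTO_PER_GB = 0.09        # USD/GB out to the internet
HOURS_PER_MONTH = 730

def nat_monthly_cost(gateways: int, internet_gb: float) -> float:
    hourly = gateways * NAT_HOURLY * HOURS_PER_MONTH
    per_gb = internet_gb * (NAT_PER_GB + DTO_PER_GB)  # $0.135/GB all-in
    return hourly + per_gb

# One idle gateway: the ~$33/month floor.
print(round(nat_monthly_cost(1, 0), 2))      # 32.85
# Three gateways (one per AZ) pushing 1 TB/month to the internet:
print(round(nat_monthly_cost(3, 1024), 2))   # 236.79
```

Traffic to S3 or DynamoDB routed through a Gateway VPC Endpoint drops out of the per-GB term entirely, which is why that change pays off so quickly.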

CloudFront origin fetch costs are another common surprise. If cache TTLs are set too short or a high proportion of content is dynamic, CloudFront pulls from your origin frequently and those requests add up in ways the CDN savings don't fully offset.


Observability costs: often the second-biggest surprise after compute

CloudWatch is pervasive in AWS environments and the costs are easy to underestimate. A few specific patterns generate most of the spend.

Log retention defaults to "never expire" unless you explicitly configure it. Multi-TB log groups accumulate quietly. Setting retention policies on log groups is a one-time fix that tends to produce immediate savings.

Custom metrics are charged per metric per month, every month. High-cardinality usage (where labels or dimensions create many unique metric combinations) can generate thousands of distinct metrics from a small number of services, and the per-metric charge compounds accordingly.
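The cardinality multiplication is worth seeing concretely. In this sketch, the dimension counts are invented for illustration, and the $0.30/metric-month figure is the first-tier us-east-1 custom metric price at the time of writing; treat both as assumptions:

```python
# Each unique combination of dimension values is billed as its own
# CloudWatch custom metric, so cardinalities multiply.

from math import prod

def metric_count(dimension_cardinalities: list) -> int:
    # e.g. 20 services x 8 endpoints x 5 status classes = 800 metrics
    return prod(dimension_cardinalities)

def monthly_cost(n_metrics: int, per_metric: float = 0.30) -> float:
    # per_metric is an assumed first-tier rate, not a quoted price
    return n_metrics * per_metric

n = metric_count([20, 8, 5])
print(n, monthly_cost(n))  # 800 240.0
```

Adding one more dimension with ten values would turn those 800 metrics into 8,000, which is how a "small" instrumentation change becomes a noticeable line item.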

Chatty microservices with verbose JSON logging generate large ingestion volumes. CloudWatch charges for log ingestion, not just storage, so a service logging full request and response payloads at high volume can generate surprising costs independent of how long the logs are kept.

If you're pushing logs to multiple destinations simultaneously (Datadog and CloudWatch, for instance, or CloudWatch and Splunk), you're often paying the ingestion cost multiple times for the same data. Consolidating to a single destination or routing selectively reduces this without losing coverage.



Orphaned resources: AWS never cleans up after you

This one doesn't have a clever mechanism behind it. AWS simply doesn't delete things you're no longer using, and environments accumulate artifacts over time.

EBS snapshots from automated backup policies are the most common. Snapshots are incremental but they still cost $0.05/GB-month, and environments with years of daily snapshots and no retention policy can accumulate significant storage. The same applies to AMIs created during deployment pipelines that were never deregistered, and old ECR container images that predate current retention policies.
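A rough accumulation model shows why years of unmanaged snapshots matter even though each one is incremental. The volume size and daily changed-block delta below are illustrative assumptions; the $0.05/GB-month rate is the one cited above:

```python
# Daily EBS snapshots with no retention policy: a full first snapshot,
# then a changed-block delta per day, all billed at $0.05/GB-month.

SNAPSHOT_PER_GB_MONTH = 0.05  # USD

def snapshot_storage_gb(full_gb: float, daily_delta_gb: float, days: int) -> float:
    return full_gb + daily_delta_gb * days

def monthly_bill(stored_gb: float) -> float:
    return stored_gb * SNAPSHOT_PER_GB_MONTH

# A 200 GB volume changing ~2 GB/day, snapshotted daily for two years:
stored = snapshot_storage_gb(200, 2, 730)
print(round(stored), round(monthly_bill(stored), 2))  # 1660 83.0
```

That's $83/month for one volume's backup history; multiply across a fleet and the case for a retention policy makes itself.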

Elastic IPs not attached to a running instance now cost $0.005/hour each, roughly $3.65 per idle IP over a 730-hour month, after the public IPv4 pricing change that took effect in February 2024. It's a small number per IP, but it adds up across environments with many of them.

Other commonly forgotten resources: ElastiCache clusters created for load testing, OpenSearch domains provisioned for a project that ended, SageMaker notebook instances left running, DynamoDB tables from old test environments, and Redshift clusters that were paused rather than deleted.


Supporting services that scale unexpectedly

Application Load Balancers in us-east-1 start at around $0.0225/hour, roughly $16-18/month just for being provisioned. That's before LCU charges based on connections and data processed. In environments with many microservices or teams that spin up individual ALBs per service, the count grows quickly. Consolidating multiple services behind a single ALB using path-based or host-based routing is often possible and eliminates redundant hourly charges.

SQS and SNS costs scale with request volume. Retry storms (where consumers fail to process messages and the messages become visible again repeatedly) can inflate request counts substantially beyond what the underlying traffic volume would suggest.

Step Functions charge per state transition. A workflow with many steps, called at high frequency, can cost significantly more than a simpler implementation. We've seen poorly designed workflows cost 20-30 times more than equivalent alternatives.
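The per-transition math explains how workflow design drives cost. The $0.025 per 1,000 state transitions figure below is the us-east-1 Standard Workflows rate at the time of writing, and the workflow sizes are invented for illustration; treat both as assumptions:

```python
# Step Functions Standard Workflows bill per state transition, so a
# workflow's step count multiplies directly into its monthly cost.

PER_1000_TRANSITIONS = 0.025  # USD, assumed us-east-1 Standard rate

def workflow_monthly_cost(states_per_execution: int,
                          executions_per_month: int) -> float:
    transitions = states_per_execution * executions_per_month
    return transitions / 1000 * PER_1000_TRANSITIONS

# A 40-state workflow at 1M executions/month vs. a 4-state redesign:
print(workflow_monthly_cost(40, 1_000_000))  # 1000.0
print(workflow_monthly_cost(4, 1_000_000))   # 100.0
```

A 10x difference from step count alone; add retries and nested executions and the 20-30x gaps mentioned above become plausible.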

DynamoDB in provisioned capacity mode charges for read and write capacity units whether they're used or not. Tables provisioned for peak traffic that never materialized, or tables that have scaled up and never scaled back down, are a consistent source of waste.
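Idle provisioned capacity has a concrete floor cost. The per-hour RCU/WCU rates below are us-east-1 prices at the time of writing and the table sizing is invented for illustration; treat both as assumptions:

```python
# Provisioned DynamoDB capacity bills per capacity-unit-hour whether or
# not the table serves any traffic.

RCU_PER_HOUR = 0.00013  # USD per read capacity unit-hour (assumed rate)
WCU_PER_HOUR = 0.00065  # USD per write capacity unit-hour (assumed rate)
HOURS_PER_MONTH = 730

def provisioned_monthly_cost(rcu: int, wcu: int) -> float:
    return (rcu * RCU_PER_HOUR + wcu * WCU_PER_HOUR) * HOURS_PER_MONTH

# A table sized for a peak that never came: 1,000 RCU / 1,000 WCU.
print(round(provisioned_monthly_cost(1000, 1000), 2))  # 569.4
```

Switching low-utilization tables to on-demand mode, or enabling auto scaling with a sane floor, converts that fixed cost into one that tracks actual traffic.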



Kubernetes and ECS infrastructure overhead

Even well-managed container environments carry overhead costs that don't show up obviously in compute.

Every EKS cluster incurs a $0.10/hour control plane charge ($73/month) regardless of the number of nodes or workloads running. Dev and test clusters that could share an environment with production (through namespaces and RBAC) or simply be deleted when not in use are common sources of unnecessary spend.

Persistent volumes (EBS-backed PVCs) survive pod deletion by default. Applications that spin up and tear down frequently can leave behind volumes that accumulate storage charges indefinitely. Adding retention policies or using dynamic provisioning with appropriate reclaim policies prevents this.

Cross-AZ pod communication in EKS generates inter-AZ transfer charges at the same $0.01/GB rate mentioned above. Topology-aware routing, which keeps traffic within the same AZ where possible, reduces this without changing application behavior.


IAM misconfiguration as a cost driver

This one is less obvious but comes up more than you'd expect. Overly permissive IAM roles can allow services to scale themselves beyond what was intended.

Lambda functions without reserved concurrency limits can scale to account-level limits if triggered in a loop (by an S3 event that creates another S3 object, for instance). Batch jobs with broad permissions can spawn more resources than expected. Step Functions can execute recursive workflows without a natural stopping condition. Glue jobs triggered repeatedly by failed job bookmarks can accumulate hours of compute before anyone notices.

These aren't primarily security issues in the moment; they're billing issues. They appear in Cost Explorer as sudden spikes in a service that had been steady, which is usually the first indication something is misconfigured.



Configuration drift over time

Environments change in ways that don't get reflected in cost reviews. An instance type gets bumped during an incident and nobody reverts it. A developer adds a NAT Gateway during a debugging session and forgets to route traffic back through the existing one. Lambda functions start logging more verbose data after a debugging change that was never reverted. A new ALB gets provisioned for a temporary project and stays up.

None of these changes are catastrophic on their own, but they accumulate. An environment that was reasonably optimized a year ago may have drifted significantly, and the drift is invisible until someone actually looks.


Where to start

If you want a systematic picture of where your environment stands across these categories, a free cloud audit is a practical first step. We review cost patterns, resource utilization, configuration, and tagging and come back with a prioritized list of what to address first. Get in touch to get started.
