Why Your AWS Bill Is Still High - Even After Checking the Obvious
It’s a familiar moment for many AWS users:
You open the bill expecting EC2, RDS, EKS, or Bedrock to be the culprits…
…but the numbers don’t add up.
Even after accounting for your major services, your AWS costs remain surprisingly high.
This is more common than you think.
Modern AWS environments hide dozens of small, silent cost drivers that add up month over month. The problem isn’t just compute or databases - it’s the surrounding services, configuration drift, and “always-on” defaults that quietly inflate spend.
Below, we break down the most frequent hidden or overlooked AWS cost traps we see in real customer environments, and how you can identify (and eliminate) them.
1. The Rise of “Shadow Storage”
Most teams look at S3 storage totals - but overlook the hidden multipliers:
a. Old S3 multipart uploads
Abandoned multipart uploads can accumulate hundreds of GB over time - and because the parts never become objects, they don't show up in a normal bucket listing even though you're billed for them.
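A quick way to check whether a bucket is affected is to list its in-progress multipart uploads. A minimal boto3 sketch (the bucket name is a placeholder; in practice you would loop over all buckets):

```python
import boto3
from datetime import datetime, timedelta, timezone

s3 = boto3.client("s3")
cutoff = datetime.now(timezone.utc) - timedelta(days=7)

# Multipart uploads that were started but never completed or aborted.
# Their parts are billed as storage even though no object ever appears.
resp = s3.list_multipart_uploads(Bucket="my-example-bucket")  # placeholder bucket
for upload in resp.get("Uploads", []):
    if upload["Initiated"] < cutoff:
        print(upload["Key"], upload["UploadId"], upload["Initiated"])
        # Uncomment to clean up:
        # s3.abort_multipart_upload(
        #     Bucket="my-example-bucket",
        #     Key=upload["Key"],
        #     UploadId=upload["UploadId"],
        # )
```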
b. S3 versioning gone wild
Versioning retains every previous copy of an object until a lifecycle rule expires it. A 1 GB log file overwritten daily quietly becomes ~30 GB of stored versions within a month - and keeps growing.
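Both the abandoned uploads above and runaway versioning can be capped with a single bucket lifecycle rule. A minimal boto3 sketch - the bucket name, 7-day, and 30-day windows are placeholders to tune to your own retention needs:

```python
import boto3

s3 = boto3.client("s3")

# One rule that aborts incomplete multipart uploads after 7 days
# and expires noncurrent object versions after 30 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "cost-hygiene",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            }
        ]
    },
)
```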
c. Cross-region replication
Two buckets, doubled storage cost.
Three buckets, tripled - plus inter-region data transfer charges on every object replicated.
d. S3 Glacier retrieval spikes
Backup software or misconfigured policies can trigger expensive retrievals you never intended.
2. Data Transfer - The Silent Budget Killer
Data transfer is the #1 “mystery cost” for most AWS users.
Common offenders:
a. Inter-AZ traffic (not free!)
Many architectures send traffic across AZ boundaries without realizing it - especially:
- load balancer → EC2
- EC2 → RDS
- ECS tasks → other tasks
- EKS cross-AZ pod communication
At $0.01/GB in each direction, we’ve seen inter-AZ fees exceed the EC2 instance cost itself in some workloads.
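Cost Explorer can show which transfer category is actually biting you. A rough sketch that groups one month of spend by usage type and keeps anything that looks like data transfer (the dates are placeholders):

```python
import boto3

ce = boto3.client("ce")

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-05-01", "End": "2024-06-01"},  # placeholder month
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
)

# Usage types containing "DataTransfer" cover inter-AZ, inter-region,
# and internet egress; the "Regional-Bytes" variants are usually the
# inter-AZ component.
for group in resp["ResultsByTime"][0]["Groups"]:
    usage_type = group["Keys"][0]
    cost = float(group["Metrics"]["UnblendedCost"]["Amount"])
    if "DataTransfer" in usage_type and cost > 1:
        print(f"{usage_type}: ${cost:,.2f}")
```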
b. NAT Gateway tax
At $0.045 per GB processed - on top of an hourly charge for every gateway - large egress through a NAT Gateway becomes a silent cost avalanche.
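Before blaming the gateway, it helps to know how much data it actually processes. A sketch using the CloudWatch metrics NAT Gateways publish (the gateway ID is a placeholder):

```python
import boto3
from datetime import datetime, timedelta, timezone

cw = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)
start = end - timedelta(days=30)

# BytesOutToDestination is traffic the gateway forwarded out to its
# destinations; the per-GB data processing charge applies to data
# moving through the gateway.
resp = cw.get_metric_statistics(
    Namespace="AWS/NATGateway",
    MetricName="BytesOutToDestination",
    Dimensions=[{"Name": "NatGatewayId", "Value": "nat-0123456789abcdef0"}],  # placeholder
    StartTime=start,
    EndTime=end,
    Period=86400,
    Statistics=["Sum"],
)

total_gb = sum(dp["Sum"] for dp in resp["Datapoints"]) / 1e9
print(f"~{total_gb:,.1f} GB processed in the last 30 days")
```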
c. CloudFront → origin fetches
Poor cache TTL settings or dynamic content can cause thousands of unnecessary origin pulls.
d. VPC interface endpoints (charged per hour, per AZ, plus per GB processed)
Some users believe interface endpoints automatically save money - yet low-traffic endpoints spread across many VPCs and AZs can end up costing more than simply routing through NAT.
3. Logs, Metrics, and “Invisible” Observability Costs
After compute, the biggest hidden cost category is almost always observability.
a. CloudWatch Logs retention
New log groups default to "Never expire" retention - leading to multi-TB log groups nobody reads.
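Finding the offenders takes a few lines: list log groups that still have no retention policy, and optionally cap them. A sketch; the 30-day value is an example, not a recommendation:

```python
import boto3

logs = boto3.client("logs")

paginator = logs.get_paginator("describe_log_groups")
for page in paginator.paginate():
    for group in page["logGroups"]:
        # Log groups without retentionInDays keep data forever.
        if "retentionInDays" not in group:
            size_gb = group.get("storedBytes", 0) / 1e9
            print(f"{group['logGroupName']}: {size_gb:,.1f} GB, never expires")
            # Uncomment to cap retention:
            # logs.put_retention_policy(
            #     logGroupName=group["logGroupName"],
            #     retentionInDays=30,
            # )
```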
b. CloudWatch custom metrics
Each custom metric is billed for every month it receives data.
Every unique dimension combination counts as its own metric, so high-cardinality dimensions explode pricing.
c. CloudWatch Logs “payload tax”
At roughly $0.50 per GB ingested, verbose JSON logs from chatty microservices generate enormous ingestion bills.
d. X-Ray sampling or tracing defaults
A single high-traffic service can generate huge amounts of trace data.
e. Third-party log shippers
If you push logs to multiple tools (Datadog, Splunk, Elastic), you’re often paying 2–3 times for the same data.
4. Old Snapshots, AMIs, and Other Orphaned Resources
AWS never deletes things for you unless you explicitly tell it to.
Common forgotten artifacts include:
- EBS snapshots (especially automated ones)
- AMIs created during deployments
- Lambda layer versions
- Old ECR images
- Elastic IPs not attached to anything
- DynamoDB tables used for past test environments
- Redshift clusters paused but not deleted
Snapshots, in particular, often accumulate years of incremental backups.
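A periodic read-only sweep catches the most common orphans. A sketch that flags snapshots older than a year and Elastic IPs not associated with anything (the one-year cutoff is arbitrary; nothing is deleted):

```python
import boto3
from datetime import datetime, timedelta, timezone

ec2 = boto3.client("ec2")
cutoff = datetime.now(timezone.utc) - timedelta(days=365)

# Snapshots owned by this account that are over a year old.
for page in ec2.get_paginator("describe_snapshots").paginate(OwnerIds=["self"]):
    for snap in page["Snapshots"]:
        if snap["StartTime"] < cutoff:
            print(f"old snapshot: {snap['SnapshotId']} "
                  f"({snap['VolumeSize']} GiB, {snap['StartTime']:%Y-%m-%d})")

# Elastic IPs with no association are billed while they sit idle.
for addr in ec2.describe_addresses()["Addresses"]:
    if "AssociationId" not in addr:
        print(f"unattached EIP: {addr['PublicIp']}")
```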
5. Overprovisioned “Support Services” Nobody Monitors
Even when EC2 and RDS look fine, the supporting services may not.
a. Idle or forgotten load balancers
ALBs cost ~$20–30/month plus LCU charges.
Many companies have dozens they forgot about.
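Forgotten balancers are easy to surface by checking whether any target group behind them still has healthy targets. A read-only sketch; an ALB can be legitimately idle, so treat the output as a review list, not a delete list:

```python
import boto3

elbv2 = boto3.client("elbv2")

for page in elbv2.get_paginator("describe_load_balancers").paginate():
    for lb in page["LoadBalancers"]:
        tgs = elbv2.describe_target_groups(
            LoadBalancerArn=lb["LoadBalancerArn"]
        )["TargetGroups"]
        healthy = 0
        for tg in tgs:
            health = elbv2.describe_target_health(TargetGroupArn=tg["TargetGroupArn"])
            healthy += sum(
                1 for t in health["TargetHealthDescriptions"]
                if t["TargetHealth"]["State"] == "healthy"
            )
        if healthy == 0:
            print(f"no healthy targets: {lb['LoadBalancerName']}")
```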
b. Queues and topics with runaway request volume
SQS and SNS costs scale with request volume - sometimes artificially inflated by retry storms.
c. Step Functions charged by state transition
A poorly designed workflow can cost 10–50× more than expected.
d. API Gateway
High request volume, poor caching, or VPC integrations can dramatically increase cost.
e. DynamoDB
Provisioned throughput often remains high long after traffic decreases.
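If a table is still on provisioned capacity, comparing the provisioned numbers against what CloudWatch says was actually consumed makes the gap obvious. A rough sketch for a single table (the table name is a placeholder; only the read side is shown, writes work the same way):

```python
import boto3
from datetime import datetime, timedelta, timezone

TABLE = "my-example-table"  # placeholder

ddb = boto3.client("dynamodb")
cw = boto3.client("cloudwatch")

provisioned = ddb.describe_table(TableName=TABLE)["Table"]["ProvisionedThroughput"]

end = datetime.now(timezone.utc)
resp = cw.get_metric_statistics(
    Namespace="AWS/DynamoDB",
    MetricName="ConsumedReadCapacityUnits",
    Dimensions=[{"Name": "TableName", "Value": TABLE}],
    StartTime=end - timedelta(days=7),
    EndTime=end,
    Period=3600,
    Statistics=["Sum"],
)

# Summing consumed units per hour and dividing by 3600 approximates
# the average RCU rate in that hour; the max is the weekly peak.
peak_rcu = max((dp["Sum"] / 3600 for dp in resp["Datapoints"]), default=0)
print(f"provisioned RCU: {provisioned['ReadCapacityUnits']}, "
      f"peak hourly avg consumed: {peak_rcu:.1f}")
```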
6. Orphaned Kubernetes and ECS Infrastructure
Even if compute workloads are accounted for, the ecosystem around them may not be:
a. EKS cluster control plane costs
Even the smallest EKS cluster incurs the control-plane charge (about $0.10/hour, roughly $73/month) - multiplied across every dev, staging, and test cluster.
b. Unused node groups or Fargate profiles
Clusters often keep scaling groups alive even after workloads change.
c. Persistent volumes never cleaned up
PVCs and EBS volumes survive long after pods are gone.
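Orphaned volumes are easy to spot because they sit in the "available" state while still being billed for their full size. A read-only sketch that lists them; the Kubernetes tag key shown is what EBS provisioners typically add, so treat it as a hint rather than a guarantee:

```python
import boto3

ec2 = boto3.client("ec2")

# Volumes in the "available" state are attached to nothing but are
# still billed for their provisioned size.
for page in ec2.get_paginator("describe_volumes").paginate(
    Filters=[{"Name": "status", "Values": ["available"]}]
):
    for vol in page["Volumes"]:
        tags = {t["Key"]: t["Value"] for t in vol.get("Tags", [])}
        # Kubernetes provisioners usually tag volumes with the PVC name.
        pvc = tags.get("kubernetes.io/created-for/pvc/name", "")
        print(f"{vol['VolumeId']}  {vol['Size']} GiB  {pvc}")
```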
d. Service mesh proxies
High data volume through mesh sidecars increases compute and bandwidth costs.
7. IAM Misconfiguration Leading to “Runaway Services”
Misconfigured IAM roles sometimes allow:
- Lambda functions to scale without limits
- Batch jobs to spin up unlimited resources
- Step Functions to run recursive workflows
- Glue jobs to fire repeatedly due to failed triggers
All of these quietly run up the bill until someone notices.
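A cheap guardrail against the first of these is capping how far a function can scale. A sketch that sets reserved concurrency (the function name and the limit of 50 are placeholders):

```python
import boto3

lam = boto3.client("lambda")

# Reserved concurrency both guarantees and caps concurrent executions,
# so a runaway trigger cannot scale this function - and everything it
# calls downstream - without limit.
lam.put_function_concurrency(
    FunctionName="my-example-function",   # placeholder
    ReservedConcurrentExecutions=50,      # hard ceiling for this function
)
```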
8. Legacy Services Nobody Remembers
This is surprisingly common.
Old environments often contain:
- EMR clusters used once
- ElastiCache clusters created for tests
- OpenSearch domains left idle
- SageMaker notebooks left running
- Missing lifecycle rules for Athena/Glue results
- ML inference endpoints that never turn off
One or two forgotten services won't break the bank - but dozens will.
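A few of these are quick to enumerate. A read-only sketch listing SageMaker notebook instances and real-time endpoints that are currently in service:

```python
import boto3

sm = boto3.client("sagemaker")

# Notebook instances bill per hour while "InService", even if nobody
# has opened them in months.
for nb in sm.list_notebook_instances(StatusEquals="InService")["NotebookInstances"]:
    print(f"running notebook: {nb['NotebookInstanceName']} ({nb['InstanceType']})")

# Real-time inference endpoints bill continuously as well.
for ep in sm.list_endpoints(StatusEquals="InService")["Endpoints"]:
    print(f"running endpoint: {ep['EndpointName']}")
```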
9. Configuration Drift
Environments change over time:
- Someone adds a new ALB
- A developer bumps an instance type
- A team disables logging retention
- A Lambda function begins logging 10× more data
- A NAT Gateway becomes the default path for everything
Most organizations don’t detect drift until costs spike.
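Drift usually shows up first as a month-over-month delta in one service's line item. A sketch that compares the last two full months by service via Cost Explorer and prints the biggest movers (the dates are placeholders):

```python
import boto3
from collections import defaultdict

ce = boto3.client("ce")

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-04-01", "End": "2024-06-01"},  # two placeholder months
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

# Index cost per service per month: months[service][0] is the first
# month, months[service][1] the second.
months = defaultdict(dict)
for i, period in enumerate(resp["ResultsByTime"]):
    for g in period["Groups"]:
        months[g["Keys"][0]][i] = float(g["Metrics"]["UnblendedCost"]["Amount"])

# Largest month-over-month increases first.
deltas = {svc: m.get(1, 0) - m.get(0, 0) for svc, m in months.items()}
for svc, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(f"{svc}: {delta:+,.2f} USD month over month")
```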
10. Tools, Integrations, and Pipelines Working Against You
Third-party tools often generate AWS cost indirectly:
- CI pipelines constantly pushing artifacts
- Monitoring agents with high sampling
- Replicating logs to multiple regions
- Serverless monitoring platforms adding overhead
- Backup tools creating redundant snapshots
Individually small - together expensive.
Final Thoughts: The Small Costs Add Up
If you’ve already accounted for EC2, RDS, Bedrock, and other top-level services, the remaining spend is likely hidden across:
- Storage multipliers
- Data transfer
- Logging and observability
- Orphaned or legacy resources
- Over-provisioned supporting services
- Kubernetes ecosystem sprawl
- Configuration drift
- Forgotten pipelines and integrations
The key is not just identifying one issue - but systematically reviewing these categories.