Why Your AWS Bill Is Still High - Even After Checking the Obvious
It’s a familiar moment for many AWS users:
You open the bill expecting EC2, RDS, EKS, or Bedrock to be the culprits…
…but the numbers don’t add up.
Even after accounting for your major services, your AWS costs remain surprisingly high.
This is more common than you think.
Modern AWS environments hide dozens of small, silent cost drivers that add up month over month. The problem isn’t just compute or databases - it’s the surrounding services, configuration drift, and “always-on” defaults that quietly inflate spend.
Below, we break down the most frequent hidden or overlooked AWS cost traps we see in real customer environments, and how you can identify (and eliminate) them.
1. The Rise of “Shadow Storage”
Most teams look at S3 storage totals - but overlook the hidden multipliers:
a. Old S3 multipart uploads
Abandoned multipart uploads can accumulate hundreds of GB over time - and because the parts never become objects, they don't show up in a normal bucket listing even though you're billed for them.
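A quick way to check whether a bucket is affected is to list its in-progress multipart uploads. A minimal boto3 sketch (the bucket name is a placeholder; in practice you would loop over all buckets):

```python
import boto3
from datetime import datetime, timedelta, timezone

s3 = boto3.client("s3")
cutoff = datetime.now(timezone.utc) - timedelta(days=7)

# Multipart uploads that were started but never completed or aborted.
# Their parts are billed as storage even though no object ever appears.
resp = s3.list_multipart_uploads(Bucket="my-example-bucket")  # placeholder bucket
for upload in resp.get("Uploads", []):
    if upload["Initiated"] < cutoff:
        print(upload["Key"], upload["UploadId"], upload["Initiated"])
        # Uncomment to clean up:
        # s3.abort_multipart_upload(
        #     Bucket="my-example-bucket",
        #     Key=upload["Key"],
        #     UploadId=upload["UploadId"],
        # )
```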
b. S3 versioning gone wild
Versioning retains every previous copy of an object until a lifecycle rule expires it. A 1 GB log file overwritten daily quietly becomes ~30 GB of stored versions within a month - and keeps growing.
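Both the abandoned uploads above and runaway versioning can be capped with a single bucket lifecycle rule. A minimal boto3 sketch - the bucket name, 7-day, and 30-day windows are placeholders to tune to your own retention needs:

```python
import boto3

s3 = boto3.client("s3")

# One rule that aborts incomplete multipart uploads after 7 days
# and expires noncurrent object versions after 30 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "cost-hygiene",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            }
        ]
    },
)
```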
c. Cross-region replication
Two buckets, doubled storage cost.
Three buckets, tripled - plus inter-region data transfer charges on every object replicated.
d. S3 Glacier retrieval spikes
Backup software or misconfigured policies can trigger expensive retrievals you never intended.
2. Data Transfer - The Silent Budget Killer
Data transfer is the #1 “mystery cost” for most AWS users.
Common offenders:
a. Inter-AZ traffic (not free!)
Many architectures send traffic across AZ boundaries without realizing it - especially:
- load balancer → EC2
- EC2 → RDS
- ECS tasks → other tasks
- EKS cross-AZ pod communication
At $0.01/GB in each direction, we’ve seen inter-AZ fees exceed the EC2 instance cost itself in some workloads.
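Cost Explorer can show which transfer category is actually biting you. A rough sketch that groups one month of spend by usage type and keeps anything that looks like data transfer (the dates are placeholders):

```python
import boto3

ce = boto3.client("ce")

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-05-01", "End": "2024-06-01"},  # placeholder month
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
)

# Usage types containing "DataTransfer" cover inter-AZ, inter-region,
# and internet egress; the "Regional-Bytes" variants are usually the
# inter-AZ component.
for group in resp["ResultsByTime"][0]["Groups"]:
    usage_type = group["Keys"][0]
    cost = float(group["Metrics"]["UnblendedCost"]["Amount"])
    if "DataTransfer" in usage_type and cost > 1:
        print(f"{usage_type}: ${cost:,.2f}")
```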
b. NAT Gateway tax
At $0.045 per GB processed - on top of an hourly charge for every gateway - large egress through a NAT Gateway becomes a silent cost avalanche.
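Before blaming the gateway, it helps to know how much data it actually processes. A sketch using the CloudWatch metrics NAT Gateways publish (the gateway ID is a placeholder):

```python
import boto3
from datetime import datetime, timedelta, timezone

cw = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)
start = end - timedelta(days=30)

# BytesOutToDestination is traffic the gateway forwarded out to its
# destinations; the per-GB data processing charge applies to data
# moving through the gateway.
resp = cw.get_metric_statistics(
    Namespace="AWS/NATGateway",
    MetricName="BytesOutToDestination",
    Dimensions=[{"Name": "NatGatewayId", "Value": "nat-0123456789abcdef0"}],  # placeholder
    StartTime=start,
    EndTime=end,
    Period=86400,
    Statistics=["Sum"],
)

total_gb = sum(dp["Sum"] for dp in resp["Datapoints"]) / 1e9
print(f"~{total_gb:,.1f} GB processed in the last 30 days")
```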
c. CloudFront → origin fetches
Poor cache TTL settings or dynamic content can cause thousands of unnecessary origin pulls.
d. VPC interface endpoints (charged per hour, per AZ, plus per GB processed)
Some users believe interface endpoints automatically save money - yet low-traffic endpoints spread across many VPCs and AZs can end up costing more than simply routing through NAT.
3. Logs, Metrics, and “Invisible” Observability Costs
After compute, the biggest hidden cost category is almost always observability.
a. CloudWatch Logs retention
New log groups default to "Never expire" retention - leading to multi-TB log groups nobody reads.
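Finding the offenders takes a few lines: list log groups that still have no retention policy, and optionally cap them. A sketch; the 30-day value is an example, not a recommendation:

```python
import boto3

logs = boto3.client("logs")

paginator = logs.get_paginator("describe_log_groups")
for page in paginator.paginate():
    for group in page["logGroups"]:
        # Log groups without retentionInDays keep data forever.
        if "retentionInDays" not in group:
            size_gb = group.get("storedBytes", 0) / 1e9
            print(f"{group['logGroupName']}: {size_gb:,.1f} GB, never expires")
            # Uncomment to cap retention:
            # logs.put_retention_policy(
            #     logGroupName=group["logGroupName"],
            #     retentionInDays=30,
            # )
```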
b. CloudWatch custom metrics
Each custom metric is billed for every month it receives data.
Every unique dimension combination counts as its own metric, so high-cardinality dimensions explode pricing.
c. CloudWatch Logs “payload tax”
At roughly $0.50 per GB ingested, verbose JSON logs from chatty microservices generate enormous ingestion bills.
d. X-Ray sampling or tracing defaults
A single high-traffic service can generate huge amounts of trace data.
e. Third-party log shippers
If you push logs to multiple tools (Datadog, Splunk, Elastic), you’re often paying 2–3 times for the same data.
4. Old Snapshots, AMIs, and Other Orphaned Resources
AWS never deletes things for you unless you explicitly tell it to.
Common forgotten artifacts include:
- EBS snapshots (especially automated ones)
- AMIs created during deployments
- Lambda layer versions
- Old ECR images
- Elastic IPs not attached to anything
- DynamoDB tables used for past test environments
- Redshift clusters paused but not deleted
Snapshots, in particular, often accumulate years of incremental backups.
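A periodic read-only sweep catches the most common orphans. A sketch that flags snapshots older than a year and Elastic IPs not associated with anything (the one-year cutoff is arbitrary; nothing is deleted):

```python
import boto3
from datetime import datetime, timedelta, timezone

ec2 = boto3.client("ec2")
cutoff = datetime.now(timezone.utc) - timedelta(days=365)

# Snapshots owned by this account that are over a year old.
for page in ec2.get_paginator("describe_snapshots").paginate(OwnerIds=["self"]):
    for snap in page["Snapshots"]:
        if snap["StartTime"] < cutoff:
            print(f"old snapshot: {snap['SnapshotId']} "
                  f"({snap['VolumeSize']} GiB, {snap['StartTime']:%Y-%m-%d})")

# Elastic IPs with no association are billed while they sit idle.
for addr in ec2.describe_addresses()["Addresses"]:
    if "AssociationId" not in addr:
        print(f"unattached EIP: {addr['PublicIp']}")
```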
5. Overprovisioned “Support Services” Nobody Monitors
Even when EC2 and RDS look fine, the supporting services may not.
a. Idle or forgotten load balancers
ALBs cost ~$20–30/month plus LCU charges.
Many companies have dozens they forgot about.
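Forgotten balancers are easy to surface by checking whether any target group behind them still has healthy targets. A read-only sketch; an ALB can be legitimately idle, so treat the output as a review list, not a delete list:

```python
import boto3

elbv2 = boto3.client("elbv2")

for page in elbv2.get_paginator("describe_load_balancers").paginate():
    for lb in page["LoadBalancers"]:
        tgs = elbv2.describe_target_groups(
            LoadBalancerArn=lb["LoadBalancerArn"]
        )["TargetGroups"]
        healthy = 0
        for tg in tgs:
            health = elbv2.describe_target_health(TargetGroupArn=tg["TargetGroupArn"])
            healthy += sum(
                1 for t in health["TargetHealthDescriptions"]
                if t["TargetHealth"]["State"] == "healthy"
            )
        if healthy == 0:
            print(f"no healthy targets: {lb['LoadBalancerName']}")
```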
b. Queues and topics with runaway request volume
SQS and SNS costs scale with request volume - sometimes artificially inflated by retry storms.
c. Step Functions charged by state transition
A poorly designed workflow can cost 10–50× more than expected.
d. API Gateway
High request volume, poor caching, or VPC integrations can dramatically increase cost.
e. DynamoDB
Provisioned throughput often remains high long after traffic decreases.
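If a table is still on provisioned capacity, comparing the provisioned numbers against what CloudWatch says was actually consumed makes the gap obvious. A rough sketch for a single table (the table name is a placeholder; only the read side is shown, writes work the same way):

```python
import boto3
from datetime import datetime, timedelta, timezone

TABLE = "my-example-table"  # placeholder

ddb = boto3.client("dynamodb")
cw = boto3.client("cloudwatch")

provisioned = ddb.describe_table(TableName=TABLE)["Table"]["ProvisionedThroughput"]

end = datetime.now(timezone.utc)
resp = cw.get_metric_statistics(
    Namespace="AWS/DynamoDB",
    MetricName="ConsumedReadCapacityUnits",
    Dimensions=[{"Name": "TableName", "Value": TABLE}],
    StartTime=end - timedelta(days=7),
    EndTime=end,
    Period=3600,
    Statistics=["Sum"],
)

# Summing consumed units per hour and dividing by 3600 approximates
# the average RCU rate in that hour; the max is the weekly peak.
peak_rcu = max((dp["Sum"] / 3600 for dp in resp["Datapoints"]), default=0)
print(f"provisioned RCU: {provisioned['ReadCapacityUnits']}, "
      f"peak hourly avg consumed: {peak_rcu:.1f}")
```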
6. Orphaned Kubernetes and ECS Infrastructure
Even if compute workloads are accounted for, the ecosystem around them may not be:
a. EKS cluster control plane costs
Even the smallest EKS cluster incurs the control-plane charge (about $0.10/hour, roughly $73/month) - multiplied across every dev, staging, and test cluster.
b. Unused node groups or Fargate profiles
Clusters often keep scaling groups alive even after workloads change.
c. Persistent volumes never cleaned up
PVCs and EBS volumes survive long after pods are gone.
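Orphaned volumes are easy to spot because they sit in the "available" state while still being billed for their full size. A read-only sketch that lists them; the Kubernetes tag key shown is what EBS provisioners typically add, so treat it as a hint rather than a guarantee:

```python
import boto3

ec2 = boto3.client("ec2")

# Volumes in the "available" state are attached to nothing but are
# still billed for their provisioned size.
for page in ec2.get_paginator("describe_volumes").paginate(
    Filters=[{"Name": "status", "Values": ["available"]}]
):
    for vol in page["Volumes"]:
        tags = {t["Key"]: t["Value"] for t in vol.get("Tags", [])}
        # Kubernetes provisioners usually tag volumes with the PVC name.
        pvc = tags.get("kubernetes.io/created-for/pvc/name", "")
        print(f"{vol['VolumeId']}  {vol['Size']} GiB  {pvc}")
```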
d. Service mesh proxies
High data volume through mesh sidecars increases compute and bandwidth costs.
7. IAM Misconfiguration Leading to “Runaway Services”
Misconfigured IAM roles sometimes allow:
- Lambda functions to scale without limits
- Batch jobs to spin up unlimited resources
- Step Functions to run recursive workflows
- Glue jobs to fire repeatedly due to failed triggers
All of these quietly run up the bill until someone notices.
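A cheap guardrail against the first of these is capping how far a function can scale. A sketch that sets reserved concurrency (the function name and the limit of 50 are placeholders):

```python
import boto3

lam = boto3.client("lambda")

# Reserved concurrency both guarantees and caps concurrent executions,
# so a runaway trigger cannot scale this function - and everything it
# calls downstream - without limit.
lam.put_function_concurrency(
    FunctionName="my-example-function",   # placeholder
    ReservedConcurrentExecutions=50,      # hard ceiling for this function
)
```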
8. Legacy Services Nobody Remembers
This is surprisingly common.
Old environments often contain:
- EMR clusters used once
- ElastiCache clusters created for tests
- OpenSearch domains left idle
- SageMaker notebooks left running
- Missing lifecycle rules for Athena/Glue results
- ML inference endpoints that never turn off
One or two forgotten services won't break the bank - but dozens will.
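A few of these are quick to enumerate. A read-only sketch listing SageMaker notebook instances and real-time endpoints that are currently in service:

```python
import boto3

sm = boto3.client("sagemaker")

# Notebook instances bill per hour while "InService", even if nobody
# has opened them in months.
for nb in sm.list_notebook_instances(StatusEquals="InService")["NotebookInstances"]:
    print(f"running notebook: {nb['NotebookInstanceName']} ({nb['InstanceType']})")

# Real-time inference endpoints bill continuously as well.
for ep in sm.list_endpoints(StatusEquals="InService")["Endpoints"]:
    print(f"running endpoint: {ep['EndpointName']}")
```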
9. Configuration Drift
Environments change over time:
- Someone adds a new ALB
- A developer bumps an instance type
- A team disables logging retention
- A Lambda function begins logging 10× more data
- A NAT Gateway becomes the default path for everything
Most organizations don’t detect drift until costs spike.
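Drift usually shows up first as a month-over-month delta in one service's line item. A sketch that compares the last two full months by service via Cost Explorer and prints the biggest movers (the dates are placeholders):

```python
import boto3
from collections import defaultdict

ce = boto3.client("ce")

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-04-01", "End": "2024-06-01"},  # two placeholder months
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

# Index cost per service per month: months[service][0] is the first
# month, months[service][1] the second.
months = defaultdict(dict)
for i, period in enumerate(resp["ResultsByTime"]):
    for g in period["Groups"]:
        months[g["Keys"][0]][i] = float(g["Metrics"]["UnblendedCost"]["Amount"])

# Largest month-over-month increases first.
deltas = {svc: m.get(1, 0) - m.get(0, 0) for svc, m in months.items()}
for svc, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(f"{svc}: {delta:+,.2f} USD month over month")
```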
10. Tools, Integrations, and Pipelines Working Against You
Third-party tools often generate AWS cost indirectly:
- CI pipelines constantly pushing artifacts
- Monitoring agents with high sampling
- Replicating logs to multiple regions
- Serverless monitoring platforms adding overhead
- Backup tools creating redundant snapshots
Individually small - together expensive.
Final Thoughts: The Small Costs Add Up
If you’ve already accounted for EC2, RDS, Bedrock, and other top-level services, the remaining spend is likely hidden across:
- Storage multipliers
- Data transfer
- Logging and observability
- Orphaned or legacy resources
- Over-provisioned supporting services
- Kubernetes ecosystem sprawl
- Configuration drift
- Forgotten pipelines and integrations
The key is not just identifying one issue - but systematically reviewing these categories.