The Most Costly AWS Resources Companies Overlook

You’ll see the same suggestions everywhere you go when looking at cloud cost optimization guides, especially if you’re learning through a Cloud Computing Course: rightsize your EC2 instances, use Reserved Instances, and turn off unused resources. That’s all fine. It’s also exactly what most teams have already done or considered.
What’s missing in those guides, to my knowledge, is the stuff that’s running quietly in the background, inching up charges nobody’s watching. They’re not dramatic. They don’t fail in the middle of the night. They simply continue operating inside old accounts for months without any proper audit.
The Bill That Nobody Budgeted For:
1. NAT Gateways
This is the one that surprises people more than anything else. The setup is so simple, and so rarely revisited, that teams are unaware of how much they are paying until someone looks into it.
The problem isn’t only the hourly rate; it’s the data processing charge. AWS bills for every gigabyte that passes through a NAT Gateway. This adds up quickly if there’s a lot of internal traffic or microservices make many cross-service calls.
A company with a data pipeline that pushes a couple of terabytes a day can spend more on NAT Gateway data processing than on the EC2 instances that actually run the pipeline.
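To make that comparison concrete, here is a rough sketch of the arithmetic. The two rates are assumptions based on us-east-1 list prices ($0.045 per gateway-hour and $0.045 per GB processed); check the current AWS pricing page for your region before relying on them.

```python
# Assumed us-east-1 list prices; verify against the AWS pricing page.
NAT_HOURLY_USD = 0.045   # per NAT Gateway per hour (assumption)
NAT_PER_GB_USD = 0.045   # per GB of data processed (assumption)
HOURS_PER_MONTH = 730    # average number of hours in a month


def nat_monthly_cost(gateways, gb_per_day, days=30):
    """Split a NAT Gateway bill into its hourly and data processing parts."""
    hourly = gateways * NAT_HOURLY_USD * HOURS_PER_MONTH
    processing = gb_per_day * days * NAT_PER_GB_USD
    return {"hourly": hourly, "processing": processing, "total": hourly + processing}


# A pipeline pushing roughly 2 TB (2,000 GB) per day through one gateway.
estimate = nat_monthly_cost(gateways=1, gb_per_day=2000)
print(f"hourly: ${estimate['hourly']:.2f}, processing: ${estimate['processing']:.2f}")
```

Under these assumptions the hourly charge is about $33 a month while data processing is about $2,700, which is why the per-GB line is the one to watch.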
The larger issue: NAT Gateways tend to proliferate. A developer spins one up in a new VPC for testing. That VPC stays around. Another environment is created. Each comes with its own NAT Gateway. No one verifies that all of the traffic passing through them truly needs to.
The fix isn’t to remove NAT Gateways; it’s to be deliberate about traffic flow. Traffic between your VPC and AWS services should go through VPC endpoints, not a NAT Gateway. Many AWS services support them: S3 and DynamoDB offer gateway endpoints, and SQS and most others offer interface endpoints. That traffic does NOT have to leave the VPC.
Most teams set this up eventually. What they rarely do is review the traffic that was running through their NAT Gateways before the change.
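A traffic review of that kind can be sketched as follows. The input format is hypothetical: it assumes you have already classified NAT traffic by destination (for example from VPC Flow Logs) into (destination, GB) pairs. The $0.045 per GB rate is an assumed us-east-1 price. Only S3 and DynamoDB are counted as fully avoidable, because gateway endpoints carry no additional charge, whereas interface endpoints have their own hourly and per-GB fees.

```python
from collections import defaultdict

# Services that offer gateway endpoints, which carry no additional charge.
GATEWAY_ENDPOINT_SERVICES = {"s3", "dynamodb"}
NAT_PER_GB_USD = 0.045  # assumed us-east-1 data processing rate


def nat_traffic_review(records):
    """Summarise NAT traffic by destination and price the avoidable part.

    `records` is a hypothetical pre-classified export of (destination, gb)
    pairs, with labels such as "s3", "dynamodb", "sqs" or "internet".
    """
    gb_by_destination = defaultdict(float)
    for destination, gb in records:
        gb_by_destination[destination] += gb

    avoidable_gb = sum(
        gb for dest, gb in gb_by_destination.items()
        if dest in GATEWAY_ENDPOINT_SERVICES
    )
    return {
        "gb_by_destination": dict(gb_by_destination),
        "avoidable_gb": avoidable_gb,
        "avoidable_usd": avoidable_gb * NAT_PER_GB_USD,
    }


review = nat_traffic_review(
    [("s3", 900.0), ("internet", 120.0), ("s3", 100.0), ("dynamodb", 50.0)]
)
print(review["avoidable_gb"], round(review["avoidable_usd"], 2))
```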
Platforms like Costimizer offer an agentic dashboard that can help surface this issue.
2. EBS Snapshots: Low Cost and High Volume

EBS snapshots are a classic accumulation problem. They are created automatically by backup policies, manually by engineers before changes, and by tools and services that trigger them. They are rarely deleted systematically.
Individual snapshots are inexpensive. What gets expensive is the sheer number of them, spread across dozens of volumes and hundreds of accounts and regions, some of them two or three years old. Plus, the cost never jumps, so it doesn’t set off an alarm. It just sits there.
The particular problem with snapshots is that they can’t be judged in isolation: each one is incremental and builds on the snapshots before it, so the dependencies aren’t obvious. Many teams are hesitant to clean up old snapshots because they are not confident they know which ones can be safely removed, so they leave them all in place. The hesitation is natural but expensive, and largely unnecessary: EBS keeps any blocks that a later snapshot still references, so deleting an old snapshot does not break newer ones.
The bad part? If a volume is deleted, its snapshots remain. There is no automatic cleanup. The resource is gone, but the charge is not.
Audit snapshots by age and by whether the source volume still exists. If a snapshot is older than your recovery window (if your team would never restore from a 14-month-old snapshot, for example), it is a candidate for deletion. This seems like a no-brainer, but very few teams do it on a regular basis.
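One way such an audit could be sketched is shown below. It assumes snapshot records shaped like the entries boto3's ec2.describe_snapshots returns (SnapshotId, VolumeId, StartTime) and a set of volume IDs that still exist. The function name, the report fields, and the 90-day window are illustrative choices, not a prescribed policy.

```python
from datetime import datetime, timedelta, timezone


def audit_snapshots(snapshots, existing_volume_ids, recovery_window_days, now=None):
    """Label each snapshot with its age and whether its source volume exists.

    A snapshot older than the recovery window is only flagged as a
    candidate for deletion; a person still makes the final call.
    """
    now = now or datetime.now(timezone.utc)
    window = timedelta(days=recovery_window_days)
    report = []
    for snap in snapshots:
        age = now - snap["StartTime"]
        report.append({
            "snapshot_id": snap["SnapshotId"],
            "age_days": age.days,
            "source_exists": snap["VolumeId"] in existing_volume_ids,
            "delete_candidate": age > window,
        })
    # Oldest first, so the review starts with the most likely candidates.
    return sorted(report, key=lambda row: row["age_days"], reverse=True)


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
sample = [
    {"SnapshotId": "snap-new", "VolumeId": "vol-live", "StartTime": now - timedelta(days=3)},
    {"SnapshotId": "snap-old", "VolumeId": "vol-gone", "StartTime": now - timedelta(days=430)},
]
report = audit_snapshots(sample, {"vol-live"}, recovery_window_days=90, now=now)
for row in report:
    print(row)
```

Keeping the output as a report rather than deleting anything directly leaves room for the human review that hesitant teams usually want.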
Unattached Elastic IPs are a closely related accumulation problem. The per-unit cost is truly small: an unattached Elastic IP costs approximately $0.005 per hour, or $3.60 per month. Nobody notices one of them.
The issue is that they build up. An instance is terminated, but its Elastic IP stays allocated. Someone reserves one ‘just in case.’ A deployment script allocates one, then fails partway. They all stay behind.
In a large AWS organization with many accounts, it’s not uncommon to have 50, 100, 200, or more Elastic IPs that no one has thought about in months or even years. At that scale, it becomes real money, and more importantly, it often points to poor account hygiene across the cloud environment.
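A minimal sketch of that check, assuming address records shaped like the entries boto3's ec2.describe_addresses returns, where an attached address carries an AssociationId. The sample data is made up, and the cost uses the $0.005 per hour rate quoted above over a 30-day month.

```python
EIP_HOURLY_USD = 0.005  # rate quoted above
HOURS_PER_MONTH = 720   # 30-day month, matching the $3.60 figure


def unattached_addresses(addresses):
    """Return the addresses that have no association."""
    return [addr for addr in addresses if "AssociationId" not in addr]


def monthly_cost(count):
    """Monthly cost of `count` unattached Elastic IPs."""
    return count * EIP_HOURLY_USD * HOURS_PER_MONTH


sample = [
    {"PublicIp": "203.0.113.10", "AllocationId": "eipalloc-1", "AssociationId": "eipassoc-1"},
    {"PublicIp": "203.0.113.11", "AllocationId": "eipalloc-2"},
    {"PublicIp": "203.0.113.12", "AllocationId": "eipalloc-3"},
]
idle = unattached_addresses(sample)
print(len(idle), "unattached;", f"200 of them would cost ${monthly_cost(200):.2f}/month")
```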
Elastic IPs can serve as a canary metric. If there are many unattached ones, chances are there are also unused snapshots, orphaned volumes, and forgotten load balancers across accounts. Resource governance and infrastructure visibility are increasingly important topics discussed in advanced cybersecurity course modules because unmanaged cloud assets can create both financial and security risks.
3. CloudWatch Logs With No Retention Policy
CloudWatch Logs storage costs $0.03 per GB per month. That is cheap enough that most teams never think twice about it. The problem is the volume.
By default, logs from Lambda functions, ECS tasks, API Gateway, and other services are retained with ‘Never Expire’ unless a retention period is set explicitly. In an active environment, that means years of logs building up, including for services you may no longer run.
This is especially problematic with Lambda. Each Lambda function has its own log group. In microservice architectures with hundreds of functions, that means hundreds of log groups, many of them for deprecated functions, all with indefinite retention, and the total grows slowly and continuously.
The logs from a Lambda function that ran two years ago are almost certainly not being used. However, they still exist, and they still cost money.
Set a retention policy. Most teams find that somewhere between 30 and 90 days covers every question they actually ask of their logs. If longer retention is truly necessary, CloudWatch at $0.03/GB/month for an indefinite period is not the place for it; the logs should go to S3 with a lifecycle policy.
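Finding the offending log groups can be sketched like this. It assumes records shaped like the entries boto3's logs.describe_log_groups returns, where the retentionInDays key is absent for groups that never expire. The sample data is invented, and the GB conversion (1024^3 bytes) is an assumption about how storage is metered.

```python
LOGS_STORAGE_USD_PER_GB = 0.03  # per GB-month, the rate quoted above
BYTES_PER_GB = 1024 ** 3        # assumption about the billing unit


def groups_without_retention(log_groups):
    """Find log groups set to never expire and estimate their storage cost."""
    findings = []
    for group in log_groups:
        if "retentionInDays" not in group:
            stored_gb = group.get("storedBytes", 0) / BYTES_PER_GB
            findings.append({
                "name": group["logGroupName"],
                "stored_gb": stored_gb,
                "monthly_usd": stored_gb * LOGS_STORAGE_USD_PER_GB,
            })
    # Most expensive first.
    return sorted(findings, key=lambda f: f["monthly_usd"], reverse=True)


sample = [
    {"logGroupName": "/aws/lambda/new-fn", "storedBytes": 5 * BYTES_PER_GB, "retentionInDays": 30},
    {"logGroupName": "/aws/lambda/old-fn", "storedBytes": 200 * BYTES_PER_GB},
]
findings = groups_without_retention(sample)
for f in findings:
    print(f["name"], f"${f['monthly_usd']:.2f}/month")

# With real credentials, applying a policy could look like this (not run here):
# boto3.client("logs").put_retention_policy(logGroupName=name, retentionInDays=30)
```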
Another interesting tidbit: CloudWatch Logs Insights queries are charged by GB scanned. Old retained logs increase the cost of any query whose time range reaches back over them, even if the data you care about is recent.
4. RDS Instances in the Wrong State
One specific trap is stopped RDS instances. AWS does not charge for compute while an RDS instance is stopped, but storage charges continue. You may also be surprised to discover that AWS automatically restarts a stopped RDS instance after 7 days, as documented.
The intended workflow is a temporary halt, with a restart when needed. The real scenario in many organizations: stop the instance because it’s costing too much, forget about it, find it running again a week later because AWS restarted it automatically, then either stop it again or just leave it running.
RDS instances that are repeatedly stopped and automatically restarted indicate that the team is using ‘stop’ as a cost-saving strategy when it was intended for short-term use. The better solution for a database that isn’t needed all the time is usually a snapshot-and-restore workflow or Aurora Serverless, depending on the frequency of use and the acceptable resume latency.
A related case is Multi-AZ RDS instances outside production. Multi-AZ costs approximately twice as much as a single-AZ instance. That redundancy is typically worth it for production. It usually isn’t for a dev/staging database that three engineers use during business hours.
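Both RDS checks can be sketched in a few lines. The records are assumed to be shaped like the entries boto3's rds.describe_db_instances returns (DBInstanceIdentifier, MultiAZ, TagList). The "env" tag key and the "dev"/"staging" values are a hypothetical tagging convention, so substitute whatever your organization uses.

```python
from datetime import datetime, timedelta, timezone

AUTO_RESTART_AFTER = timedelta(days=7)  # RDS restarts stopped instances after 7 days


def auto_restart_time(stopped_at):
    """When a stopped instance will be started again automatically."""
    return stopped_at + AUTO_RESTART_AFTER


def non_prod_multi_az(instances, env_tag="env", non_prod=("dev", "staging")):
    """List Multi-AZ instances whose environment tag marks them as non-production."""
    flagged = []
    for db in instances:
        tags = {t["Key"]: t["Value"] for t in db.get("TagList", [])}
        if db.get("MultiAZ") and tags.get(env_tag) in non_prod:
            flagged.append(db["DBInstanceIdentifier"])
    return flagged


sample = [
    {"DBInstanceIdentifier": "orders-prod", "MultiAZ": True,
     "TagList": [{"Key": "env", "Value": "prod"}]},
    {"DBInstanceIdentifier": "orders-staging", "MultiAZ": True,
     "TagList": [{"Key": "env", "Value": "staging"}]},
    {"DBInstanceIdentifier": "orders-dev", "MultiAZ": False,
     "TagList": [{"Key": "env", "Value": "dev"}]},
]
flagged = non_prod_multi_az(sample)
restart = auto_restart_time(datetime(2025, 3, 1, 9, 0, tzinfo=timezone.utc))
print(flagged, restart.isoformat())
```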
5. Data Transfer Charges Across Availability Zones
This one is not well known because it is not a named resource—it is a behavior.
Data transfer between Availability Zones within a region costs $0.01 per GB in each direction. In most cases, that’s a small number. For distributed systems that rely heavily on cross-service calls, or that were not designed with AZ locality in mind, it adds up.
The common pattern: application instances are deployed to multiple AZs for redundancy, which is the right call. But the load balancer or service mesh routes traffic without regard to the caller’s AZ. A request that arrives at an instance in us-east-1a is routed to a dependency in us-east-1b, and you are charged for the cross-AZ transfer.
That’s not chump change for a system that handles millions of requests per day. Total data transfer between AZs can easily reach hundreds of gigabytes per day, and at $0.01 per GB, that becomes a real line item.
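A back-of-the-envelope version of that calculation is sketched below. The traffic figures are invented for illustration. The billed_sides parameter reflects that AWS charges the $0.01 per GB on both the sending and the receiving side; set it to 1 to reproduce a single-sided figure.

```python
CROSS_AZ_USD_PER_GB = 0.01  # rate quoted above


def cross_az_monthly_cost(requests_per_day, avg_bytes_per_request,
                          cross_az_fraction, billed_sides=2, days=30):
    """Estimate the monthly cross-AZ transfer charge for a service.

    avg_bytes_per_request should cover request plus response payloads;
    cross_az_fraction is the share of calls that leave the caller's AZ.
    """
    gb_per_day = requests_per_day * avg_bytes_per_request * cross_az_fraction / 1e9
    return gb_per_day * CROSS_AZ_USD_PER_GB * billed_sides * days


# 50 million calls a day, 20 KB each, half of them crossing an AZ boundary:
# about 500 GB of cross-AZ traffic per day.
cost = cross_az_monthly_cost(50_000_000, 20_000, 0.5)
print(f"${cost:.2f} per month")
```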
The fix is AZ-aware routing, which keeps traffic within the same AZ whenever possible. Most modern service meshes and load balancer configurations support it. However, someone has to set it up, and most teams don’t until they see the charge.
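The core idea behind AZ-aware routing fits in a few lines. This is a simplified sketch, not how any particular mesh or load balancer implements it: the endpoint records and the pick_endpoint helper are hypothetical, and real implementations also account for load imbalance between zones.

```python
import random


def pick_endpoint(caller_az, endpoints, rng=random):
    """Prefer an endpoint in the caller's AZ; fall back to any endpoint."""
    if not endpoints:
        raise ValueError("no endpoints available")
    same_az = [e for e in endpoints if e["az"] == caller_az]
    # Fall back to the full list so an AZ with no local replica still works.
    return rng.choice(same_az or endpoints)


endpoints = [
    {"host": "10.0.1.10", "az": "us-east-1a"},
    {"host": "10.0.2.10", "az": "us-east-1b"},
    {"host": "10.0.2.11", "az": "us-east-1b"},
]
chosen = pick_endpoint("us-east-1b", endpoints)    # stays inside us-east-1b
fallback = pick_endpoint("us-east-1c", endpoints)  # no local replica, so any AZ
print(chosen["az"], fallback["az"])
```

The fallback matters: routing that refuses to cross AZs would trade a small transfer charge for an availability problem.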
6. Unused or Over-Provisioned ElastiCache Clusters
ElastiCache tends to be provisioned for peak load and then kept at that size in case of future workloads. A cache cluster that was rightsized for a traffic spike six months ago may be vastly oversized for current traffic. No one has gone back to check.
ElastiCache doesn’t have a stop function like EC2 or RDS. The cluster runs, or it doesn’t exist. Teams that want to save money on an ElastiCache cluster often end up deleting it and recreating it at a smaller size, which is more involved than rightsizing an EC2 instance. That friction is likely why ElastiCache is one of the lesser-discussed items in cost reviews.
The other issue is Multi-AZ replication enabled where it is not warranted. A Redis cluster with replica nodes across AZs is a good choice for production data that must stay available if an AZ fails. In a staging environment, it typically doubles the cost and offers no real value.
7. Why These Resources Get Overlooked
There is a pattern across all of the items on this list. None of them are workloads. They’re supporting infrastructure: the things that surround applications rather than the applications themselves. Cost reviews nearly always focus on compute and databases, because those are the big numbers. These supporting resources get a footnote, if they are mentioned at all.
The other trait they share: they don’t have obvious owners. An EC2 instance is tagged to a team. A NAT Gateway or an old EBS snapshot is only loosely attached to an account. No one gets paged or called about it. Nobody’s checking.
The remedy is a scheduled audit (at least quarterly) that targets these categories. Not the whole bill, only the specific resource types listed here, grouped by age, usage, and whether the source resource still exists. The first audit will take a few hours. After that, it takes much less.
Nearly every organization that does this discovers something in the first hour, usually several things. The amount recovered varies, but the more interesting result is often the tagging and ownership issues it reveals, which lead to larger structural solutions that prevent
