Cloud cost optimization is no longer just a finance topic; it is a core engineering responsibility. Teams often deploy workloads quickly, but over time environments become oversized, underutilized, or operationally expensive. At the same time, security and governance controls are frequently overlooked in the rush to ship features.
This blog expands on the AWS Architect’s Map / Decision & Governance reference and is designed to be linked from your notebook as a troubleshooting and optimization guide. It gives users practical decision frameworks, security baselines, and actionable cost-saving checks they can use immediately.
Why This Matters
Many organizations experience one or more of the following:
- EC2 instances running at low CPU but high monthly cost
- Unused EBS volumes accumulating charges
- Logs retained forever with no compliance need
- Container platforms over-engineered for simple apps
- Legacy monoliths migrated without modernization
- Long-lived IAM access keys used in CI/CD pipelines
- No governance around architecture choices
These issues increase cost, complexity, and risk.
The solution is right-sizing architecture decisions and implementing lightweight governance.
Part 1: The Right-Sizing Decision Matrix
Choosing the wrong AWS service often causes both technical debt and unnecessary spend.
Below is a practical decision matrix to help users select the correct compute platform.
| Project Scenario | Recommended AWS Service | Why This Choice Works |
|---|---|---|
| Event-driven jobs / micro-tasks | AWS Lambda | Pay only when code runs. Auto-scales to zero. |
| Standard containerized apps | Amazon ECS (Fargate) | No EC2 management. Simpler operations. |
| Complex Kubernetes workloads | Amazon EKS | Full Kubernetes control, scalability, portability. |
| Legacy monolith / custom OS workloads | Amazon EC2 | Root-level control and maximum compatibility. |
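The matrix above can be read as a short chain of questions. The following is a minimal sketch of that reading, not an official selection tool: the function name, the three flags, and the order in which they are checked are assumptions made for illustration.

```python
def recommend_compute(event_driven: bool, needs_kubernetes: bool,
                      needs_os_access: bool) -> str:
    """Map workload traits to the service named in the decision matrix."""
    # Assumption: the most constraining requirement wins, so OS-level
    # access is checked before Kubernetes, and Kubernetes before Lambda.
    if needs_os_access:
        return "Amazon EC2"
    if needs_kubernetes:
        return "Amazon EKS"
    if event_driven:
        return "AWS Lambda"
    # Assumption: a standard long-running app is treated as a container workload.
    return "Amazon ECS (Fargate)"
```

For example, a scheduled file-processing job with no special requirements maps to AWS Lambda, while the same job with a hard dependency on a custom kernel module maps to Amazon EC2.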
Part 2: When to Use Each Service
1. AWS Lambda
Use Lambda when workloads are:
- API backends with intermittent traffic
- Scheduled jobs
- File processing pipelines
- Event-driven automation
- Lightweight microservices
Common Troubleshooting Cases
Problem: High Lambda Cost
Check:
- Excessive invocation count
- Overallocated memory
- Long execution duration
- Retry storms
Fix:
- Optimize code execution time
- Reduce memory if not needed
- Add dead-letter queues and cap retry attempts so failed events stop retrying
- Use provisioned concurrency only when required
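To see why invocation count, memory, and duration are the three checks above, it helps to write the bill out as arithmetic. This is a rough sketch: the default prices are illustrative assumptions (they vary by region and CPU architecture), and the free tier and provisioned concurrency are ignored.

```python
def lambda_monthly_cost(invocations: int, avg_duration_ms: float, memory_mb: int,
                        price_per_gb_second: float = 0.0000166667,
                        price_per_million_requests: float = 0.20) -> float:
    """Estimate monthly Lambda cost in USD from the three main cost drivers."""
    # Compute is billed on memory (GB) multiplied by execution time (seconds).
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute_cost = gb_seconds * price_per_gb_second
    # Requests are billed separately, per million invocations.
    request_cost = invocations / 1_000_000 * price_per_million_requests
    return round(compute_cost + request_cost, 2)
```

With these assumed prices, 10 million invocations at 200 ms and 1024 MB come to about 35 USD a month, and roughly half of that at 512 MB. Lower memory can lengthen execution time, so measure duration again after any change.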
2. Amazon ECS with Fargate
Ideal for:
- Standard Dockerized web apps
- APIs
- Background workers
- Teams wanting containers without Kubernetes overhead
Common Troubleshooting Cases
Problem: Fargate Costs Too High
Check:
- Overprovisioned CPU/memory
- Always-on services with low traffic
- Idle staging environments
Fix:
- Auto-scale tasks
- Use scheduled shutdown for non-prod
- Optimize task sizes
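The scheduled-shutdown fix is easy to quantify. Below is a small sketch of the schedule logic, assuming non-prod only needs to run on weekdays between 08:00 and 20:00; both hours and the function names are hypothetical defaults, not values from any AWS API.

```python
from datetime import datetime


def non_prod_should_run(now: datetime, start_hour: int = 8, stop_hour: int = 20) -> bool:
    """True on weekdays from start_hour (inclusive) to stop_hour (exclusive)."""
    is_weekday = now.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and start_hour <= now.hour < stop_hour


def weekly_running_fraction(start_hour: int = 8, stop_hour: int = 20) -> float:
    """Share of the 168-hour week during which the environment stays up."""
    return (stop_hour - start_hour) * 5 / (24 * 7)
```

Under that assumed schedule the environment runs 60 of 168 hours, a little over a third of the week. Because Fargate bills for the time tasks are running, scaling the service to zero tasks outside those hours removes most of the compute charge.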
3. Amazon EKS
Use when you need:
- Kubernetes-native tooling
- Multi-team platform standardization
- Advanced autoscaling
- Fine-grained resource allocation across shared nodes
Common Troubleshooting Cases
Problem: Cluster Too Expensive
Check:
- Too many worker nodes
- Idle namespaces
- Oversized requests/limits
- Multiple underused clusters
Fix:
- Consolidate clusters
- Use Cluster Autoscaler / Karpenter
- Tune pod requests
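Tuning pod requests usually means replacing a guessed value with one derived from observed usage. One common heuristic, sketched here as an assumption rather than a Kubernetes rule, is to take the 95th percentile of measured CPU usage and add some headroom. The function name and the 20% default are hypothetical.

```python
def suggest_cpu_request(usage_millicores: list[int], headroom_percent: int = 20) -> int:
    """Suggest a CPU request in millicores: p95 of observed usage plus headroom."""
    if not usage_millicores:
        raise ValueError("need at least one usage sample")
    ordered = sorted(usage_millicores)
    # Nearest-rank percentile using integer math: ceil(0.95 * n).
    rank = (95 * len(ordered) + 99) // 100
    p95 = ordered[rank - 1]
    # Add headroom and round up to a whole millicore.
    return (p95 * (100 + headroom_percent) + 99) // 100
```

Feed it samples from your metrics pipeline over a representative period. A pod requesting 1000m whose suggestion comes out near 230m is a strong candidate for a smaller request, which in turn lets the autoscaler pack more pods per node.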
4. Amazon EC2
Still the right choice for:
- Legacy applications
- Windows workloads
- Specialized software requiring OS access
- High-performance custom tuning
Common Troubleshooting Cases
Problem: EC2 Bill Too High
Check:
- Low-utilization instances
- Wrong instance family
- Unattached EBS volumes
- On-Demand Instances running 24/7
Fix:
- Resize instances
- Use Savings Plans
- Move to Graviton
- Remove idle resources
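A first pass at the low-utilization check can be a simple filter over CPU metrics you have already exported. The sketch below assumes a mapping of instance ID to daily average CPU percentages; the 10% threshold and the function name are illustrative assumptions, and CPU alone does not capture memory- or network-bound workloads.

```python
def flag_resize_candidates(daily_avg_cpu: dict[str, list[float]],
                           threshold_percent: float = 10.0) -> list[str]:
    """Return instance IDs whose busiest day still averaged below the threshold."""
    candidates = []
    for instance_id, samples in daily_avg_cpu.items():
        # Skip instances with no data rather than flagging them by accident.
        if samples and max(samples) < threshold_percent:
            candidates.append(instance_id)
    return sorted(candidates)
```

Treat the result as a review list, not an action list: confirm memory and I/O headroom before resizing anything it returns.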
Part 3: Security Non-Negotiables (2026 Checklist)
Every environment should implement these baseline controls.
1. IMDSv2 Only on EC2
Disable IMDSv1 to reduce SSRF attack risks.
Why It Matters
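Attackers exploit server-side request forgery (SSRF) flaws to query the instance metadata endpoint and steal the temporary credentials of the instance role. IMDSv2 requires a session token obtained through a PUT request, which most SSRF vulnerabilities cannot issue.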
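Before enforcing the setting, it helps to know which instances still accept IMDSv1. Here is a minimal sketch that scans a response shaped like the output of `describe-instances`; the function name is hypothetical, and loading the response (from the CLI's JSON output or an SDK call) is left to you.

```python
def instances_allowing_imdsv1(describe_instances_response: dict) -> list[str]:
    """Return IDs of instances whose metadata service still accepts IMDSv1."""
    flagged = []
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            options = instance.get("MetadataOptions", {})
            # With the endpoint disabled there is no metadata service to abuse.
            if options.get("HttpEndpoint") == "disabled":
                continue
            # HttpTokens is "required" when IMDSv2 is enforced, "optional" otherwise.
            if options.get("HttpTokens") != "required":
                flagged.append(instance["InstanceId"])
    return flagged
```

Each ID it returns is a target for the command that follows.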
Command
aws ec2 modify-instance-metadata-options \
--instance-id <instance_id> \
--http-tokens required \
--http-endpoint enabled
2. OIDC for CI/CD
Avoid storing long-lived IAM access keys in GitHub, GitLab, Jenkins, or other CI/CD systems.
Use OpenID Connect (OIDC) federation so the pipeline assumes an IAM role and receives short-lived credentials.
Why It Matters
Static keys are frequently leaked.
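With OIDC, the role's trust policy decides which pipeline may assume it. The sketch below builds such a policy for GitHub Actions as one example; the function name is hypothetical, and the account ID, repository, and branch you pass in are placeholders. Other CI systems use a different provider host and claim format.

```python
import json


def github_oidc_trust_policy(account_id: str, repo: str, branch: str = "main") -> dict:
    """Build an IAM trust policy that lets one repo branch assume a role via OIDC."""
    provider = "token.actions.githubusercontent.com"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    # Only tokens minted for AWS STS are accepted.
                    f"{provider}:aud": "sts.amazonaws.com",
                    # Restrict to a single repository and branch.
                    f"{provider}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
                }
            },
        }],
    }


def to_json(policy: dict) -> str:
    """Serialize the policy for use as a role's trust policy document."""
    return json.dumps(policy, indent=2)
```

Pinning the `sub` claim to a specific repository and branch matters: a policy that checks only the audience would let any repository on the same provider assume the role.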
3. KMS Encryption by Default
Encrypt:
- S3 buckets
- RDS databases
- EBS volumes
- Backups
Why It Matters
Encryption is often required for compliance and risk reduction.
4. Enable GuardDuty
GuardDuty helps detect:
- Crypto-mining activity
- Suspicious API calls
- Credential misuse
Part 4: The Cost Killer Toolkit
These checks often reveal immediate waste.
1. Find Unattached EBS Volumes
aws ec2 describe-volumes --filters Name=status,Values=available
Why It Matters
Detached volumes continue billing.
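To turn that list into a number worth acting on, total the capacity and multiply by a storage price. This sketch assumes you saved the command's JSON output and parsed it into a dictionary; the function name is hypothetical and the per-GB price is an illustrative assumption that varies by volume type and region.

```python
def unattached_volume_cost(describe_volumes_response: dict,
                           price_per_gb_month: float = 0.08) -> tuple[int, float]:
    """Return (total GiB, estimated monthly USD) for volumes in the 'available' state."""
    volumes = [
        v for v in describe_volumes_response.get("Volumes", [])
        if v.get("State") == "available"  # re-check in case the input was unfiltered
    ]
    total_gib = sum(v["Size"] for v in volumes)  # Size is reported in GiB
    return total_gib, round(total_gib * price_per_gb_month, 2)
```

Provisioned IOPS and throughput are billed separately and are not included here, so treat the figure as a lower bound.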
2. Find CloudWatch Logs With No Retention Policy
aws logs describe-log-groups \
--query 'logGroups[?retentionInDays==null].[logGroupName]' \
--output text
Why It Matters
Infinite retention = silent long-term storage cost.
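Not every unbounded log group deserves attention first. Below is a small sketch that ranks them by stored size, assuming a dictionary shaped like the JSON output of `describe-log-groups`, where groups with no policy have no `retentionInDays` field. The function name is hypothetical.

```python
def log_groups_without_retention(describe_log_groups_response: dict) -> list[tuple[str, float]]:
    """Return (log group name, stored GiB) for groups with no retention policy, largest first."""
    results = []
    for group in describe_log_groups_response.get("logGroups", []):
        # retentionInDays is absent when the group never expires its events.
        if group.get("retentionInDays") is None:
            stored_gib = group.get("storedBytes", 0) / 1024 ** 3
            results.append((group["logGroupName"], round(stored_gib, 2)))
    # Largest first; ties are broken by name for a stable order.
    return sorted(results, key=lambda item: (-item[1], item[0]))
```

Start with the largest groups and set a retention period that matches the actual compliance or debugging need.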
3. Look for Graviton Opportunities
Arm-based AWS Graviton instances can cut compute cost by roughly 20–40% in price-performance terms, provided the workload and its dependencies run on arm64.
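A quick way to build a candidate list is to map each x86 instance family to a Graviton counterpart. The mapping below is a small illustrative subset chosen for this sketch, not an official equivalence table, and the function name is hypothetical. Not every size exists in both families, so check availability before planning a move.

```python
from typing import Optional

# Assumption: an illustrative subset of general-purpose, compute, memory,
# and burstable families with a Graviton-based counterpart.
GRAVITON_FAMILY = {"m5": "m6g", "c5": "c6g", "r5": "r6g", "t3": "t4g"}


def graviton_candidate(instance_type: str) -> Optional[str]:
    """Suggest a Graviton instance type of the same size, or None if unmapped."""
    family, _, size = instance_type.partition(".")
    target = GRAVITON_FAMILY.get(family)
    if target is None or not size:
        return None
    return f"{target}.{size}"
```

For example, `m5.large` maps to `m6g.large`, while a GPU type such as `p3.2xlarge` returns `None`. Compatibility of binaries, container images, and licensed software still has to be tested per workload.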
Part 5: Quarterly Governance Review Checklist
Run this every quarter.
Compute
- Any idle EC2 instances?
- Any oversized RDS classes?
- Any Lambda functions with poor efficiency?
Storage
- Old snapshots?
- Unused EBS volumes?
- S3 lifecycle policies missing?
Networking
- Idle NAT gateways?
- Unused Elastic IPs?
Security
- Old IAM users?
- Access keys older than 90 days?
- GuardDuty disabled anywhere?
Commitment Savings
- Savings Plans coverage optimized?
- Reserved Instance (RI) utilization healthy?
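Some of these questions are easy to automate. As one example, the access-key age check from the Security list can be sketched as follows, assuming a response shaped like the IAM `ListAccessKeys` result as returned by an SDK, with `CreateDate` already parsed into a timezone-aware datetime. The function name is hypothetical.

```python
from datetime import datetime, timezone


def stale_access_keys(list_access_keys_response: dict, now: datetime,
                      max_age_days: int = 90) -> list[str]:
    """Return IDs of active access keys older than max_age_days."""
    stale = []
    for key in list_access_keys_response.get("AccessKeyMetadata", []):
        age_days = (now - key["CreateDate"]).days
        # Inactive keys cannot be used, so only active ones are reported.
        if key.get("Status") == "Active" and age_days > max_age_days:
            stale.append(key["AccessKeyId"])
    return stale
```

Pass `datetime.now(timezone.utc)` as `now` in real use. Taking it as a parameter keeps the check deterministic and easy to test.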
Part 6: Troubleshooting by Symptom
“My AWS bill suddenly spiked”
Check:
- Cost Explorer by service
- New EC2 instances
- Data transfer charges
- CloudWatch log growth
- Lambda invocation spikes
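The first check, Cost Explorer by service, comes down to comparing two periods and sorting by the difference. Here is a minimal sketch, assuming you have already exported per-service totals for each period into plain dictionaries; the function name and input shape are assumptions, not a Cost Explorer API.

```python
def biggest_cost_increases(previous: dict[str, float], current: dict[str, float],
                           top_n: int = 3) -> list[tuple[str, float]]:
    """Return the services whose cost grew the most between two periods."""
    services = set(previous) | set(current)
    # A service missing from one period is treated as costing zero in it.
    deltas = {
        svc: round(current.get(svc, 0.0) - previous.get(svc, 0.0), 2)
        for svc in services
    }
    increases = [(svc, delta) for svc, delta in deltas.items() if delta > 0]
    # Largest increase first; ties are broken by service name.
    return sorted(increases, key=lambda item: (-item[1], item[0]))[:top_n]
```

A service that appears only in the current period, such as a newly created NAT gateway, shows up with its full cost as the increase, which is often exactly the culprit behind a sudden spike.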
“My Kubernetes cluster is too expensive”
Check:
- Node idle percentage
- Over-requested CPU/memory
- Duplicate environments
- Too many load balancers
“Our CI/CD pipeline is insecure”
Check:
- Long-lived IAM keys
- Missing OIDC federation
- Overprivileged roles
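Overprivileged roles often reveal themselves through wildcard actions. The following is a rough sketch that flags them in an IAM policy document already loaded as a dictionary. The function name is hypothetical, and the check is deliberately narrow: it ignores `NotAction`, resource scoping, and conditions.

```python
def wildcard_statements(policy_document: dict) -> list[dict]:
    """Return Allow statements that grant '*' or a whole service such as 's3:*'."""
    statements = policy_document.get("Statement", [])
    # IAM accepts a single statement object as well as a list of them.
    if isinstance(statements, dict):
        statements = [statements]
    flagged = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        # Action may be a single string or a list of strings.
        if isinstance(actions, str):
            actions = [actions]
        if any(action == "*" or action.endswith(":*") for action in actions):
            flagged.append(statement)
    return flagged
```

Run it over the policies attached to each pipeline role and review anything it returns. A deployment role rarely needs every action in a service.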
Final Takeaway
Do not build a ship when you only need a bicycle.
Use the simplest AWS service that solves the problem, apply mandatory security controls, and continuously remove waste. The combination of the right architecture and lightweight governance creates the biggest long-term savings.








