Copyright © 2008 – 2026 SupportSages Pvt Ltd. All Rights Reserved.

EC2 Instance Management (Start and Stop) Based on Memory Utilization Exceeding Threshold Value

Arya P B

  • 5 min read

In cloud environments, managing EC2 instances effectively is essential to ensure system reliability and performance. One of the key aspects of EC2 instance management is monitoring resource usage, particularly memory utilization. If memory utilization exceeds a threshold, it may be necessary to take corrective actions, such as stopping and restarting the instance. This article demonstrates how to automate the process of monitoring EC2 memory usage, checking for connectivity, and managing instances using AWS Lambda and Step Functions.

Overview of the Solution

The solution automates the following workflow:

  1. Monitor EC2 Instances: Periodically check the memory utilization of EC2 instances.
  2. Check Memory Utilization: If the memory usage exceeds a defined threshold, the instance is flagged and checked again after a five-minute wait.
  3. SSM Connection Check: If memory usage is still high after the wait but the instance is reachable via AWS Systems Manager (SSM), no action is taken. If it is not reachable, the instance is stopped and then started again.
  4. Use AWS Step Functions: The entire process is orchestrated using AWS Step Functions, allowing the automation of the flow from memory checking to stopping/restarting EC2 instances.
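The decision logic behind this workflow can be sketched as a small pure function. This is only an illustration under stated assumptions: the names `Action` and `decide`, and the `is_recheck` flag (true once an instance has already been flagged and the waiting period has elapsed), are hypothetical and do not appear in the Lambda function itself.

```python
from enum import Enum


class Action(Enum):
    """Possible outcomes of one evaluation (hypothetical names)."""
    NONE = "none"          # leave the instance alone
    RECHECK = "wait5min"   # wait, then look at memory again
    RESTART = "stop"       # stop the instance, then start it again


def decide(memory_percent, threshold, is_recheck, ssm_reachable):
    """Map one memory reading to an action, following the workflow steps."""
    if memory_percent <= threshold:
        return Action.NONE       # memory is fine (or has recovered)
    if not is_recheck:
        return Action.RECHECK    # first sighting: give it time to recover
    if ssm_reachable:
        return Action.NONE       # still high, but the instance responds
    return Action.RESTART        # still high and unreachable


if __name__ == "__main__":
    print(decide(92.0, 80.0, is_recheck=False, ssm_reachable=True))
    print(decide(92.0, 80.0, is_recheck=True, ssm_reachable=False))
```

Keeping the decision separate from the AWS calls makes it easy to unit-test the thresholds without touching any real instance.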

Architecture

The solution involves the following AWS services:

  • AWS Lambda: Used for writing custom code to monitor EC2 instances, check memory utilization, manage instance states (stop/start), and interact with AWS Systems Manager.
  • AWS Step Functions: Orchestrates the entire flow, ensuring that the actions are taken based on the logic and state transitions.
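  • Amazon EC2: The core compute resource being monitored and managed.
  • Amazon CloudWatch: Supplies the metrics used to evaluate each instance. EC2 does not publish memory utilization by default, so the metric has to be sent to CloudWatch from inside the instance, for example by the CloudWatch agent.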
  • AWS Systems Manager (SSM): Used to check if the instance is accessible remotely before performing any shutdown or restart.
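Because memory is not among the metrics EC2 reports on its own, something on the instance has to publish it. One common option is the CloudWatch agent. The sketch below builds a minimal agent configuration in Python and prints it as JSON. It assumes the agent's `mem_used_percent` measurement and an `InstanceId` dimension; the helper name `build_agent_config` is hypothetical. Metrics published this way land in the agent's namespace (`CWAgent` unless configured otherwise), so the namespace, metric name, and dimensions queried by the Lambda function have to match what the agent actually publishes.

```python
import json


def build_agent_config(interval_seconds=60):
    """Return a minimal CloudWatch agent config that reports memory usage.

    Hypothetical helper for illustration; extend it with whatever other
    metrics your instances need.
    """
    return {
        "metrics": {
            # Tag every data point with the instance ID so it can be
            # queried per instance.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_agent_config(), indent=2))
```

The printed JSON can be saved as the agent's configuration file on each instance that should be monitored.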

Lambda Function to Monitor EC2 Instances

The Lambda function is responsible for:

  1. Checking Memory Utilization: It reads the most recent memory utilization of each running EC2 instance from CloudWatch.
  2. Stopping or Starting Instances: If memory utilization is still above the threshold after the waiting period and the instance cannot be reached through SSM, it stops the instance. Once the instance has stopped, it starts it again.
  3. Handling EC2 Instances Based on Action: It starts a Step Functions execution with the appropriate action (wait5min or stop) and handles the check_status and start actions that the workflow sends back to it.

import json
import time
from datetime import datetime, timedelta, timezone

import boto3

ec2_client = boto3.client('ec2')
ssm_client = boto3.client('ssm')
cloudwatch = boto3.client('cloudwatch')
stepfunctions_client = boto3.client('stepfunctions')

MEMORY_THRESHOLD = 4       # percent; example value, tune it for your workload
WAIT_TIME = 300            # seconds; mirrors the Wait states in the state machine
STEP_FUNCTION_ARN = 'arn'  # placeholder: replace with your state machine ARN

# Assumption: memory is published by the CloudWatch agent with InstanceId as
# its only dimension. EC2 itself publishes no memory metric under AWS/EC2, so
# adjust these two values if you publish a custom metric under another name.
MEMORY_NAMESPACE = 'CWAgent'
MEMORY_METRIC = 'mem_used_percent'


def lambda_handler(event, context):
    print("Received event:", event)
    action = event.get('action')

    if action:
        instance_id = event.get('instance_id')
        if not instance_id:
            print("Instance ID is required for specific actions.")
            return {'error': "Instance ID is required for this action."}

        if action == 'check_status':
            # Return the state to the calling workflow; its "checkstate"
            # Choice decides whether to keep waiting or start the instance.
            instance_status = check_instance_status(instance_id)
            return {'status': instance_status, 'instance_id': instance_id}

        elif action == 'wait5min':
            # Second look at memory after the workflow's five-minute wait.
            memory_utilization = get_memory_utilization(instance_id)

            if memory_utilization > MEMORY_THRESHOLD:
                if can_connect_via_ssm(instance_id):
                    print(f"Connection via SSM successful for instance {instance_id}, no reboot needed.")
                    return {'message': 'Instance reachable via SSM, no reboot needed'}

                ec2_client.stop_instances(InstanceIds=[instance_id])
                print(f"Memory utilization remains high, and instance {instance_id} is unreachable, stopping instance.")
                # Start a new execution that waits for the stop to finish
                # and then starts the instance again.
                trigger_step_function(instance_id, 'stop')
                return {'action': 'stop', 'instance_id': instance_id}
            else:
                print(f"Memory utilization has decreased to {memory_utilization}% for instance {instance_id}")
                return {'message': 'Memory utilization below threshold, no stop action taken'}

        elif action == 'start':
            ec2_client.start_instances(InstanceIds=[instance_id])
            return {'action': 'start', 'instance_id': instance_id}

        return {'error': f"Unsupported action: {action}"}

    else:
        # No action given: scan every running instance.
        instances = get_all_instances()
        if not instances:
            print("No running EC2 instances found.")
            return {'message': "No action required"}

        flagged = []
        for instance_id in instances:
            print(f"Checking instance: {instance_id}")
            memory_utilization = get_memory_utilization(instance_id)

            if memory_utilization <= MEMORY_THRESHOLD:
                print(f"Memory utilization is within threshold ({memory_utilization}%) for instance {instance_id}")
            else:
                print(f"Memory utilization is high ({memory_utilization}%) for instance {instance_id}, waiting for {WAIT_TIME} seconds...")
                trigger_step_function(instance_id, 'wait5min')
                flagged.append(instance_id)

        return {'message': 'Scan complete', 'flagged_instances': flagged}


def get_all_instances():
    """Return the IDs of all running instances in the current region."""
    try:
        instance_ids = []
        # Paginate so that large fleets are not silently truncated.
        paginator = ec2_client.get_paginator('describe_instances')
        pages = paginator.paginate(
            Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
        )
        for page in pages:
            for reservation in page['Reservations']:
                for instance in reservation['Instances']:
                    instance_ids.append(instance['InstanceId'])
        return instance_ids
    except Exception as e:
        print(f"Error retrieving instances: {e}")
        return []


def get_memory_utilization(instance_id):
    """Return the latest one-minute average memory utilization, or 0 if none."""
    now = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace=MEMORY_NAMESPACE,
        MetricName=MEMORY_METRIC,
        Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
        StartTime=now - timedelta(minutes=10),
        EndTime=now,
        Period=60,
        Statistics=['Average']
    )
    datapoints = response['Datapoints']
    if datapoints:
        # Datapoints are not returned in time order, so pick the newest one.
        latest = max(datapoints, key=lambda dp: dp['Timestamp'])
        return latest['Average']
    else:
        print(f"No memory utilization data found for instance {instance_id}.")
        return 0


def can_connect_via_ssm(instance_id):
    """Run a trivial command through SSM to test whether the instance responds."""
    try:
        response = ssm_client.send_command(
            InstanceIds=[instance_id],
            DocumentName="AWS-RunShellScript",
            Parameters={'commands': ["echo 'test connection'"]}
        )
        command_id = response['Command']['CommandId']
        time.sleep(5)  # give the command a moment to run
        command_status = ssm_client.get_command_invocation(
            CommandId=command_id,
            InstanceId=instance_id
        )
        return command_status['Status'] == 'Success'
    except Exception as e:
        print(f"Error checking SSM connection for instance {instance_id}: {e}")
        return False


def check_instance_status(instance_id):
    """Return the instance state name, for example 'stopping' or 'stopped'."""
    try:
        response = ec2_client.describe_instance_status(
            InstanceIds=[instance_id],
            IncludeAllInstances=True  # also report instances that are not running
        )
        return response['InstanceStatuses'][0]['InstanceState']['Name']
    except Exception as e:
        print(f"Error checking instance status for {instance_id}: {e}")
        return "Unknown"


def trigger_step_function(instance_id, action):
    """Start a state machine execution for one instance and one action."""
    try:
        response = stepfunctions_client.start_execution(
            stateMachineArn=STEP_FUNCTION_ARN,
            # json.dumps takes care of quoting and escaping.
            input=json.dumps({'instance_id': instance_id, 'action': action})
        )
        print("Step Function started successfully:", response['executionArn'])
    except Exception as e:
        print("Error starting Step Function:", e)
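The SSM check in `can_connect_via_ssm` sleeps for a fixed five seconds before reading the command status, so a command that is still `Pending` or `InProgress` at that moment is treated as a failed connection. One possible refinement, sketched here under assumptions, is to poll until the command reaches a terminal status or a timeout expires. The function name `wait_for_command` and its `timeout`, `interval`, and `sleep` parameters are hypothetical; `ssm` is expected to be a boto3 SSM client (or any object with a compatible `get_command_invocation` method). The sketch does not handle the case where the invocation is not yet registered right after `send_command`.

```python
import time

# Statuses after which the command will not change any more.
TERMINAL_STATUSES = {"Success", "Failed", "Cancelled", "TimedOut"}


def wait_for_command(ssm, command_id, instance_id,
                     timeout=30.0, interval=2.0, sleep=time.sleep):
    """Poll an SSM command until it finishes; return True only on Success."""
    waited = 0.0
    while True:
        result = ssm.get_command_invocation(
            CommandId=command_id,
            InstanceId=instance_id,
        )
        status = result["Status"]
        if status in TERMINAL_STATUSES:
            return status == "Success"
        if waited >= timeout:
            # Still Pending/InProgress/Delayed: treat as unreachable.
            return False
        sleep(interval)
        waited += interval
```

Passing `sleep` as a parameter keeps the helper testable without real delays. Keep the timeout well below the Lambda function's own timeout.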

AWS Step Function for Orchestrating Actions

The state machine routes each execution on the action field of its input. For wait5min it pauses for five minutes and then invokes the Lambda function again to re-check memory. For stop it polls the instance state every five minutes until the instance has stopped, and then starts it again. The OutputPath on checkInstanceStatus unwraps the Lambda result from the Payload field so that checkstate can read $.status and the polling loop can reuse $.instance_id. The Lambda ARNs are left as placeholders.

{
  "Comment": "EC2 instance memory and status check workflow",
  "StartAt": "Choice",
  "States": {
    "Choice": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.action",
          "StringEquals": "stop",
          "Next": "Wait"
        },
        {
          "Variable": "$.action",
          "StringEquals": "wait5min",
          "Next": "Wait 5 minutes"
        }
      ],
      "Default": "End"
    },
    "Wait 5 minutes": {
      "Type": "Wait",
      "Seconds": 300,
      "Next": "Lambda Invoke"
    },
    "Lambda Invoke": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "arn:aws:lambda:",
        "Payload.$": "$"
      },
      "End": true
    },
    "Wait": {
      "Type": "Wait",
      "Seconds": 300,
      "Next": "checkInstanceStatus"
    },
    "checkInstanceStatus": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "arn:aws:lambda:",
        "Payload": {
          "action": "check_status",
          "instance_id.$": "$.instance_id"
        }
      },
      "OutputPath": "$.Payload",
      "Next": "checkstate"
    },
    "checkstate": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.status",
          "StringEquals": "stopping",
          "Next": "Wait"
        },
        {
          "Variable": "$.status",
          "StringEquals": "stopped",
          "Next": "startInstance"
        }
      ],
      "Default": "End"
    },
    "startInstance": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:",
      "Parameters": {
        "action": "start",
        "instance_id.$": "$.instance_id"
      },
      "End": true
    },
    "End": {
      "Type": "Succeed"
    }
  }
}
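A definition like the one above is easy to break with a mistyped state name in a Next or Default field. As a rough pre-deployment check, the following sketch lists every transition target that is not defined under States. The function name `find_dangling_targets` and the trimmed `demo` definition are hypothetical, and the check only covers flat state machines (it ignores Parallel and Map branches and Catch handlers).

```python
def find_dangling_targets(definition):
    """Return the sorted names of referenced states that are not defined."""
    states = definition.get("States", {})
    targets = {definition.get("StartAt")}
    for state in states.values():
        targets.add(state.get("Next"))
        targets.add(state.get("Default"))
        for choice in state.get("Choices", []):
            targets.add(choice.get("Next"))
    targets.discard(None)  # states without Next/Default contribute None
    return sorted(name for name in targets if name not in states)


# Trimmed-down illustration, not the full workflow.
demo = {
    "StartAt": "Choice",
    "States": {
        "Choice": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.action", "StringEquals": "stop", "Next": "Wait"}
            ],
            "Default": "End",
        },
        "Wait": {"Type": "Wait", "Seconds": 300, "Next": "End"},
        "End": {"Type": "Succeed"},
    },
}

if __name__ == "__main__":
    print(find_dangling_targets(demo))
```

The same function can be run against the full definition by loading it with `json.load` first.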

Conclusion

By combining AWS Lambda and Step Functions, we can automate the handling of EC2 instances whose memory utilization stays high. The workflow detects high memory usage, re-checks it after a waiting period, tests whether the instance is still reachable through SSM, and stops and restarts the instance only when it is not. This reduces the manual intervention needed to recover unresponsive instances.
