
MySQL : Frequent restarts – chasing the OOM Killer

Tags: MySQL, out of memory, PRM

Published on: July 20, 2014 by George K.


Scenario:

This was one of the rarer issues I have encountered. The MySQL server on one of our client's machines was being restarted quite frequently, at random intervals. The server logs looked nearly identical each time: a forced process termination, after which the server recovered itself in the majority of cases. This is quite different from a crash or filesystem error, where automatic recovery is uncommon. Let us see how to deal with this MySQL frequent-restart issue.

The corresponding log entries (e.g. for the latest restart) read as given below:

131113 14:20:50 mysqld_safe Number of processes running now: 0
131113 14:20:50 mysqld_safe mysqld restarted
131113 14:20:50 [Note] Plugin 'FEDERATED' is disabled.
131113 14:20:50  InnoDB: Initializing buffer pool, size = 4.0G
131113 14:20:51  InnoDB: Completed initialization of buffer pool
InnoDB: Log scan progressed past the checkpoint lsn 4142899941524
131113 14:20:51  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...

The error log message "mysqld_safe Number of processes running now: 0" means the mysqld process disappeared out from under mysqld_safe, i.e. it was terminated rather than shut down cleanly, which points towards a scarcity of resources.
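When a process vanishes like this, the usual first suspect is the kernel's OOM killer. A quick way to rule it in or out (a sketch; log locations vary by distro, e.g. /var/log/messages on RHEL-type systems vs /var/log/syslog on Debian-type ones) is to grep the kernel ring buffer and syslog for its signature messages:

```shell
# The kernel OOM killer logs lines like "Out of memory: Kill process <pid> (<name>)"
dmesg | egrep -i 'out of memory|oom-killer|killed process'
egrep -i 'out of memory|oom-killer|killed process' /var/log/messages
```

If neither turns anything up, as in this case, the killer is something other than the kernel.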

The availability of primary memory was neither comfortable nor alarming. The MySQL server was poorly optimized, so I went ahead and tweaked it for optimal resource usage. This reduced the restart frequency, and the server reported an uptime of more than 24 hours, a significant improvement over the prior uptime values of 30-40 minutes.

During the monitoring phase, we noted that mysqld showed high resource consumption in the form of high CPU usage; at times usage crossed 500%. Meanwhile, MySQL client threads were stacking up in the queue and showed high sleep times.
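To quantify that kind of CPU pressure, a quick process snapshot is usually enough. Note that ps and top report CPU summed across cores, which is how a single daemon can show figures like 500% on a multi-core box. A minimal sketch:

```shell
# Top five CPU consumers; %CPU is summed across cores, so a value above 100%
# means a single process is keeping several cores busy at once
ps -eo pid,pcpu,pmem,etime,comm --sort=-pcpu | head -n 6
```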

The slow query log file /var/lib/mysql/db-slow.log showed a large number of entries, indicating that some databases as well as queries needed to be optimized.
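Counting the entries gives a quick feel for how bad the slow log is; mysqldumpslow (shipped with MySQL) does the proper aggregation, but even a grep on the `# Time:` header lines works. A self-contained sketch using a tiny made-up sample log (in practice, point the path at /var/lib/mysql/db-slow.log instead):

```shell
# Each slow-log entry starts with a "# Time:" header, so counting those lines
# counts the logged slow queries. A temp file stands in for db-slow.log here.
SLOWLOG=$(mktemp)
cat > "$SLOWLOG" <<'EOF'
# Time: 131113 14:05:01
# Query_time: 12.3  Lock_time: 0.0  Rows_sent: 1  Rows_examined: 500000
SELECT * FROM orders WHERE status = 'open';
# Time: 131113 14:07:44
# Query_time: 8.9  Lock_time: 0.0  Rows_sent: 10  Rows_examined: 200000
SELECT * FROM orders WHERE status = 'open';
EOF
grep -c '^# Time:' "$SLOWLOG"    # prints 2 for this sample
rm -f "$SLOWLOG"
```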

A check of the syslog was performed to identify the resource crunch and any service terminations by the Out Of Memory (OOM) killer:

egrep -i "oom|kill|mysql" /var/log/messages
The output was quite surprising:
Nov 13 12:52:49 db1 prm(303104): process 280704 exceeded resource limits, killed.
Nov 13 12:52:50 db1 prm(303104): check /usr/local/prm/killed/280704 for process specific information.
Nov 13 13:32:48 db1 prm(375464): process 314307 exceeded resource limits, killed.
Nov 13 13:32:48 db1 prm(375464): check /usr/local/prm/killed/314307 for process specific information.
Nov 13 14:20:49 db1 prm(450579): process 387718 exceeded resource limits, killed.
Nov 13 14:20:49 db1 prm(450579): check /usr/local/prm/killed/387718 for process specific information.
As you can see from the above log, PID 387718 was killed by the Process Resource Monitor (PRM) at 14:20. I needed to confirm that the killed process was MySQL. For that, I first checked the MySQL server uptime:
[~]# mysqladmin version
mysqladmin  Ver 8.42 Distrib 5.1.70, for unknown-linux-gnu on x86_64
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective owners.
Server version          5.1.70-cll
Protocol version        10
Connection              Localhost via UNIX socket
UNIX socket             /var/lib/mysql/mysql.sock
Uptime:                 4 hours 11 min 32 sec
Threads: 10  Questions: 6679219  Slow queries: 166  Opens: 380202  Flush tables: 1  Open tables: 2048  Queries per second avg: 442.566
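As a quick sanity check, the figures in that output hang together: "Queries per second avg" is simply Questions divided by the uptime in seconds. Reproducing it with the numbers copied from the output above:

```shell
# Uptime: 4 hours 11 min 32 sec = 15092 seconds; Questions: 6679219
# mysqladmin truncates to three decimal places, hence 442.566
awk 'BEGIN { qps = 6679219 / (4*3600 + 11*60 + 32); print int(qps * 1000) / 1000 }'
# prints 442.566
```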


Then I compared it with the current time:
[~]# date
Wed Nov 13 18:32:27 PST 2013
This means the MySQL server was restarted at around 14:20-14:21, which is perfectly in sync with the time given in the log file:
Nov 13 14:20:49 db1 prm(450579): process 387718 exceeded resource limits, killed.
Nov 13 14:20:49 db1 prm(450579): check /usr/local/prm/killed/387718 for process specific information.
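The same correlation can be done arithmetically: subtracting the reported uptime from the current time should land right on the kill timestamp. A sketch using GNU date, with the timestamps copied from the outputs above (Etc/GMT+8 stands in for PST):

```shell
# "date" said Wed Nov 13 18:32:27 PST 2013; mysqladmin said uptime was 4h 11m 32s
NOW=$(date -u -d '2013-11-13 18:32:27 -0800' +%s)   # current time as epoch seconds
UPTIME=$(( 4*3600 + 11*60 + 32 ))                    # 15092 seconds
TZ='Etc/GMT+8' date -d "@$(( NOW - UPTIME ))"        # mysqld start: 14:20:55 PST
```

That lands six seconds after the PRM kill at 14:20:49, which is about how long mysqld_safe takes to respawn the server.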
The only thing left to confirm was that the process that got killed was MySQL. For that, check the PRM log file as given below:
# cat /usr/local/prm/killed/387718
USER: mysql
PID : 387718
CMD : /usr/sbin/mysqld
CPU%: 407 (limit: 85) MEM%: 10 (limit: 55)
PROCS: 1 (limit: 75)

This confirms that the killed process was MySQL and that it was terminated by PRM due to high CPU usage. The MySQL uptime is perfectly in sync with the service termination time logged by PRM, i.e. Nov 13 14:20:49. From the data, it is evident that PRM kills any process exceeding its configured thresholds, and that these repeated kills were themselves creating the resource bottleneck in the server's performance.
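The mechanism behind such a monitor is simple: periodically sample per-process CPU and memory, and act on anything over the limit. Purely as an illustration (this is not PRM's actual code; it prints instead of killing, and the 85% limit is just the one shown in the killed-file above):

```shell
#!/bin/sh
# One monitoring pass: flag any process whose CPU usage exceeds the limit.
# A real monitor like PRM would kill the offender and log the event instead.
CPU_LIMIT=85
ps -eo pid,pcpu,comm --no-headers | while read -r pid cpu comm; do
    cpu_int=${cpu%.*}                  # drop the fractional part for the comparison
    if [ "${cpu_int:-0}" -gt "$CPU_LIMIT" ]; then
        echo "process $pid ($comm) exceeded resource limits at ${cpu}% CPU"
    fi
done
```

The practical fix, of course, is not to disable the monitor but to tune MySQL (and the PRM limits) so mysqld stays under its thresholds.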

 

Category : Linux, MySQL, Troubleshooting

George K.

George started his career in web hosting and Linux technical support in 2004 and has been with SupportSages since 2009. He has a keen interest in server optimization, custom security solutions, hacked server recovery, cyber forensics, and high-availability failover system design and implementation. George loves long drives and is passionate about art and literature.
