How to enable RAID Monitoring in Nagios

Tags: How to Monitoring Nagios NRPE RAID

Published on: December 22, 2016 by Siju Jacob

Scenario:

During a Nagios monitoring implementation, we often need to depend on NRPE plugins and custom commands to execute server monitoring tasks such as load monitoring, disk usage monitoring, etc. on remote servers. While the majority of disk checks can be performed by simple tweaking of the existing commands, RAID disk health evaluation demands more advanced operations because the architecture and RAID controller differ with each RAID setup.

This article highlights the steps to add a RAID check for servers using the `MegaCli` utility. I assume that you have already configured a Nagios server for server monitoring using the NRPE plugin and are familiar with how it works. Here we focus only on the configuration of the RAID check.

Before delving into how to add the check, let's first look at what MegaCli is. MegaCli is a command line interface (CLI) binary used to communicate with the full LSI family of RAID controllers.

For a complete reference either call MegaCli -h or refer to the manual at: http://www.cisco.com/c/dam/en/us/td/docs/unified_computing/ucs/3rd-party/lsi/mrsas/userguide/LSI_MR_SAS_SW_UG.pdf

Now let us move to the step by step instructions to enable RAID Monitoring on Nagios.

Step 1:

Step 2:

Before moving forward, verify the path to MegaCli. You can do that by issuing the command

 root@server:~# which MegaCli
 /sbin/MegaCli

As you may know, binary paths can vary between installations. If the path to the binary is different, for example /usr/sbin/MegaCli, then modify the script and commands below by replacing all instances of /sbin/MegaCli with the correct path.

The instructions below apply only if MegaCli is not found. Otherwise, skip to Step 3.

On CentOS machines, you may get an error like the one below.
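Since the location differs between systems, a small helper can probe for the binary instead of hard-coding it. The following is only a sketch: the function name `find_megacli` is made up for this example, and the candidate list is simply the set of paths mentioned in this article, not an exhaustive one.

```shell
#!/bin/bash
# Print the first executable MegaCli candidate, or return 1 if none is found.
# Extra candidate paths may be passed as arguments; they are checked first.
# (find_megacli is a hypothetical helper name, not part of MegaCli or Nagios.)
find_megacli() {
    local candidate
    for candidate in "$@" /sbin/MegaCli /usr/sbin/MegaCli /opt/MegaRAID/MegaCli/MegaCli64; do
        # -x is also true for searchable directories, so exclude those.
        if [ -x "$candidate" ] && [ ! -d "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

if MEGACLI=$(find_megacli); then
    echo "Using $MEGACLI"
else
    echo "MegaCli not found in the usual locations"
fi
```

Whatever path it prints is the one to substitute for /sbin/MegaCli in the script and sudoers entries later on.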

 
[root@server ~]# MegaCli
MegaCli: command not found
[root@server ~]# which MegaCli
/usr/bin/which: no MegaCli in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/cpanel/composer/bin:/root/bin)

This doesn’t necessarily mean that MegaCli is absent. The path and name of the utility might be different. On CentOS machines, the binary is usually installed at /opt/MegaRAID/MegaCli/MegaCli64.

Try executing the below command

 # /opt/MegaRAID/MegaCli/MegaCli64 -v

MegaCLI SAS RAID Management Tool Ver 8.07.14 Dec 16, 2013 (c)Copyright 2013, LSI Corporation, All Rights Reserved. Exit Code: 0x00

If you see output like the above, the binary is present. The command is not found without its full path because the directory containing the binary is not included in the user's PATH variable.

PATH is an environment variable in Linux and other Unix-like operating systems that tells the shell which directories to search for executable files (i.e., ready-to-run programs) in response to commands issued by a user.

If this is the case, do the step below.

For easy access, let's create an alias named MegaCli for the command and add it to .bashrc to make the change permanent.

Execute the commands below.


echo alias MegaCli=\"/opt/MegaRAID/MegaCli/MegaCli64\" >> /root/.bashrc

source /root/.bashrc

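The bash built-in command “source” executes the contents of the file /root/.bashrc and loads the alias into the current shell, so you can continue with your current session. Note that aliases only apply to interactive shells, so scripts must still call the binary by its full path.

Now verify the binary: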


#MegaCli -v

If you see the version details, proceed to the next step

Step 3:

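Create a new file named check_raid in /usr/local/nagios/libexec and add the following code to it:

#!/bin/bash
# Nagios plugin: check physical disk health on LSI controllers via MegaCli.
# Adjust MEGACLI to match your installation (see Step 2).
MEGACLI=/sbin/MegaCli

# Without this guard a missing binary would produce no output and report OK.
if [ ! -x "$MEGACLI" ]; then
    echo "UNKNOWN: $MEGACLI not found or not executable"
    exit 3
fi

# Query all physical drives on all adapters once and reuse the output.
OUTPUT=$("$MEGACLI" -PDList -aAll 2>&1)

if echo "$OUTPUT" | grep -qi failed; then
    EXIT=2
    STATUS="CRITICAL: RAID failure detected!"
elif ! echo "$OUTPUT" | grep "Count: " | grep -qv ": 0"; then
    # Every "... Count: N" error counter reads zero.
    EXIT=0
    STATUS="OK: RAID looks running fine"
else
    EXIT=1
    STATUS="WARNING: RAID errors detected!"
fi
echo "$STATUS"
exit $EXIT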

Change the binary location to match your installation and OS. For example, on a CentOS server, replace /sbin/MegaCli with /opt/MegaRAID/MegaCli/MegaCli64 in the above script, as that is the usual path to the binary on CentOS.

Give the script execute permission by issuing


chmod +x /usr/local/nagios/libexec/check_raid
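The decision logic of check_raid can be tried out without a RAID controller by feeding it text instead of live MegaCli output. The sketch below assumes a helper called `classify_pdlist` (a name invented for this example) that applies the same grep rules to whatever arrives on stdin. The sample lines are hand-written for illustration and are not captured from a real controller.

```shell
#!/bin/bash
# Classify "MegaCli -PDList" style text read from stdin using the same grep
# rules as check_raid. Prints a state word and returns the matching exit code.
# (classify_pdlist is a hypothetical helper used only for this illustration.)
classify_pdlist() {
    local output
    output=$(cat)
    if echo "$output" | grep -qi failed; then
        echo "CRITICAL"; return 2
    elif ! echo "$output" | grep "Count: " | grep -qv ": 0"; then
        echo "OK"; return 0
    else
        echo "WARNING"; return 1
    fi
}

# Hand-written sample lines (illustrative only).
healthy='Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Firmware state: Online, Spun Up'

# "|| true" keeps the demo going when the function returns non-zero.
classify_pdlist <<< "$healthy" || true                                        # prints OK
classify_pdlist <<< "${healthy/Media Error Count: 0/Media Error Count: 3}" || true   # prints WARNING
classify_pdlist <<< "${healthy/Online, Spun Up/Failed}" || true               # prints CRITICAL
```

Note that "Predictive Failure Count" does not trigger the CRITICAL branch, because the script greps for "failed", not "failure".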

Step 4:

Now we have to define a command for this check in /usr/local/nagios/etc/nrpe.cfg

To do this, add the following line to the end of file /usr/local/nagios/etc/nrpe.cfg

command[check_raid]=/usr/local/nagios/libexec/check_raid

If you are not comfortable editing configuration files directly, you can do the same with the following command:


echo 'command[check_raid]=/usr/local/nagios/libexec/check_raid' >> /usr/local/nagios/etc/nrpe.cfg

This entry is needed because the Nagios server refers to the check by this command name when it contacts the server being monitored. The client then executes the associated command and returns its output.
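One caveat with appending via `>>` is that running the command twice leaves two identical lines in nrpe.cfg. A possible way around that, sketched here with a made-up helper name `append_once` and demonstrated on a scratch file rather than the real configuration, is to append only when the exact line is missing:

```shell
#!/bin/bash
# Append a line to a file only if that exact line is not already present.
# (append_once is a hypothetical helper name.)
append_once() {
    local line=$1 file=$2
    # -x matches whole lines, -F treats the text literally (no regex for "[").
    grep -qxF -- "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# Scratch file for the demo; on a real client the target would be
# /usr/local/nagios/etc/nrpe.cfg.
cfg=$(mktemp)
append_once 'command[check_raid]=/usr/local/nagios/libexec/check_raid' "$cfg"
append_once 'command[check_raid]=/usr/local/nagios/libexec/check_raid' "$cfg"
grep -c 'check_raid' "$cfg"   # prints 1
```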

Step 5:

Now test whether the script runs correctly with the following command.

root@server:~# /usr/local/nagios/libexec/check_raid

OK: RAID looks running fine
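The printed text is only half of the result: Nagios decides the service state from the plugin's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), which is what the EXIT variable in the script sets. A minimal sketch for showing both together follows. The helper names `nagios_state` and `run_check` are invented for this example, and any code other than 0, 1 or 2 is simply labelled UNKNOWN here.

```shell
#!/bin/bash
# Map a plugin exit code to the Nagios state name.
# (nagios_state and run_check are hypothetical helper names.)
nagios_state() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Run any plugin command, let its output through, then report the exit code.
run_check() {
    "$@"
    local code=$?
    echo "exit code $code => $(nagios_state "$code")"
}

# Stand-in command for the demo; "true" always exits with 0.
run_check true
# On the client you would instead run:
#   run_check /usr/local/nagios/libexec/check_raid
```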

Step 6:

Now open the file /etc/sudoers and add the following lines to the bottom of the file:

a) If the system is running Debian


nagios ALL=NOPASSWD:/sbin/MegaCli

nagios ALL=NOPASSWD:/bin/bash

Editing configuration files directly is always risky, so the best way to make this change is with the visudo editor, which checks the syntax before saving.

Alternatively, you can execute the command below to get the same result:


echo -e 'nagios ALL=NOPASSWD:/sbin/MegaCli\nnagios ALL=NOPASSWD:/bin/bash' >> /etc/sudoers

b) If it is a CentOS server, add the following code


nagios ALL=NOPASSWD:/opt/MegaRAID/MegaCli/MegaCli64

nagios ALL=NOPASSWD:/bin/bash

or use the following command


echo -e 'nagios ALL=NOPASSWD:/opt/MegaRAID/MegaCli/MegaCli64\nnagios ALL=NOPASSWD:/bin/bash' >> /etc/sudoers

Also, if a line ‘Defaults requiretty‘ is present in /etc/sudoers, you must comment out the “Defaults requiretty” line as follows:


# Defaults requiretty

Alternatively, run the command below:


sed -ri 's/^Defaults\s+requiretty/# &/' /etc/sudoers

As you know, NRPE runs commands as the user nagios. The check we ran above returned OK because the command was executed as the root user.

When run locally on the client as root, the check returns output, but when queried from the monitoring server it returns an error like ‘NRPE: Unable to read output‘. This is because we overlooked which user the command runs as and whether that user has the privilege to issue it. The lines above allow the user nagios to run /bin/bash and MegaCli. This is required because the nagios user is created with the shell /sbin/nologin, and MegaCli by default can only be run by root.
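Because a syntax error in /etc/sudoers can lock you out of sudo, it may be safer to prepare the rules in a separate file and validate them first. The sketch below rests on a few assumptions: the helper name `sudoers_rules` and the file name `nagios_raid` are made up, and installing into /etc/sudoers.d only works if your sudoers file includes that directory, which is the default on most current distributions but should be verified.

```shell
#!/bin/bash
# Print the two sudoers rules from this step for a given MegaCli path.
# (sudoers_rules is a hypothetical helper name.)
sudoers_rules() {
    local megacli_path=$1
    # printf reuses the format for each argument, producing one rule per line.
    printf 'nagios ALL=NOPASSWD:%s\n' "$megacli_path" /bin/bash
}

# Write the rules to a scratch file and show them.
rules_file=$(mktemp)
sudoers_rules /sbin/MegaCli > "$rules_file"
cat "$rules_file"

# "visudo -c -f FILE" only checks syntax; it does not modify /etc/sudoers.
if command -v visudo > /dev/null && visudo -c -f "$rules_file" > /dev/null 2>&1; then
    echo "Syntax OK. As root, the file could be installed with:"
    echo "install -m 0440 $rules_file /etc/sudoers.d/nagios_raid"
fi
```

To reproduce what NRPE sees, the check can also be run on the client as the nagios user, for example with `sudo -u nagios /usr/local/nagios/libexec/check_raid`.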

Step 7:

At this point, we have created a script to check the RAID status, configured a command in NRPE that references it, and granted the user nagios the permissions required to execute the script. Now restart NRPE by issuing the following command.

root@server:~# /etc/init.d/nrpe restart
Restarting nagios remote plugin daemon: nrpe.

Step 8:

Now log in to the monitoring server and issue the following command to check that it is working:

[root@monitor ~]#  /usr/local/nagios/libexec/check_nrpe -H aaa.bbb.ccc.ddd -c check_raid
OK: RAID looks running fine
[root@monitor ~]#

Be sure to replace the IP aaa.bbb.ccc.ddd with the client IP.

Step 9:

If the results are fine, add the check to the client's configuration file on the monitoring server. On our servers, the cfg file for each client is located under /usr/local/nagios/etc/objects/clients/; add the following entries to it.

define service{
        use                     fiveminutes
        host_name               *enter server hostname here*
        service_description     Raid_Check
        contact_groups          *enter contact group here*
        check_command           check_nrpe!check_raid
        }

Be sure to replace the hostname and contact group if you are pasting the above snippet. You can also open the .cfg file of the client server and copy one of the service checks once again and just modify the service_description and check_command as above. The rest of the fields will be the same for all service checks within a cfg file.
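If the same check has to be added for several clients, the definition can be generated rather than pasted by hand. This is a rough sketch only: the function name `raid_service_def`, the host name and the contact group are placeholders invented for the example, while the `fiveminutes` template is the one used in the snippet above.

```shell
#!/bin/bash
# Print a Raid_Check service definition for one host.
# Arguments: <host_name> <contact_group>
# (raid_service_def is a hypothetical helper name.)
raid_service_def() {
    local host=$1 group=$2
    cat <<EOF
define service{
        use                     fiveminutes
        host_name               $host
        service_description     Raid_Check
        contact_groups          $group
        check_command           check_nrpe!check_raid
        }
EOF
}

# Placeholder values; append the output to the client's .cfg file.
raid_service_def web01.example.com admins
```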

Step 10:

Now restart the Nagios server for the changes to take effect.


[root@monitor ~]# /etc/init.d/nagios restart
Running configuration check...done.
Stopping nagios: .done.
Starting nagios: done.
[root@monitor ~]#

Now log on to the Nagios web interface and verify that the check shows up correctly there 🙂

Category : Linux

Siju Jacob

Web enthusiast and tech savvy, curious to solve and find solutions to seemingly difficult tasks. Currently working as System Engineer at SupportSages and although his skill set is vast, his greatest expertise revolve in the worlds of system administration.
