Service Monitoring with Nagios

After setting up Nagios I had a network monitoring system that could tell me when a host was down. While this is nice, it is not actually what I want to know. The goal of network monitoring is to determine whether all your services are running properly. To achieve that goal, I started to configure Nagios for service monitoring.

Services are configured much like hosts. You define the service in a config file with a define service{} block and a few parameters. I put the service definitions for each host in that host's cfg file, so I don’t clutter my Nagios directory too badly. These parameters are required to define a service:

host_name: the host name of the machine the service runs on. You could specify multiple hosts separated by commas, or use multiple host_name lines in the service definition. However, in many scenarios you will want to change some parameters for each host, so this is often not worth using. Also, if you organized your cfg files like I did (one file per host, containing all services on that host), there isn’t much point in making use of this.

service_description: the name of the service. This acts much like the alias in the host definition.

check_command: the command used to check the service. You can define your own commands, but check commands for the most common services come with the standard plugins. Some commands accept arguments; the command name and each argument are separated by exclamation marks.

max_check_attempts: same as in host definition.

check_interval: same as in host definition.

retry_interval: same as in host definition.

check_period: same as in host definition.

notification_interval: same as in host definition.

notification_options: very similar to the host definition, but a few of the options differ:

  • u: the service state is unknown, usually due to a configuration error in the check command or a faulty plugin.
  • w: the service is not OK, but it is not down. This can occur if the response time of the service is too high, or a disk in a disk check is almost full.
  • c: the service is down or a critical condition is met, for example when the return value of a temperature check is above the specified maximum.
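
Nagios derives these states from the exit code of the plugin: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. As a rough illustration of how a check ends up in w or c, here is a minimal sketch in Python. The function name classify_temperature and the thresholds are made up for this example and are not part of any standard plugin.

```python
# Exit codes that Nagios maps to service states.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify_temperature(value, warn, crit):
    """Return the plugin exit code for a reading (hypothetical helper)."""
    try:
        reading = float(value)
    except (TypeError, ValueError):
        # The check itself failed, e.g. the sensor output was unreadable.
        return UNKNOWN
    if reading >= crit:
        return CRITICAL
    if reading >= warn:
        return WARNING
    return OK


def status_line(value, warn, crit):
    """Build the one-line text a plugin would print, plus its exit code."""
    code = classify_temperature(value, warn, crit)
    return f"TEMP {STATE_NAMES[code]} - reading={value}", code
```

A real plugin would print the status line and then exit with the returned code, for example via sys.exit(code).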

contact_groups: same as in host definition.

Like the host definition, the service definition allows the use of templates. This is a nice general template that can be used for different checks:

#general service template
define service{
name default-service
notification_interval 720
check_period 24x7
check_interval 5
retry_interval 1
max_check_attempts 4
notification_period 24x7
notification_options w,u,c,r
contact_groups admins
register 0
}

You refer to a service template the same way you would refer to a host template, with the use parameter.

define service{
# this checks whether your web server replies to HTTP requests on port 80
host_name webserver
service_description HTTP on webserver
check_command check_http
use default-service
}
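
To make the effect of use a bit more concrete, here is a small Python sketch of how a service could be merged with its template. It is only an illustration under simplifying assumptions (one template per use, definitions held as plain dictionaries), not how Nagios implements it internally; resolve_service is a hypothetical name.

```python
def resolve_service(service, templates):
    """Merge a service definition with the template chain named by 'use'."""
    chain = [service]
    seen = set()
    parent = service.get("use")
    while parent is not None:
        if parent in seen:
            raise ValueError(f"template loop at {parent!r}")
        seen.add(parent)
        template = templates[parent]
        chain.append(template)
        parent = template.get("use")

    # Apply the most generic template first so more specific values win.
    merged = {}
    for definition in reversed(chain):
        merged.update(definition)

    # These keys drive the template mechanism and are not part of the result.
    for key in ("use", "name", "register"):
        merged.pop(key, None)
    return merged


templates = {
    "default-service": {
        "name": "default-service",
        "max_check_attempts": "4",
        "check_period": "24x7",
        "register": "0",
    }
}
service = {
    "host_name": "webserver",
    "service_description": "HTTP on webserver",
    "check_command": "check_http",
    "use": "default-service",
}
resolved = resolve_service(service, templates)
```

Here resolved carries max_check_attempts and check_period from the template, while host_name, service_description and check_command come from the service itself. A value set in the service would override the same parameter from the template.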

Some of the commands accept arguments, for example to check a different port, but sometimes the Nagios plugin offers much more functionality than the predefined check command exposes. In this case you will have to write your own check command. This is actually a pretty simple process.
First, open the plugin in a text editor. You will see which command is used for the service check, and usually the command line switches it accepts are listed there too (running a standard plugin with --help also lists them). You will also see which check commands are already defined. If the command you need is already there, just look at its usage. If it is not, it is time to define your own check command.
To define a check command you only need the following parameters:

command_name: a short name for the command. It is used to call the command from the check_command parameter in the service definition.

command_line: the command executed on the command line when the check command is called.

There is nothing else necessary to define a command.
Here are a few example commands that I find useful. All of them are based on standard Nagios plugins.

# Check a custom HTTPS URL, e.g. https://www.example.com/example
# $ARG1$ is passed to check_http -u
define command{
command_name check_https_url_cust
command_line /usr/lib/nagios/plugins/check_http --ssl -H '$HOSTADDRESS$' -I '$HOSTADDRESS$' -u '$ARG1$'
}
# Check an HTTPS site on a specific port ($ARG1$ is the port)
define command{
command_name check_https_port_cust
command_line /usr/lib/nagios/plugins/check_http --ssl -H '$HOSTADDRESS$' -I '$HOSTADDRESS$' -p '$ARG1$'
}
# Check a custom HTTPS URL and pass a username and password for login ($ARG2$ is user:password)
define command{
command_name check_https_url_auth_cust
command_line /usr/lib/nagios/plugins/check_http --ssl -H '$HOSTADDRESS$' -I '$HOSTADDRESS$' -u '$ARG1$' -a '$ARG2$'
}

These are not all of the check_http parameters, just the ones I needed. You could also check whether the loaded page contains a certain expression (check_http offers -s for a string and -r for a regular expression). If you need to check normal http pages instead of https pages, simply remove --ssl.
Using one of these in a service definition looks like this:

define service{
host_name www.example.com
service_description example subdirectory
check_command check_https_url_cust!/example
use default-service
}
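
It can help to see what command line results from such a check_command. The following Python sketch mimics the substitution for $HOSTADDRESS$ and $ARGn$ only. It is a simplified illustration, not Nagios code: expand_check_command is a hypothetical name, the address 192.0.2.10 is a placeholder, and escaped exclamation marks are not handled.

```python
def expand_check_command(check_command, command_line, host_address):
    """Split 'name!arg1!arg2' and fill the macros in a command_line."""
    name, *args = check_command.split("!")
    expanded = command_line.replace("$HOSTADDRESS$", host_address)
    # The first argument becomes $ARG1$, the second $ARG2$, and so on.
    for index, value in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{index}$", value)
    return name, expanded


# command_line of check_https_url_cust as defined above
COMMAND_LINE = (
    "/usr/lib/nagios/plugins/check_http --ssl "
    "-H '$HOSTADDRESS$' -I '$HOSTADDRESS$' -u '$ARG1$'"
)

name, line = expand_check_command(
    "check_https_url_cust!/example", COMMAND_LINE, "192.0.2.10"
)
```

With these inputs, line holds the check_http call with the address filled in twice and -u '/example' at the end, which is what you could also run by hand to test the check.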

Some checks require usernames and passwords to work. You could state them directly in the service definition, but it is better to put them in resource.cfg, where you can define up to 32 user macros. This file is not read by the web interface, so it is a good place to park your passwords.
This is the resource.cfg that comes with a fresh Nagios install. It is self-explanatory.

#
# RESOURCE.CFG - Resource File for Nagios
#
# You can define $USERx$ macros in this file, which can in turn be used
# in command definitions in your host config file(s). $USERx$ macros are
# useful for storing sensitive information such as usernames, passwords,
# etc. They are also handy for specifying the path to plugins and
# event handlers - if you decide to move the plugins or event handlers to
# a different directory in the future, you can just update one or two
# $USERx$ macros, instead of modifying a lot of command definitions.
#
# The CGIs will not attempt to read the contents of resource files, so
# you can set restrictive permissions (600 or 660) on them.
#
# Nagios supports up to 32 $USERx$ macros ($USER1$ through $USER32$)
#
# Resource files may also be used to store configuration directives for
# external data sources like MySQL...
#
###########################################################################

# Sets $USER1$ to be the path to the plugins
$USER1$=/usr/lib/nagios/plugins

# Sets $USER2$ to be the path to event handlers
#$USER2$=/usr/lib/nagios/plugins/eventhandlers

# Store some usernames and passwords (hidden from the CGIs)
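
To show how the $USERx$ lines in this file are structured, here is a short Python sketch that reads them into a dictionary. It is an illustration only and not part of Nagios; parse_resource_cfg is a hypothetical name and the $USER3$ value in the sample is invented.

```python
import re

# Matches lines such as "$USER1$=/usr/lib/nagios/plugins" for USER1..USER32.
USER_MACRO = re.compile(r"^\$(USER(?:[1-9]|[12][0-9]|3[0-2]))\$=(.*)$")


def parse_resource_cfg(text):
    """Return a dict of $USERx$ macros, skipping comments and blank lines."""
    macros = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        match = USER_MACRO.match(line)
        if match:
            macros[f"${match.group(1)}$"] = match.group(2)
    return macros


SAMPLE = """\
# Sets $USER1$ to be the path to the plugins
$USER1$=/usr/lib/nagios/plugins
#$USER2$=/usr/lib/nagios/plugins/eventhandlers
# Invented credentials, for illustration only
$USER3$=monitor:secret
"""

macros = parse_resource_cfg(SAMPLE)
```

The commented-out $USER2$ line is ignored, so macros contains only $USER1$ and $USER3$. A command_line could then reference $USER3$ instead of spelling out the credentials.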

 
