The NRPE Protocol explained

The ubiquitous Nagios Remote Plugin Executor (NRPE) protocol is a means of monitoring the local resources of remote hosts: Nagios plugins are executed on the remote host and their results are transported back over a TCP/IP-based protocol that is semi-proprietary to Nagios and its offspring. In this blog post I intend to demystify its contents and explain how it works. The most widely used implementation of the protocol ships with Nagios as two C programs: the client check_nrpe and the daemon, known as nagios-nrpe-server or nrped. This document is the result of an attempt to write an implementation of the protocol in pure Perl, which can be found on MetaCPAN or checked out with git from GitHub.

Note: This post only covers Version 2 of the Nagios NRPE protocol, the version with the fixed 1024-byte buffer described below, as it is the most common protocol version installed and used with recent installations of Nagios and its successors.

Basics

NRPE is built on a client-server model: the remote server we intend to monitor listens on a specific port (5666 by default) for TCP/IP connections that request a specific check defined in the NRPE daemon's configuration file. That would be your check_disk or check_mysql on any given *NIX server. For example:

command[check_users]=/usr/lib64/nagios/plugins/check_users -w 5 -c 10

defines a check in which a Nagios check plugin (here check_users) is executed and its result is returned in the expected Nagios text format. If you configure your nagios-nrpe daemon this way, you can check the number of users currently logged in on the system the NRPE daemon is running on. A typical result looks like this:

nagios:~# check_nrpe -H somehost.example.com -c check_users
USERS OK - 0 users currently logged in |users=0;5;10;0
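
The two snippets above can also be taken apart programmatically. The following is a minimal Python sketch, not part of NRPE itself: the function names are made up for illustration, and the performance-data handling assumes simple unquoted labels of the form label=value;warn;crit;min.

```python
import re

# Matches lines such as: command[check_users]=/path/to/plugin -w 5 -c 10
COMMAND_RE = re.compile(r"^command\[(?P<name>[^\]]+)\]=(?P<cmdline>.+)$")


def parse_command_definition(line):
    """Return (check name, command line) from one command definition."""
    match = COMMAND_RE.match(line.strip())
    if match is None:
        raise ValueError("not a command definition: %r" % line)
    return match.group("name"), match.group("cmdline")


def split_plugin_output(line):
    """Split plugin output into its status text and performance data."""
    text, _, perfdata = line.partition("|")
    metrics = {}
    for item in perfdata.split():
        label, _, values = item.partition("=")
        # Keep the semicolon-separated fields as strings; some may be empty.
        metrics[label] = values.split(";")
    return text.strip(), metrics


name, cmdline = parse_command_definition(
    "command[check_users]=/usr/lib64/nagios/plugins/check_users -w 5 -c 10"
)
text, metrics = split_plugin_output(
    "USERS OK - 0 users currently logged in |users=0;5;10;0"
)
print(name, "->", cmdline)
print(text, metrics)
```

The name in square brackets is what the client later asks for, and everything after the pipe character is performance data intended for graphing rather than for the status line.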

The Protocol

Communication over the NRPE protocol is a request/response mechanism: the check_nrpe client application requests that a check be executed (a “Query” in NRPE lingo) and the result is sent back to it in a “Response”.

 .------------.                      .-------.
 |            |---Query-Packet------>|       |
 | check_nrpe |                      | nrped |
 |____________|<--Response-Packet----|_______|

A packet carrying the NRPE payload has a fixed layout. In order of occurrence in the packet, the following fields are defined:

  1. [2 Byte int16_t] – Version number
  2. [2 Byte int16_t] – Type (Query/Response)
  3. [4 Byte u_int32_t] – CRC32 Checksum
  4. [2 Byte int16_t] – Result code (OK, WARNING, CRITICAL, UNKNOWN)
  5. [1024 Byte char] – Buffer

Default values for fields 1, 2 and 4 of the list can be taken from the common.h header file of the C source code, where:

  • Version number denotes which version of the NRPE protocol we are speaking, e.g. 1, 2 or 3
  • Type is defined by whether we are sending a query (1) or responding to one (2)
  • Results are one of the 4 standard Nagios check results OK (0), WARNING (1), CRITICAL (2), UNKNOWN (3)
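
The field list above maps fairly directly onto Python's struct module. The sketch below rests on assumptions that the list itself does not state: fields are sent in network byte order, the C struct carries two trailing padding bytes (hence 1036 bytes in total), and the checksum is a standard CRC32 computed over the whole packet with the CRC field set to zero. The function names and the version value used in the usage lines are illustrative only.

```python
import struct
import zlib

# Field order from the list above. "!" selects network byte order and "2x"
# adds two trailing padding bytes; both are assumptions about the C struct.
PACKET_FORMAT = "!hhIh1024s2x"
PACKET_SIZE = struct.calcsize(PACKET_FORMAT)

QUERY_PACKET, RESPONSE_PACKET = 1, 2
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def build_packet(version, packet_type, result_code, text):
    """Pack the five fields and fill in the CRC32 of the whole packet."""
    payload = text.encode("ascii")
    if len(payload) >= 1024:
        raise ValueError("text does not fit into the 1024 byte buffer")
    # Assumption: the checksum is calculated while the CRC field is zero.
    unsigned = struct.pack(PACKET_FORMAT, version, packet_type, 0,
                           result_code, payload)
    crc = zlib.crc32(unsigned) & 0xFFFFFFFF
    return struct.pack(PACKET_FORMAT, version, packet_type, crc,
                       result_code, payload)


def parse_packet(raw):
    """Unpack a packet, verify its checksum and return the decoded fields."""
    version, packet_type, crc, result_code, buffer = struct.unpack(
        PACKET_FORMAT, raw)
    zeroed = raw[:4] + b"\x00" * 4 + raw[8:]  # blank out the CRC field
    if zlib.crc32(zeroed) & 0xFFFFFFFF != crc:
        raise ValueError("CRC32 mismatch")
    # The buffer is NUL-padded; keep only the text before the first NUL.
    text = buffer.split(b"\x00", 1)[0].decode("ascii", "replace")
    return version, packet_type, result_code, text


# Version 2 and result code 0 are example values chosen for this sketch.
query = build_packet(2, QUERY_PACKET, 0, "check_users")
print(len(query), parse_packet(query))
```

Packing the packet twice, once with a zeroed checksum and once with the real one, keeps the sketch short; a stricter implementation might patch the four CRC bytes in place instead.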

The buffer is the part that is actually interesting, as it is here that we name the check we would like the server to carry out for us. If the check we write into the buffer does not exist or is not configured on the server side, we will get a response of CRITICAL.

nagios:~# check_nrpe -H somehost.example.com -c check_test_fail
NRPE: Command 'check_test_fail' not defined
nagios:~# echo $?
2
nagios:~#

In the response packet we would now see a type of 2 (response), the result code we will exit with (2), and the response text to print to the screen: “NRPE: Command 'check_test_fail' not defined”.
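
To make the exchange concrete, here is a hedged Python sketch of one query/response round trip against a toy daemon on the loopback interface. It is not the real nrped: it speaks plain TCP without SSL, skips checksum verification on receipt, reuses the same layout assumptions as before (network byte order, two padding bytes, CRC over a zeroed CRC field), and every function name in it is invented for the example.

```python
import socket
import struct
import threading
import zlib

FMT = "!hhIh1024s2x"   # assumed wire layout: five fields plus two pad bytes
SIZE = struct.calcsize(FMT)
QUERY, RESPONSE = 1, 2
OK, CRITICAL = 0, 2


def pack(packet_type, result, text):
    body = text.encode("ascii")[:1023]
    crc = zlib.crc32(struct.pack(FMT, 2, packet_type, 0, result, body))
    return struct.pack(FMT, 2, packet_type, crc & 0xFFFFFFFF, result, body)


def unpack(raw):
    _version, packet_type, _crc, result, buf = struct.unpack(FMT, raw)
    return packet_type, result, buf.split(b"\x00", 1)[0].decode("ascii")


def recv_exact(conn, size):
    """Read exactly one fixed-size packet from the connection."""
    data = b""
    while len(data) < size:
        chunk = conn.recv(size - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        data += chunk
    return data


def serve_once(listener, commands):
    """Toy daemon: answer a single query, then close the connection."""
    conn, _addr = listener.accept()
    with conn:
        _type, _result, name = unpack(recv_exact(conn, SIZE))
        if name in commands:
            result, text = commands[name]()
        else:
            result, text = CRITICAL, "NRPE: Command '%s' not defined" % name
        conn.sendall(pack(RESPONSE, result, text))


def query(port, name):
    """Toy client: send one query packet and decode the response."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as conn:
        conn.sendall(pack(QUERY, 0, name))
        return unpack(recv_exact(conn, SIZE))


def run_demo(name):
    commands = {
        "check_users": lambda: (OK, "USERS OK - 0 users currently logged in"),
    }
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    server = threading.Thread(target=serve_once, args=(listener, commands),
                              daemon=True)
    server.start()
    try:
        return query(listener.getsockname()[1], name)
    finally:
        server.join(timeout=5)
        listener.close()


print(run_demo("check_test_fail"))
```

The returned tuple holds the packet type, the result code and the text, which is all a client needs in order to print the message and choose its exit status.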

Should the output of a single check request exceed the 1024-character limit, the response is split across as many packets as necessary and then sent to the client.

Implications

Now that we know how the NRPE protocol works and how to interact with it, we could write our own daemon that integrates with our specific business software and lets us gather data about ongoing operations in our system more easily than existing techniques allow.

Consider for a moment this piece of code, which uses the Nagios::NRPE library, the pure-Perl implementation mentioned at the start:

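#!/usr/bin/env perl

use strict;
use warnings;
use threads;

use Nagios::NRPE::Daemon;
use Nagios::NRPE::Packet;

my $daemon = Nagios::NRPE::Daemon->new(
    listen  => "127.0.0.1",
    port    => 5666,
    pid_dir => "/path/to/pidfile",
    ssl     => 1,
    # A hash reference, so that the lookup by check name in the callback
    # works; the values are placeholders.
    commandlist => { check_this => 1, check_that => 1 },
    callback    => sub {
        my ($self, $check, @options) = @_;
        my $commandlist = $self->commandlist();
        if ($commandlist->{$check}) {
            # Do something according to check
        }
    },
);

# Pass a code reference: threads->new($daemon->start()) would call start()
# in the main thread before any thread is created.
threads->new(sub { $daemon->start() })->join();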

We could now integrate complete NRPE and Nagios monitoring into our existing framework of Perl and business logic, and use the existing notification and graphing facilities built by the Nagios community for whatever needs monitoring.

Thanks for reading and stay tuned.
