Saturday, August 7, 2010

Nagios with check_mk

We don't do a lot of printing here at the homestead, but when we do we usually find that the printer isn't working. It is an older network printer that is a bit flaky. So I obviously needed to install Nagios, an enterprise-class monitoring package, to let me know when it stopped working. I had a CentOS machine lying around that I wasn't doing anything with, so one rainy Saturday afternoon I set to work. I followed the quick start instructions for Fedora without encountering much trouble. I had a couple of dependency issues (no Apache) but was able to get everything through yum easily enough. As an aside, yum has really made life easier; I remember the days when, if you forgot to select gcc during the install, you simply reformatted and started over because it was easier.

I also wanted to be able to do some trending (not on the printer, mind you; that would be silly). Some people seem to use separate products for trending and Nagios only for monitoring, but Nagios does a great job of trending as well with the help of an add-on. Of the various Nagios add-ons that do trending, pnp4nagios seemed to fit me best: it is easy to use and generates appealing graphs. As long as the Nagios check returns its performance data properly, pnp4nagios can graph it with no additional setup.
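For context on what "returns its performance data properly" means: a Nagios plugin prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN), and anything after a "|" in that line is performance data, which is what pnp4nagios graphs. Below is a minimal sketch of a check in that shape; the function name check_threshold and the integer-threshold logic are hypothetical, not something shipped with Nagios or pnp4nagios.

```shell
#!/bin/sh
# Hypothetical check: compare an integer against warning/critical thresholds
# and print a Nagios-style status line with performance data after the "|".
# Return codes follow the plugin convention: 0=OK 1=WARNING 2=CRITICAL 3=UNKNOWN.
check_threshold() {
    label=$1 value=$2 warn=$3 crit=$4

    # Anything that is not a plain non-negative integer is reported as UNKNOWN.
    case $value in
        ''|*[!0-9]*)
            echo "UNKNOWN - $label value '$value' is not an integer"
            return 3 ;;
    esac

    if [ "$value" -ge "$crit" ]; then
        state=CRITICAL; rc=2
    elif [ "$value" -ge "$warn" ]; then
        state=WARNING; rc=1
    else
        state=OK; rc=0
    fi

    # Perfdata format: label=value;warn;crit (pnp4nagios graphs the value).
    echo "$state - $label is $value | $label=$value;$warn;$crit"
    return $rc
}
```

Used as a standalone plugin, a script would end with something like check_threshold jobs 3 5 10; exit $? so Nagios sees the status code; that call prints "OK - jobs is 3 | jobs=3;5;10".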

So it was time to set up my first host. This is where I think people start to get turned off by Nagios. For every host you have to edit a file listing all the services on that host. This means thinking about what you want to monitor, having some idea of what Nagios can monitor, and, when adding new monitors, going back and updating potentially many config files. There are some plugins that let you do this in a web GUI, but I didn't try any, so I can't speak to them. Despite the difficult configuration, Nagios is nice: the web interface is clear, I could see that my file server and printer were both up, and I got nice alerts when my printer went offline.
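To give a feel for the per-host editing, here is a sketch that writes one host file by hand, modeled on the sample printer.cfg that comes with Nagios 3. It assumes the generic-printer and generic-service templates and the check_ping command from the Nagios sample configuration are present; the function name write_printer_cfg, the host name, and the address are made up.

```shell
#!/bin/sh
# Hypothetical helper: write a Nagios object file for one printer with a single
# PING service. Every additional service means another "define service" block.
# Assumes the generic-printer / generic-service templates and the check_ping
# command from the Nagios sample configuration.
write_printer_cfg() {
    name=$1 address=$2 dir=$3
    cat > "$dir/$name.cfg" <<EOF
define host{
    use                   generic-printer
    host_name             $name
    alias                 $name
    address               $address
    }

define service{
    use                   generic-service
    host_name             $name
    service_description   PING
    check_command         check_ping!3000.0,80%!5000.0,100%
    }
EOF
}
```

For example, write_printer_cfg printer 192.168.1.30 /usr/local/nagios/etc/objects would create printer.cfg there. The file still has to be referenced from nagios.cfg with a cfg_file= line, and nagios -v /usr/local/nagios/etc/nagios.cfg (the quick start's default path) checks the configuration before you restart.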

I then set out to investigate check_mk, which solves the difficult configuration of monitored hosts, Nagios's biggest drawback. Check_mk is an auto-discovery tool for services on a host. You tell it which host machines you want to monitor, install a small client on each host, and it takes care of the rest. Anything check_mk knows how to monitor, it will find on the host and monitor; it also has built-in integration with pnp4nagios, so you can trend all of the check_mk monitors as well. You will still need to figure out how to monitor things that are not at the OS layer (custom applications, databases, etc.), but you will be building on a solid foundation.
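Part of why the auto-discovery works is that the client (agent) is little more than a script that prints plain-text sections, each introduced by a header such as <<<df>>>, to whoever connects to it on TCP port 6556 by default. This sketch lists the section names in that output; list_sections and agent_dump are hypothetical helpers, and having nc (netcat) installed on the Nagios server is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the section names found in check_mk agent output
# read from stdin. Each section starts with a header line like <<<df>>>.
list_sections() {
    sed -n 's/^<<<\(.*\)>>>$/\1/p'
}

# Hypothetical helper: fetch the raw agent output from a host on the default
# agent port 6556. Assumes nc (netcat) is installed where this runs.
agent_dump() {
    nc "$1" 6556
}
```

Running agent_dump fileserver | list_sections should print one line per section the agent reports (df among them on a Linux host), which gives a rough idea of what the inventory step has to work with.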

Installation of check_mk was very simple. NOTE: You will probably want to grab the latest version, as the new 1.1.6 is out now (I admit I procrastinated a while before writing this post, but the steps should otherwise be the same). On the Nagios server:
wget http://mathias-kettner.de/download/check_mk-1.1.2.tar.gz
tar -xvf check_mk-1.1.2.tar.gz
cd check_mk-1.1.2
./setup.sh
The setup script asks lots and lots of questions; I took the defaults for all of them. To install the client on the target machine to be monitored:
wget http://mathias-kettner.de/download/check_mk-agent-1.1.2-1.noarch.rpm
rpm -i check_mk-agent-1.1.2-1.noarch.rpm
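Since the note above suggests grabbing a newer release, here is the same install sequence with the version pulled out into a parameter. It assumes newer releases keep the 1.1.2 file naming on the download site (including the -1 release suffix on the agent RPM), which may not hold; the function names are hypothetical.

```shell
#!/bin/sh
# Download location taken from the 1.1.2 commands above.
BASE_URL=http://mathias-kettner.de/download

# File names follow the 1.1.2 pattern; assumed, not guaranteed, for other versions.
server_tarball() { echo "check_mk-$1.tar.gz"; }
agent_rpm()      { echo "check_mk-agent-$1-1.noarch.rpm"; }

# On the Nagios server: fetch, unpack and run the interactive setup script.
install_server() {
    wget "$BASE_URL/$(server_tarball "$1")" &&
        tar -xvf "$(server_tarball "$1")" &&
        (cd "check_mk-$1" && ./setup.sh)
}

# On each monitored host: fetch and install the agent RPM.
install_agent() {
    wget "$BASE_URL/$(agent_rpm "$1")" &&
        rpm -i "$(agent_rpm "$1")"
}
```

With that, install_server 1.1.6 runs on the Nagios server and install_agent 1.1.6 on each machine to be monitored; each step only runs if the previous one succeeded.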
Next, add the host name of the machine to be monitored to the check_mk config file (/etc/check_mk/main.mk), and run two commands to inventory its services and auto-generate the Nagios host config files:
check_mk -I alltcp
check_mk -R
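main.mk is read as Python, and the list of monitored hosts lives in its all_hosts variable. The sketch below appends a host to that list only if it isn't already mentioned, so it is safe to rerun. add_mk_host is a hypothetical helper, and using all_hosts += [...] assumes all_hosts is already defined when main.mk is read (either by check_mk itself or by an earlier all_hosts = [ ... ] line in the file).

```shell
#!/bin/sh
# Hypothetical helper: register a host with check_mk by appending it to
# all_hosts in main.mk (a Python file). Skips hosts that are already listed.
add_mk_host() {
    host=$1
    mainmk=${2:-/etc/check_mk/main.mk}
    if grep -qF "\"$host\"" "$mainmk" 2>/dev/null; then
        echo "$host is already in $mainmk"
        return 0
    fi
    # Assumes all_hosts is already defined when main.mk is executed.
    printf 'all_hosts += [ "%s" ]\n' "$host" >> "$mainmk"
}
```

Adding several hosts then becomes a loop followed by the same two commands, e.g. for h in fileserver webserver; do add_mk_host "$h"; done, then check_mk -I alltcp and check_mk -R (the host names here are made up).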
The second command also restarts the Nagios server. One note: if you are running on a slow machine (like hardware sitting in your basement), sometimes two Nagios processes end up running at the same time. Kill them both (killall -9 nagios), restart Nagios (service nagios start), and you should be fine.
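That workaround can be scripted. This sketch only kills and restarts when it sees more than one nagios process on two samples taken a few seconds apart, on the assumption that a healthy Nagios may briefly fork short-lived children with the same name while running checks. The function names are hypothetical; pgrep, killall and service are the standard CentOS tools, and the restart path needs root.

```shell
#!/bin/sh
# Count processes whose name is exactly "nagios" (pgrep -x matches the whole name).
nagios_count() {
    pgrep -x nagios | wc -l | tr -d ' '
}

# Hypothetical helper: restart Nagios only if duplicates show up on two samples
# taken $1 seconds apart (default 5), to avoid reacting to transient children.
restart_if_duplicated() {
    first=$(nagios_count)
    sleep "${1:-5}"
    second=$(nagios_count)
    if [ "$first" -gt 1 ] && [ "$second" -gt 1 ]; then
        echo "found $second nagios processes, restarting"
        killall -9 nagios
        service nagios start
    else
        echo "nagios process count looks fine ($second)"
    fi
}
```

Running restart_if_duplicated right after check_mk -R gives the slow machine a few seconds to settle before deciding anything.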

Browse over to the web UI and you should see your monitored host; click on it and you should see a healthy number of monitored services. To add additional hosts, simply repeat the last two steps above.

With the combination of Nagios and check_mk you get Nagios's great monitoring server without the headache or learning curve that are the traditional complaints of Nagios users. Skip the NRPE or remote shell invocation stuff and go straight to check_mk. You can still add non-check_mk services (and other things) to these hosts through standard Nagios configuration files. Add a new "cfg_file" entry to your nagios.cfg pointing at a file for your custom configuration, and in that file define the new check using the same host name you used to set up check_mk. When you're done you should see your service alongside the check_mk ones, and because it lives in a separate file, check_mk will not overwrite it when doing updates. Nagios may also be a bit chatty in the beginning, so even with check_mk it takes a little tuning to keep alerts from going off all the time. Then again, you may be surprised to discover a number of problems on your network if this is the first time you have set up a monitoring solution that isn't home-grown.
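The custom-configuration step might look something like this: it writes a service definition into a separate file and registers that file in nagios.cfg once. The paths, the function name add_custom_service, and the choice of an HTTP check are hypothetical; it assumes the generic-service template and the check_http command from the Nagios sample configuration.

```shell
#!/bin/sh
# Hypothetical helper: define an extra (non-check_mk) service in its own file
# and reference that file from nagios.cfg, so check_mk's generated files don't touch it.
# Note: this rewrites custom_cfg, so the file ends up holding just this one service.
add_custom_service() {
    nagios_cfg=$1   # e.g. /usr/local/nagios/etc/nagios.cfg (quick start default)
    custom_cfg=$2   # e.g. /usr/local/nagios/etc/objects/custom.cfg (made up)
    host=$3         # must match the host name used in main.mk

    cat > "$custom_cfg" <<EOF
define service{
    use                   generic-service
    host_name             $host
    service_description   HTTP
    check_command         check_http
    }
EOF

    # Add the cfg_file line only once.
    grep -qxF "cfg_file=$custom_cfg" "$nagios_cfg" ||
        echo "cfg_file=$custom_cfg" >> "$nagios_cfg"
}
```

After running it, nagios -v on your nagios.cfg checks the combined configuration, and service nagios restart picks up the new service.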

For additional reading, check out the jboss2nagios plugin for JBoss monitoring and Icinga, a recent fork of Nagios. I didn't get a chance to look into Icinga much, as I didn't find it until I was already down the path with Nagios. My initial impression was that its web UI looks much more modern, but I don't think check_mk works with it yet.