Blog of Robert Lacroix

robertlacroix.com - Robert Lacroix

check_ganglia, a rudimentary approach to combining nagios with ganglia

When it comes to service monitoring, nagios is a pretty good tool, but on its own it can only check services from the outside, over the network. To monitor metrics that are not exposed to the network, like disk usage, you need an agent running on the monitored server. nagios-statd, a daemon written in Python, is one such agent.

From a management and security perspective, it is good to keep the number of processes and open network ports to a minimum. If you monitor not only service availability with nagios but also resource usage with ganglia, you already have the ganglia agent (gmond) running on each server. Adding nagios-statd would mean a second agent that needs to be configured and kept up to date.

But what if you want to be alerted when one of the hard disks fills up, the load goes through the roof or the server starts using swap space? You would still need nagios-statd, even though ganglia already knows all of this. That's why I came up with the idea of taking the data ganglia collects and checking it with nagios, and wrote a small PHP shell script that does exactly that. There is certainly still some work to do, but I think it's a good start:

#!/usr/bin/php
<?php
### Usage: check_ganglia <host> <metric> [unit] [cluster] [warn] [crit]
### Thresholds are "lower is worse" (e.g. disk_free): the check returns
### WARNING when the value drops below <warn>, CRITICAL below <crit>.

### Get command line arguments
if ($argc < 3) {
  echo "GANGLIA Unknown - usage: check_ganglia <host> <metric> [unit] [cluster] [warn] [crit]\n";
  exit(3);
}
$host = $argv[1];
$metric = $argv[2];
$metric_unit = isset($argv[3]) ? $argv[3] : "";        # optional
$cluster_arg = isset($argv[4]) ? $argv[4] : "";        # optional
$threshold_warn_arg = isset($argv[5]) ? $argv[5] : ""; # optional
$threshold_crit_arg = isset($argv[6]) ? $argv[6] : ""; # optional

### Fill variables, falling back to defaults
$threshold_warn = is_numeric($threshold_warn_arg) ? (float)$threshold_warn_arg : 2;
$threshold_crit = is_numeric($threshold_crit_arg) ? (float)$threshold_crit_arg : 1;
$cluster = ($cluster_arg != "") ? $cluster_arg : "sk"; # default cluster name

### SimpleXML is needed to parse gmond's output (PHP 5 and later)
if (!function_exists("simplexml_load_string")) {
  echo "GANGLIA Unknown - SimpleXML extension not available\n";
  exit(3);
}

### Get data from gmond (XML dump on TCP port 8649)
$buffer = "";
$fp = @fsockopen("localhost", 8649, $errno, $errstr, 30);
if (!$fp) {
  echo "GANGLIA Unknown - $errstr ($errno)\n";
  exit(3);
}
while (!feof($fp)) {
  $buffer .= fgets($fp, 128);
}
fclose($fp);

### Get metric out of XML
$xmlobj = @simplexml_load_string($buffer);
if ($xmlobj === false) {
  echo "GANGLIA Unknown - could not parse XML from gmond\n";
  exit(3);
}
$nodes = $xmlobj->xpath("/GANGLIA_XML/CLUSTER[@NAME='$cluster']/HOST[@NAME='$host']"
  . "/METRIC[@NAME='$metric']");
if (!$nodes) {
  echo "GANGLIA Unknown - metric $metric not found for host $host in cluster $cluster\n";
  exit(3);
}

### Convert data: thresholds only make sense for numeric values
$raw_value = (string)$nodes[0]["VAL"];
if (!is_numeric($raw_value)) {
  echo "GANGLIA Unknown - $metric is not numeric ($raw_value)\n";
  exit(3);
}
$metric_value = (float)$raw_value;

### Build output strings (perfdata: label=value[UOM];warn;crit)
$perfcounter = $metric . "=" . $metric_value . $metric_unit . ";" . $threshold_warn
  . ";" . $threshold_crit;
$text = $metric . " is " . $metric_value . $metric_unit;

### Output: check critical first, then warning, otherwise OK
if ($metric_value < $threshold_crit) {
  print("GANGLIA Critical - " . $text . " |" . $perfcounter . "\n");
  exit(2);
}
if ($metric_value < $threshold_warn) {
  print("GANGLIA Warning - " . $text . " |" . $perfcounter . "\n");
  exit(1);
}
print("GANGLIA OK - " . $text . " |" . $perfcounter . "\n");
exit(0);
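check_ganglia treats lower values as worse, which fits metrics such as disk_free. The sketch below shows the threshold rule and the XPath lookup in isolation, against a trimmed, hypothetical sample of gmond's XML, so it can be tried without a running gmond. The function name ganglia_state() and the sample host and values are made up for illustration, and it assumes a value triggers a threshold only when it is strictly below it.

```php
<?php
// Sketch only: ganglia_state() and the sample XML below are hypothetical
// illustrations, not part of gmond or nagios.

// Nagios plugin exit codes.
const STATE_OK = 0;
const STATE_WARNING = 1;
const STATE_CRITICAL = 2;

// "Lower is worse" thresholds, as for disk_free: a value strictly below
// $crit is CRITICAL, strictly below $warn is WARNING, anything else is OK.
function ganglia_state($value, $warn, $crit)
{
    if ($value < $crit) {
        return STATE_CRITICAL;
    }
    if ($value < $warn) {
        return STATE_WARNING;
    }
    return STATE_OK;
}

// Trimmed, hypothetical stand-in for the XML gmond serves on TCP port 8649.
$xml = <<<XML
<GANGLIA_XML>
  <CLUSTER NAME="sk">
    <HOST NAME="xxx1">
      <METRIC NAME="disk_free" VAL="1.500" TYPE="double" UNITS="GB"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>
XML;

// Same kind of XPath lookup as check_ganglia: cluster -> host -> metric.
$doc = simplexml_load_string($xml);
$nodes = $doc->xpath("/GANGLIA_XML/CLUSTER[@NAME='sk']/HOST[@NAME='xxx1']"
    . "/METRIC[@NAME='disk_free']");
$value = (float) (string) $nodes[0]['VAL'];

// Evaluate with the script's default thresholds (warn 2, crit 1).
$state = ganglia_state($value, 2, 1);
echo "disk_free=$value state=$state\n";
```

Run with the PHP CLI, this prints disk_free=1.5 state=1, i.e. WARNING, because 1.5 is below the warning threshold of 2 but not below the critical threshold of 1.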
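Save the script in your nagios plugin directory (on Debian this is /usr/lib/nagios/plugins) and make it executable (chmod +x check_ganglia). Since the script queries gmond on localhost, port 8649, the nagios server needs a gmond listening there that knows about the monitored hosts. You can then define a new command in your nagios config, in this example for the disk_free metric: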

define command{
    command_name    check_ganglia_disk_free
    command_line    /usr/lib/nagios/plugins/check_ganglia $ARG1$ disk_free GB
}
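After that, add a new service to your host config. The argument after the "!" is passed to the command as $ARG1$ and must match the host name exactly as gmond reports it: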

define service{
    use                    generic-service
    host_name              xxx1
    service_description    DISK
    check_command          check_ganglia_disk_free!xxx1
}
Restart nagios, and disk usage should be monitored from then on.
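If the check does not return a value, one thing to verify is that the host and cluster names passed to the plugin match what gmond reports, since the XPath lookup compares them literally. A small sketch that lists the cluster and host names found in gmond's XML can help; gmond_hosts() is a hypothetical helper, and the sample XML is a made-up stand-in for real gmond output.

```php
<?php
// Sketch only: gmond_hosts() is a hypothetical helper, and $xml is a
// trimmed, made-up sample standing in for what gmond returns on port 8649.

// Return an array mapping each CLUSTER NAME to the HOST NAMEs listed under it.
function gmond_hosts($xml)
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return array();
    }
    $result = array();
    foreach ($doc->xpath('/GANGLIA_XML/CLUSTER') as $cluster) {
        $hosts = array();
        foreach ($cluster->HOST as $host) {
            $hosts[] = (string) $host['NAME'];
        }
        $result[(string) $cluster['NAME']] = $hosts;
    }
    return $result;
}

$xml = <<<XML
<GANGLIA_XML>
  <CLUSTER NAME="sk">
    <HOST NAME="xxx1"/>
    <HOST NAME="xxx2.example.com"/>
  </CLUSTER>
</GANGLIA_XML>
XML;

// Print one line per cluster with the exact host names gmond reports.
foreach (gmond_hosts($xml) as $cluster => $hosts) {
    echo "cluster '$cluster': " . implode(', ', $hosts) . "\n";
}
```

For the sample, this prints cluster 'sk': xxx1, xxx2.example.com. Against a live gmond you would pass in the same $buffer that check_ganglia reads from localhost:8649 instead of the sample string.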
Published Monday, October 6, 2008 20:35 by rl

Comments

 

paulojr99 said:

Hello Robert, I tried to run the plugin for ganglia.

However, it returns the following error:

/usr/local/nagios/libexec# ./check_ganglia

Fatal error: Call to undefined function: simplexml_load_string() in /usr/local/nagios/libexec/check_ganglia on line 40

Could you tell me what the problem might be?

Thanks in advance.

November 17, 2008 13:34
 

rl said:

Hi Paul,

are you running PHP 5? SimpleXML was introduced in PHP 5.

/robert

November 21, 2008 10:04