This document describes the core module additions needed for NMIS to poll and report on additional SNMP variables.

As an example, I chose temperature monitoring on a Dell Server.

Step 1

Locate the MIB!

I found the required values by loading all the Dell server MIBs into 'getif' (a common SNMP utility) and walking the management tree.

After finding the values and strings required, I searched the Dell MIB files and traced the values I wanted to "dellserv.mib".

I then wrote a short Perl script based on /nmis/bin/mibdump.pl to parse this MIB and create the 'dell.oid' name-to-OID mapping table that the Snmp_Simple routines in NMIS require. I keep all my MIBs in /nmis/newmibs, separate from the files supplied with NMIS in /nmis/mibs/

/nmis/bin/mib2oid.pl mib=dellserv.mib outfile=dell.oid mibdir=/nmis/newmibs/

# mib2oid.pl

#!/usr/bin/perl

# Auto configure to the <nmis-base>/lib and <nmis-base>/files/nmis.conf

use FindBin;

use lib "$FindBin::Bin/../lib";

# Check if there are any arguments

if ( $#ARGV < 0 ) {

print "$0 NMIS MIB Dumping Tool\n";

print "command line options are:\n";

print "\tmibdir=<directory> Location of the MIBS\n";

print "\tmib=<mibfile> Which MIB to load\n";

print "\toutfile=<filename> Where to dump the OID to.\n";

print "\n";

exit(1);

}

use strict;

use SNMP_MIB;

use func;

use NMIS;

my %ARGS = getArguements(@ARGV);

my $conf;

if ( $ARGS{file} ne "" ) { $conf = $ARGS{file}; }

else { $conf = "nmis.conf"; }

my $configfile = "$FindBin::Bin/../conf/$conf";

if ( -f $configfile ) { loadConfiguration($configfile); }

else { die "Can't access configuration file $configfile.\n"; }

if ( ! defined $ARGS{mibdir} ) { $ARGS{mibdir} = "/usr/local/share/snmp/mibs/"; }

# Always load the default mibs from the master mib dir

SNMP_MIB::loadmib($ARGS{mibdir}, "RFC1213-MIB.txt");

SNMP_MIB::loadmib($ARGS{mibdir}, "IF-MIB-V1SMI.txt");

SNMP_MIB::loadmib($ARGS{mibdir}, "IANAifType-MIB-V1SMI.txt");

SNMP_MIB::loadmib($ARGS{mibdir}, "ETHERLIKE-MIB.txt");

SNMP_MIB::loadmib($ARGS{mibdir}, "RFC1315-MIB.txt");

SNMP_MIB::loadmib($ARGS{mibdir}, "ENTITY-MIB-V1SMI.txt");

SNMP_MIB::loadmib($ARGS{mibdir}, "SNMPv2-SMI-V1SMI.txt");

SNMP_MIB::loadmib($ARGS{mibdir}, "SNMPv2-MIB-V1SMI.txt");

# expect fully pathed directory here.

SNMP_MIB::loadmib($ARGS{mib});

# current dir here

open (STDOUT,">$ARGS{outfile}") or die "ERROR, problem opening $ARGS{outfile}. $!\n";

SNMP_MIB::dump_oids_file();

close (STDOUT);
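The script takes its options as name=value pairs on the command line. As a rough illustration of that convention only, here is a minimal sketch; parse_name_value_args is a hypothetical helper written for this note, not the getArguements routine shipped in the NMIS func module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the argument parser; not the NMIS implementation.
sub parse_name_value_args {
    my @argv = @_;
    my %args;
    for my $arg (@argv) {
        # Split on the first '=' only, so values may themselves contain '='.
        my ($name, $value) = split /=/, $arg, 2;
        next unless defined $value;    # skip words that are not name=value
        $args{$name} = $value;
    }
    return %args;
}

my %args = parse_name_value_args(
    'mib=dellserv.mib', 'outfile=dell.oid', 'mibdir=/nmis/newmibs/'
);
print "$_=$args{$_}\n" for sort keys %args;
```

Splitting on the first '=' only keeps paths or values that contain '=' intact.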

The relevant bits in the dell.oid file should look like this.

You could create this file by hand if desperate, but be careful: the number of tabs is significant.

"tempIndex" "1.3.6.1.4.1.674.10891.300.1"

"tempIndexAtt2" "1.3.6.1.4.1.674.10891.300.1.2"

"tempTypeAtt3" "1.3.6.1.4.1.674.10891.300.1.3"

"tempStatusAtt4" "1.3.6.1.4.1.674.10891.300.1.4"

"tempReadingAtt5" "1.3.6.1.4.1.674.10891.300.1.5"

"tempMinWarnAtt6" "1.3.6.1.4.1.674.10891.300.1.6"

"tempMaxWarnAtt7" "1.3.6.1.4.1.674.10891.300.1.7"

"tempMinFailAtt8" "1.3.6.1.4.1.674.10891.300.1.8"

"tempMaxFailAtt9" "1.3.6.1.4.1.674.10891.300.1.9"

"tempLocationAtt10" "1.3.6.1.4.1.674.10891.300.1.10"

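To sanity-check a hand-edited .oid file, something like the following sketch may help. It assumes each line holds a quoted name and a quoted numeric OID separated by whitespace; parse_oid_lines is a hypothetical helper, and it is more forgiving about tabs than the real NMIS loader is said to be.

```perl
use strict;
use warnings;

# Hypothetical reader for lines of the form: "name" <whitespace> "oid".
# Assumption: the real loader is stricter about the separating tabs.
sub parse_oid_lines {
    my @lines = @_;
    my %oid_for;
    for my $line (@lines) {
        next unless $line =~ /^\s*"([^"]+)"\s+"([0-9.]+)"\s*$/;
        $oid_for{$1} = $2;
    }
    return %oid_for;
}

my %oid_for = parse_oid_lines(
    qq{"tempStatusAtt4"\t"1.3.6.1.4.1.674.10891.300.1.4"},
    qq{"tempReadingAtt5"\t"1.3.6.1.4.1.674.10891.300.1.5"},
);
print "$_ => $oid_for{$_}\n" for sort keys %oid_for;
```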
Step 2

Copy or move the dell.oid file just created to the nmis/mibs/ directory.

In /nmis/bin/nmis.pl, copy the oid load file code at about line #230 and read in the new dell.oid file.

I created a new config variable in /nmis/conf/nmis.conf and used that to reference the file.

If you are loading a number of different mibs, then either dump them all to one file using mibdump.pl, or change the loadoids_file to the file list version – see Snmp_Simple for the syntax.

In /nmis/conf/nmis.conf

dell_mib=dell.oid

In /nmis/bin/nmis.pl, line #230 or so.

if ($debug > 2) { print "\tLoading $NMIS::config{mib_root}/$NMIS::config{dell_mib}\n"; }

if ( -r "$NMIS::config{mib_root}/$NMIS::config{dell_mib}" ) {

SNMP_MIB::loadoids_file("$NMIS::config{mib_root}", "$NMIS::config{dell_mib}");

}

else { warn returnTime." nmis.pl, mib file $NMIS::config{mib_root}/$NMIS::config{dell_mib} not found.\n"; }

Step 3

Choose a metric type to save and display the variables in. I strongly suggest 'health' for all snmp data collection over and above the standard MIBII set.

If you are collecting additional interface vars such as CRC, then adding these to the existing interface rrd is an easier way to go.

Sub runHealth is run for all nodes that have collect=true, respond to a ping, and don't report an SNMP error on an SNMP system poll.

The nodeHealth rrd files are labelled by nodename, and saved as /nmis/database/health/server/myservername-health.rrd

This filename is determined by sub getRRDFileName - see step 7.

We will also display the server temperature graph in the node health section of the web interface.

Step 4

Copy a sample snmp poll subroutine in nmis.pl and modify to collect the snmp values that you want. The values must be referenced by name, and be mapped to the ASN.1 numbers in an xxx.oid file.

To make sure we collect from the Dell servers only, I matched on nodetype="server" and sysName="myservername", as MIBII gives no way of telling which type of server we are polling.

I also extended the snmp variable name with the ASN.1 numbers .9.2.1.1, as this is what the initial testing with "getif" reported.
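To illustrate how a name plus instance suffix presumably ends up as a numeric OID, here is a minimal sketch. The expand_named_oid helper and the %oid_for hash are hypothetical stand-ins for whatever the Snmp_Simple routines do internally; only the names and OIDs are taken from the dell.oid listing.

```perl
use strict;
use warnings;

# Two entries copied from the dell.oid listing.
my %oid_for = (
    tempStatusAtt4  => '1.3.6.1.4.1.674.10891.300.1.4',
    tempReadingAtt5 => '1.3.6.1.4.1.674.10891.300.1.5',
);

# Hypothetical helper: expand "name.suffix" into a fully numeric OID.
sub expand_named_oid {
    my ($map, $named) = @_;
    my ($name, $suffix) = $named =~ /^([A-Za-z][\w-]*)((?:\.\d+)*)$/
        or return;
    return unless exists $map->{$name};
    return $map->{$name} . $suffix;
}

# Same string concatenation as used in the snmpget call.
print expand_named_oid(\%oid_for, 'tempReadingAtt5' . '.9.2.1.1'), "\n";
```

If the name is missing from the mapping table the helper returns nothing, which is roughly the situation you are in when the .oid file has not been loaded.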

In /nmis/bin/nmis.pl, about line #1280 or so, in sub runHealth.

Add another elsif {......} section, following the general syntax of existing sections.

### DellServer

elsif ( $NMIS::systemTable{nodeType} eq "server" and $NMIS::systemTable{sysName} eq "MyServerName" ) {

# Get the dell temperature readings into %snmpTable.

# can choose our own variable names, I suggest to follow the snmp name for easy debugging

# Could just as easily bulkwalk the table, quicker network time, left as individual poll here as an example.

# check out the Snmp_Simple routines for the bulkwalk call syntax.

if ($collect eq "true") {

( $snmpTable{tempStatus},

$snmpTable{tempReading},

$snmpTable{tempMinWarn},

$snmpTable{tempMaxWarn}

) = $session->snmpget(

'tempStatusAtt4'.".9.2.1.1",

'tempReadingAtt5'.".9.2.1.1",

'tempMinWarnAtt6'.".9.2.1.1",

'tempMaxWarnAtt7'.".9.2.1.1"

);

# standard error handler

if ( $SNMP_Simple::errmsg =~ /No answer from/ ) {

$message = "$node SNMP error. errmsg=$SNMP_Simple::errmsg";

logMessage("runHealth, $message");

if ($debug) { print returnTime." runHealth, $message\n"; }

$SNMP_Simple::errmsg = "";

goto END_runHealth;

}

# let's do some post processing on what we got.

# change the collected values to something readable.

$snmpTable{tempStatus} = $snmpTable{tempStatus} == 3 ? "ok" : "fail";

# system min/max seem too high for safety

# so let's drop the system max and min values a bit.

$snmpTable{tempMinWarn} = 100;

$snmpTable{tempMaxWarn} = 450;

# standard debug - print the hash to confirm what we got.

if ($debug) {

print returnTime." Health Stats Summary\n";

for $index ( sort keys %snmpTable ) {

print "\t$index=$snmpTable{$index}\n";

}

} # end debug

# more post processing - raise an event if temperature outside bounds

# log an event if the temperature is not "ok"

# treat as a Node Down event for event escalation purposes - simplifies things.

if ( $snmpTable{tempStatus} ne "ok" or

$snmpTable{tempReading} > $snmpTable{tempMaxWarn} or

$snmpTable{tempReading} < $snmpTable{tempMinWarn} ) {

# Device is hot or maybe cold

# standard debug - report if outside system bounds.

if ($debug) { print returnTime." Temperature failed $node Temp is: $snmpTable{tempReading}\n"; }

# event notify - turn event subsystem on using notify.

notify(node => $node, role => $NMIS::systemTable{roleType}, type => $NMIS::systemTable{nodeType}, event => "Node Down", details => "Temperature Exceeded");

} else {

# Device is OK

# clear any events as system temperature now OK using checkEvent.

checkEvent(node => $node, role => $NMIS::systemTable{roleType}, type => $NMIS::systemTable{nodeType}, event => "Node Down", details => "Temperature Exceeded");

}

# create and update the rrd for display purposes.

# Check if the RRD Database Exists, create if not, and update it.

if ( &createRRDDB(type => "nodehealth", node => $node, nodeType => $NMIS::nodeTable{$node}{devicetype}) ) {

&updateRRDDB(type => "nodehealth", node => $node, nodeType => $NMIS::nodeTable{$node}{devicetype});

}

} # collect eq true

} # end DellServer section

# all done - /nmis/database/health/server/myservername-health.rrd should be updated with chosen values from %snmpTable - see Step 5.
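The bounds test in the middle of that section can be tried on its own. The sketch below assumes readings and limits are in tenths of a degree (the graph code later divides by 10); temperature_alarm is a hypothetical helper, not part of nmis.pl.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the bounds test in the runHealth section.
# Assumption: all values are in tenths of a degree.
sub temperature_alarm {
    my (%t) = @_;
    return 1 if $t{tempStatus} ne 'ok';
    return 1 if $t{tempReading} > $t{tempMaxWarn};
    return 1 if $t{tempReading} < $t{tempMinWarn};
    return 0;
}

for my $reading (95, 235, 460) {
    my $alarm = temperature_alarm(
        tempStatus  => 'ok',
        tempReading => $reading,
        tempMinWarn => 100,
        tempMaxWarn => 450,
    );
    printf "%5.1f degrees: %s\n", $reading / 10,
        $alarm ? 'raise event' : 'clear event';
}
```

In nmis.pl the two outcomes correspond to the notify and checkEvent calls respectively.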

Step 5

Update the rrd file.

Match on same as before - nodehealth and type=server.

I chose to save the actual reading, and the min/max values for display.

The options are standard rrd db syntax.

In /nmis/bin/nmis.pl, add another section about #2780 in sub updateRRDDB

# server temperature

elsif ( $type eq "nodehealth" and $NMIS::systemTable{nodeType} eq "server" ) {

# set the rrd @options array up.

@options = (

"-t", "tempReading:tempMinWarn:tempMaxWarn",

"N:$snmpTable{tempReading}:$snmpTable{tempMinWarn}:$snmpTable{tempMaxWarn}"

);

# standard debug - print the option lines in a sensible format

if ($debug) {

@label = split /:/, $options[1];

@value = split /:/, $options[2];

print " database=$database\n\t";

for ( $i=0; $i < @label; $i++ ) {

print " $label[$i]=$value[$i+1]";

}

print "\n";

}

} # endif - the RRDs:: fragment will actually write it out.
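For reference, the template and value strings can be built from a list of names, which keeps the two in step when more variables are added later. This is a sketch only; rrd_update_options is a hypothetical helper, and substituting "U" (rrdtool's marker for an unknown value) for missing readings is my own choice, not something nmis.pl does.

```perl
use strict;
use warnings;

# Hypothetical helper: build the "-t" template and the "N:" value string.
sub rrd_update_options {
    my ($values, @names) = @_;
    my $template = join ':', @names;
    # rrdtool accepts "U" where a value is unknown.
    my $update = join ':', 'N',
        map { defined $values->{$_} ? $values->{$_} : 'U' } @names;
    return ('-t', $template, $update);
}

my %snmpTable = (tempReading => 235, tempMinWarn => 100, tempMaxWarn => 450);
my @options = rrd_update_options(\%snmpTable,
    qw(tempReading tempMinWarn tempMaxWarn));
print join(' ', @options), "\n";
```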

Step 6

Create the rrd file if it does not exist, which is very likely the first time through.

Set the number of points saved and the averaging points, standard rrd syntax.

In /nmis/bin/nmis.pl, about #2950, in sub createRRDDB

# server temperature

elsif ( $NMIS::systemTable{nodeType} eq "server" and $type eq "nodehealth" ) {

@options = (

"-b", $START, "-s", 300,

"DS:tempReading:GAUGE:900:0:900",

"DS:tempMinWarn:GAUGE:900:0:900",

"DS:tempMaxWarn:GAUGE:900:0:900",

"RRA:AVERAGE:0.5:1:2304",

"RRA:AVERAGE:0.5:6:1536",

"RRA:AVERAGE:0.5:24:2268",

"RRA:AVERAGE:0.5:288:1890",

"RRA:MAX:0.5:1:2304",

"RRA:MAX:0.5:6:1536",

"RRA:MAX:0.5:24:2268",

"RRA:MAX:0.5:288:1890",

"RRA:MIN:0.5:1:2304",

"RRA:MIN:0.5:6:1536",

"RRA:MIN:0.5:24:2268",

"RRA:MIN:0.5:288:1890"

);

} # end server temperature
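As a worked check of what those RRA lines retain: each row covers step x steps-per-row seconds, and the archive spans rows times that. The sketch below does the arithmetic for the four AVERAGE archives (the MAX and MIN ones use the same figures); rra_span_days is a hypothetical helper. Note also that the DS limits 0:900 mean readings outside that range are stored as unknown.

```perl
use strict;
use warnings;

my $step = 300;    # seconds, from "-s", 300
# [ steps per row, rows ] taken from the RRA lines above.
my @rras = ( [ 1, 2304 ], [ 6, 1536 ], [ 24, 2268 ], [ 288, 1890 ] );

# Days of history held by one archive.
sub rra_span_days {
    my ($step_secs, $steps_per_row, $rows) = @_;
    return $step_secs * $steps_per_row * $rows / 86400;
}

for my $rra (@rras) {
    my ($steps, $rows) = @$rra;
    printf "%3d step(s) per row (%6d s), %4d rows: %4d days\n",
        $steps, $steps * $step, $rows, rra_span_days($step, $steps, $rows);
}
```

That works out to 8 days of 5-minute data, 32 days of 30-minute data, 189 days of 2-hour data and 1890 days of daily data.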

NB: at about this point, you may be wondering what to do when you want to collect additional server health stats. Just extend the code as shown earlier, add the additional snmp vars, and save them in the same rrd file. There is no need to create another rrd file.

Step 7

We are now finished with nmis.pl.

We need to choose a graphtype for use in nmiscgi.pl, so we have something to match on for the nmiscgi.pl print graph routines.

I chose graphtype="degree"

In NMIS.pm, sub getRRDFileName calls sub getGraphType, which sets the filename and directory based on metric type. Recall that in nmis.pl, type='nodehealth' returns 'health', which sets the directory and filename. In nmiscgi.pl, graphtype='degree' also returns 'health', which sets the same directory and filename.

Note that in nmis.pl, we save all the extra snmp vars, over and above the standard MIBII set, in the type='health' sections, in sub runHealth.

In the web display code in nmiscgi.pl, we break out these vars, all from the same rrd, into separate graphs, hence the different graphtype='degree'.

For example, for the CiscoRouter, we save cpu, memory and buffer stats in the same nodehealth rrd, but create separate graphs for each of these groups in nmiscgi.pl, using graphtype='cpu', 'mem' and 'buffer' to control what is going on.

The key code fragment that ties the rrd files, directories, collected snmp vars and graphs together is /nmis/lib/NMIS.pm sub getRRDFileName.

If this is not set right, the wrong or no rrd database file will be returned.

In nmis/lib/NMIS.pm, about line #1840, modify sub getGraphType to match on 'degree' and therefore return 'health' when called for type=nodehealth or type=degree, which in getRRDFileName will return the desired "/nmis/database/health/server/myservername-health.rrd"

elsif ( $graphtype =~ /nodehealth|cpu|mem|traffic|topo|buffer|pix-conn|a3|degree/ ) {

$type = "health";

}
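The effect of that mapping can be sketched in isolation. Both helpers below are hypothetical and only mimic the behaviour described in this step; the path layout and the lower-casing of the node name are assumptions inferred from the example filename, not taken from NMIS.pm.

```perl
use strict;
use warnings;

# Hypothetical sketch of the graphtype-to-type mapping; not the NMIS.pm code.
sub graph_type_to_rrd_type {
    my ($graphtype) = @_;
    return 'health'
        if $graphtype =~ /nodehealth|cpu|mem|traffic|topo|buffer|pix-conn|a3|degree/;
    return $graphtype;
}

# Assumed layout: <root>/<type>/<nodeType>/<node>-<type>.rrd
sub rrd_file_name {
    my (%arg) = @_;
    my $type = graph_type_to_rrd_type($arg{graphtype});
    return sprintf '%s/%s/%s/%s-%s.rrd',
        $arg{database_root}, $type, $arg{nodeType}, lc($arg{node}), $type;
}

for my $graphtype (qw(nodehealth degree)) {
    print "$graphtype -> ", rrd_file_name(
        database_root => '/nmis/database',
        nodeType      => 'server',
        node          => 'MyServerName',
        graphtype     => $graphtype,
    ), "\n";
}
```

Both graph types resolve to the same file, which is the whole point: nmis.pl writes it as 'nodehealth' and nmiscgi.pl reads it as 'degree'.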

Step 8

Set the graph heading in nmiscgi.pl

In /nmis/cgi-bin/nmiscgi.pl, about line #604 in sub graphHeading

### server temperature

elsif ( $graphtype eq "degree" ) { $heading = "$node Server Temperature"; }

Step 9

Print the graph as part of the server health web display.

This matches on nodeType=server and prints the Temperature graph, following the standard "Reachability, Availability and Health" and "Response Time" graphs.

These lines are complex, best to copy a previous section and change the relevant display name and graphtype bits like this:

name="DEGREE", #TOP "Temperature", graphtype=degree, window.status='Drill into Server Temperature.', alt="Server Temperature", graph=degree.

Note that "type=drawgraph graph=degree" draws the rrd in the nodehealth display, and the href tag "type=graph graphtype=degree" fetches the clickable graph window. (I know that's confusing.)

In /nmis/cgi-bin/nmiscgi.pl, about line # 2150 in sub typeHealth

### server temperature

if ( $NMIS::nodeTable{$node}{collect} eq "true" and $NMIS::systemTable{nodeType} eq "server" ) {

print <<EO_HTML;

<tr>

<td>

<b>

<a href="#TOP">Temperature</a><BR>

<a href="$ENV{SCRIPT_NAME}?file=$conf&type=graph&graphtype=degree&glamount=$glamount&glunits=$glunits&node=$node"

target=ViewWindow onMouseOver="window.status='Drill into Server Temperature.';return true" onClick="viewdoc('$tmpurl',$win_width,$win_height)">

<!-- the img tag using type=drawgraph and graph=degree goes here; copy it from an existing section -->

</a>

</b>

</td>

</tr>

EO_HTML

}

Step 10

Setup the graph print parameters for the rrd graph print.

Match on type='degree', so the correct pathname/filename is chosen.

I chose to display the actual reading, and the max/min values that I stored in the rrd, and set an average for the reading, and a max value. I also used CDEF to divide the values by 10, to set the decimal point right.

In /nmis/cgi-bin/nmiscgi.pl, about line #5000, in sub rrdDraw

### server temperature

elsif ( $type eq "degree" ) {

@options = (

"--title", "$node - $length from $datestamp_start to $datestamp_end",

"--vertical-label", 'Server Temperature',

"--start", "$start",

"--end", "$end",

"--width", "$width",

"--height", "$height",

"--imgformat", "PNG",

"--interlace",

"DEF:tempReading=$database:tempReading:AVERAGE",

"DEF:tempMinWarn=$database:tempMinWarn:AVERAGE",

"DEF:tempMaxWarn=$database:tempMaxWarn:AVERAGE",

"CDEF:xtempReading=tempReading,10,/",

"CDEF:xtempMinWarn=tempMinWarn,10,/",

"CDEF:xtempMaxWarn=tempMaxWarn,10,/",

"LINE2:xtempReading#00ff00:Avg Temp",

"LINE1:xtempMinWarn#0000ff:Min Alarm Temp",

"LINE1:xtempMaxWarn#ff0000:Max Alarm Temp",

"GPRINT:xtempReading:AVERAGE:Avg Temp %1.2lf",

"GPRINT:xtempReading:MAX:Max Temp %1.2lf",

);

}
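The CDEF expressions are written in RPN (reverse Polish notation), so "tempReading,10,/" means "push tempReading, push 10, divide". A minimal sketch of how such an expression evaluates is shown below; eval_rpn is a hypothetical toy that handles only the four basic operators, not rrdtool's full RPN language.

```perl
use strict;
use warnings;

# Toy RPN evaluator: numbers, named values and + - * / only.
sub eval_rpn {
    my ($expr, %vars) = @_;
    my @stack;
    for my $token (split /,/, $expr) {
        if ($token =~ m{^[-+*/]$}) {
            my $right = pop @stack;
            my $left  = pop @stack;
            push @stack,
                $token eq '+' ? $left + $right
              : $token eq '-' ? $left - $right
              : $token eq '*' ? $left * $right
              :                 $left / $right;
        }
        elsif (exists $vars{$token}) { push @stack, $vars{$token}; }
        else                         { push @stack, $token; }
    }
    return $stack[-1];
}

# A stored reading of 235 is plotted as 23.5.
print eval_rpn('tempReading,10,/', tempReading => 235), "\n";
```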

Step 11

There is no Step 11.

Note that we didn't have to add anything to sub typeGraph (type=graph), as we went for the default health, response time, export and stats menu. Other implementations may well require routines similar to what is already there, so that the clickable graphs display. The PIX connections code is a good model of what may be required.

A nice addition would be to add type=degree to the drop down menu in sub graphMenu.

Eric Greenwood

3 Feb 2003