This document describes the changes to the core NMIS modules needed to poll
and report on additional SNMP variables.
As an example, I chose temperature monitoring on a Dell server.
Step 1
Locate the MIB!
I found the values required by adding all the Dell server MIBs to
'getif' (a common SNMP utility) and walking the management tree.
After finding the values and strings required, I searched the Dell
MIB files and matched the values I wanted to "dellserv.mib".
I then wrote a short Perl script, based on /nmis/bin/mibdump.pl, to parse
this MIB and create the 'dell.oid' name-to-OID mapping table that the
Snmp_Simple routines in NMIS require. I keep all my MIBs in /nmis/newmibs,
to keep them separate from the files supplied with NMIS in /nmis/mibs/.
/nmis/bin/mib2oid.pl mib=dellserv.mib outfile=dell.oid mibdir=/nmis/newmibs/
#
mib2oid.pl
#!/usr/bin/perl
#
# Auto configure to the <nmis-base>/lib and <nmis-base>/files/nmis.conf
use FindBin;
use lib "$FindBin::Bin/../lib";

# Check if there are any arguments
if ( $#ARGV < 0 ) {
    print "$0 NMIS MIB Dumping Tool\n";
    print "command line options are:\n";
    print "\tmibdir=<directory> Location of the MIBS\n";
    print "\tmib=<mibfile> Which MIB to load\n";
    print "\toutfile=<filename> Where to dump the OID to.\n";
    print "\n";
    exit(1);
}

use strict;
use SNMP_MIB;
use func;
use NMIS;

my %ARGS = getArguements(@ARGV);

my $conf;
if ( $ARGS{file} ne "" ) { $conf = $ARGS{file}; }
else { $conf = "nmis.conf"; }

my $configfile = "$FindBin::Bin/../conf/$conf";
if ( -f $configfile ) { loadConfiguration($configfile); }
else { die "Can't access configuration file $configfile.\n"; }

if ( ! defined $ARGS{mibdir} ) { $ARGS{mibdir} = "/usr/local/share/snmp/mibs/"; }

# Always load the default mibs from the master mib dir
SNMP_MIB::loadmib($ARGS{mibdir}, "RFC1213-MIB.txt");
SNMP_MIB::loadmib($ARGS{mibdir}, "IF-MIB-V1SMI.txt");
SNMP_MIB::loadmib($ARGS{mibdir}, "IANAifType-MIB-V1SMI.txt");
SNMP_MIB::loadmib($ARGS{mibdir}, "ETHERLIKE-MIB.txt");
SNMP_MIB::loadmib($ARGS{mibdir}, "RFC1315-MIB.txt");
SNMP_MIB::loadmib($ARGS{mibdir}, "ENTITY-MIB-V1SMI.txt");
SNMP_MIB::loadmib($ARGS{mibdir}, "SNMPv2-SMI-V1SMI.txt");
SNMP_MIB::loadmib($ARGS{mibdir}, "SNMPv2-MIB-V1SMI.txt");

# expect fully pathed directory here.
SNMP_MIB::loadmib($ARGS{mib});

# current dir here
open (STDOUT,">$ARGS{outfile}") or die "ERROR, problem opening $ARGS{outfile}. $!\n";
SNMP_MIB::dump_oids_file();
close (STDOUT);
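The script takes its options as key=value pairs, which getArguements turns into the %ARGS hash. The sketch below shows that idea in isolation. parse_key_value_args is a hypothetical stand-in written for illustration, not the func.pm routine, and its handling of arguments without an "=" is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for getArguements (an assumption, not the func.pm code):
# split each key=value argument into a hash entry.
# Arguments without an "=" are silently ignored in this sketch.
sub parse_key_value_args {
    my %args;
    for my $arg (@_) {
        # Limit of 2 keeps any later "=" characters inside the value.
        my ( $key, $value ) = split /=/, $arg, 2;
        $args{$key} = $value if defined $value;
    }
    return %args;
}

my %ARGS = parse_key_value_args(
    "mib=dellserv.mib", "outfile=dell.oid", "mibdir=/nmis/newmibs/" );

# Same fallback as mib2oid.pl uses when mibdir is not supplied.
$ARGS{mibdir} = "/usr/local/share/snmp/mibs/" unless defined $ARGS{mibdir};

print "$_=$ARGS{$_}\n" for sort keys %ARGS;
```

Because the options are looked up by name, they can be given in any order on the command line.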
The relevant bits in the dell.oid file should look like this.
You could create this file by hand if desperate, but be careful: the
number of tabs is significant.
"tempIndex" "1.3.6.1.4.1.674.10891.300.1"
"tempIndexAtt2" "1.3.6.1.4.1.674.10891.300.1.2"
"tempTypeAtt3" "1.3.6.1.4.1.674.10891.300.1.3"
"tempStatusAtt4" "1.3.6.1.4.1.674.10891.300.1.4"
"tempReadingAtt5" "1.3.6.1.4.1.674.10891.300.1.5"
"tempMinWarnAtt6" "1.3.6.1.4.1.674.10891.300.1.6"
"tempMaxWarnAtt7" "1.3.6.1.4.1.674.10891.300.1.7"
"tempMinFailAtt8" "1.3.6.1.4.1.674.10891.300.1.8"
"tempMaxFailAtt9" "1.3.6.1.4.1.674.10891.300.1.9"
"tempLocationAtt10" "1.3.6.1.4.1.674.10891.300.1.10"
Step 2
Copy or move the dell.oid file just created to the nmis/mibs/
directory.
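A hand-edited dell.oid is easy to get subtly wrong, so a quick sanity check before NMIS loads it may help. The sketch below parses lines of the form "name", tab(s), "oid" into a hash and warns about anything else. parse_oid_lines is a hypothetical helper, not the SNMP_MIB loader, and accepting one or more tabs as the separator is an assumption; the real loader may be stricter about the tab count.

```perl
use strict;
use warnings;

# Hypothetical checker (not SNMP_MIB::loadoids_file): parse lines shaped like
#   "name"<tab(s)>"numeric.oid"
# into a name => oid hash. One or more tabs are accepted here, which is an
# assumption; the real loader may require an exact number.
sub parse_oid_lines {
    my @lines = @_;
    my %oid_for;
    for my $line (@lines) {
        next if $line =~ /^\s*(#|$)/;    # skip comments and blank lines
        if ( $line =~ /^"([^"]+)"\t+"([0-9.]+)"\s*$/ ) {
            $oid_for{$1} = $2;
        }
        else {
            warn "skipping malformed line: $line\n";
        }
    }
    return %oid_for;
}

my %oid_for = parse_oid_lines(
    qq{"tempStatusAtt4"\t\t"1.3.6.1.4.1.674.10891.300.1.4"},
    qq{"tempReadingAtt5"\t\t"1.3.6.1.4.1.674.10891.300.1.5"},
);
print "$_ => $oid_for{$_}\n" for sort keys %oid_for;
```

To check a real file, the lines could be read from it and passed to the same routine.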
In /nmis/bin/nmis.pl, copy the oid load file code at about line
#230 and read in the new dell.oid file.
I created a new config variable in /nmis/conf/nmis.conf and used
that to reference the file.
If you are loading a number of different MIBs, then either dump
them all to one file using mibdump.pl, or change the loadoids_file call to the
file list version; see Snmp_Simple for the syntax.
In /nmis/conf/nmis.conf
dell_mib=dell.oid
In /nmis/bin/nmis.pl, line #230 or so.
#
if ($debug > 2) { print "\tLoading $NMIS::config{mib_root}/$NMIS::config{dell_mib}\n"; }
if ( -r "$NMIS::config{mib_root}/$NMIS::config{dell_mib}" ) {
    SNMP_MIB::loadoids_file("$NMIS::config{mib_root}", "$NMIS::config{dell_mib}");
}
else {
    warn returnTime." nmis.pl, mib file $NMIS::config{mib_root}/$NMIS::config{dell_mib} not found.\n";
}
#
Step 3
Choose a metric type to save and display the variables in. I
strongly suggest 'health' for all SNMP data collection over and above the
standard MIBII set.
If you are collecting additional interface vars such as CRC, then
adding these to the existing interface rrd is an easier way to go.
Sub runHealth is run for all nodes that have collect=true, respond to a
ping, and don't report an SNMP error on the SNMP system poll.
The nodeHealth rrd files are labelled by nodename, and saved as
/nmis/database/health/server/myservername-health.rrd
This filename is determined by sub getRRDFileName - see step 7.
We will also display the server temperature graph in the node
health section of the web interface.
Step 4
Copy a sample SNMP poll subroutine in nmis.pl and modify it to
collect the SNMP values that you want. The values must be referenced by name,
and be mapped to the ASN.1 numbers in an xxx.oid file.
To make sure we are collecting from the Dell servers only, I
matched on nodeType="server" and sysName="myservername", as
MIBII gives no way of telling which type of server we might be polling.
I also extended each SNMP variable name with the ASN.1 numbers
.9.2.1.1, as this is what the initial testing with "getif" reported.
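To make the name-plus-suffix idea concrete, here is a minimal sketch of how a named variable with trailing numbers could be resolved to a full numeric OID. resolve_oid and the %oid_for hash are hypothetical and written for illustration; the actual lookup inside the NMIS SNMP routines may work differently.

```perl
use strict;
use warnings;

# Name-to-OID map, as it might look after dell.oid has been loaded
# (two entries copied from the dell.oid listing).
my %oid_for = (
    tempStatusAtt4  => "1.3.6.1.4.1.674.10891.300.1.4",
    tempReadingAtt5 => "1.3.6.1.4.1.674.10891.300.1.5",
);

# Hypothetical helper: turn "name.n.n.n" into "<oid for name>.n.n.n".
sub resolve_oid {
    my ($named) = @_;
    my ( $name, $suffix ) = $named =~ /^([A-Za-z][\w-]*)((?:\.\d+)*)$/
        or die "cannot parse '$named'\n";
    defined $oid_for{$name} or die "unknown OID name '$name'\n";
    return $oid_for{$name} . $suffix;
}

# prints 1.3.6.1.4.1.674.10891.300.1.5.9.2.1.1
print resolve_oid( 'tempReadingAtt5' . ".9.2.1.1" ), "\n";
```

If a name is missing from the map the sketch dies, which is roughly the symptom to expect when dell.oid has not been loaded.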
In /nmis/bin/nmis.pl, about line #1280 or so, in sub runHealth.
Add another elsif {......} section, following the general syntax
of existing sections.
### DellServer
elsif ( $NMIS::systemTable{nodeType} eq "server" and
        $NMIS::systemTable{sysName} eq "MyServerName" ) {
    # Get the dell temperature readings into %snmpTable.
    # can choose our own variable names, I suggest to follow the snmp name for easy debugging
    # Could just as easily bulkwalk the table, quicker network time, left as
    # individual poll here as an example.
    # check out the Snmp_Simple routines for the bulkwalk call syntax.
    if ($collect eq "true") {
        (   $snmpTable{tempStatus},
            $snmpTable{tempReading},
            $snmpTable{tempMinWarn},
            $snmpTable{tempMaxWarn}
        ) = $session->snmpget(
            'tempStatusAtt4'.".9.2.1.1",
            'tempReadingAtt5'.".9.2.1.1",
            'tempMinWarnAtt6'.".9.2.1.1",
            'tempMaxWarnAtt7'.".9.2.1.1"
        );

        # standard error handler
        if ( $SNMP_Simple::errmsg =~ /No answer from/ ) {
            $message = "$node SNMP error. errmsg=$SNMP_Simple::errmsg";
            $SNMP_Simple::errmsg = "";
            logMessage("runHealth, $message");
            if ($debug) { print returnTime." runHealth, $message\n"; }
            $SNMP_Simple::errmsg = "";
            goto END_runHealth;
        }

        # let's do some post processing on what we got.
        # change the collected values to something readable.
        $snmpTable{tempStatus} = $snmpTable{tempStatus} == 3 ? "ok" : "fail";

        # system min/max seem too high for safety
        # so let's drop the system max and min values a bit.
        $snmpTable{tempMinWarn} = 100;
        $snmpTable{tempMaxWarn} = 450;

        # standard debug - print the hash to confirm what we got.
        if ($debug) {
            print returnTime." Health Stats Summary\n";
            for $index ( sort keys %snmpTable ) {
                print "\t$index=$snmpTable{$index}\n";
            }
        }

        # more post processing - raise an event if temperature outside bounds
        # log an event if the temperature is not "ok"
        # treat as a Node Down event for event escalation purposes - simplifies things.
        if ( $snmpTable{tempStatus} ne "ok" or
             $snmpTable{tempReading} > $snmpTable{tempMaxWarn} or
             $snmpTable{tempReading} < $snmpTable{tempMinWarn} ) {
            # Device is hot or maybe cold
            # standard debug - report if outside system bounds.
            if ($debug) { print returnTime." Temperature failed $node Temp is: $snmpTable{tempReading}\n"; }
            # event notify - turn event subsystem on using notify.
            notify(node => $node, role => $NMIS::systemTable{roleType},
                   type => $NMIS::systemTable{nodeType}, event => "Node Down",
                   details => "Temperature Exceeded");
        }
        else {
            # Device is OK
            # clear any events as system temperature now OK using checkEvent.
            checkEvent(node => $node, role => $NMIS::systemTable{roleType},
                       type => $NMIS::systemTable{nodeType}, event => "Node Down",
                       details => "Temperature Exceeded");
        }

        # create and update the rrd for display purposes.
        # Check if the RRD Database Exists, create if not, and update it.
        if ( &createRRDDB(type => "nodehealth", node => $node,
                          nodeType => $NMIS::nodeTable{$node}{devicetype}) ) {
            &updateRRDDB(type => "nodehealth", node => $node,
                         nodeType => $NMIS::nodeTable{$node}{devicetype});
        }
    } # collect eq true
} # nodeModel eq Dellserver
# all done - /nmis/database/health/server/myservername-health.rrd should be
# updated with chosen values from %snmpTable - see Step 5.
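The alarm decision in the fragment above can be tried out on its own, away from nmis.pl. The sketch below repeats the same three tests (status not ok, reading above the max, reading below the min) in a standalone routine. temperature_state is a hypothetical name, and the readings are assumed to be in tenths of a degree, which is what the divide-by-10 in Step 10 suggests.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of nmis.pl: classify one set of readings.
# Readings are assumed to be in tenths of a degree (an inference from Step 10).
sub temperature_state {
    my (%r) = @_;
    # A status value of 3 is treated as "ok", as in the runHealth fragment.
    my $status = $r{tempStatus} == 3 ? "ok" : "fail";
    return "alarm" if $status ne "ok";
    return "alarm" if $r{tempReading} > $r{tempMaxWarn};
    return "alarm" if $r{tempReading} < $r{tempMinWarn};
    return "ok";
}

# Thresholds as overridden in the fragment: 100 and 450.
my %limits = ( tempMinWarn => 100, tempMaxWarn => 450 );

for my $reading ( 90, 250, 460 ) {
    my $state = temperature_state( tempStatus => 3, tempReading => $reading, %limits );
    print "reading=$reading state=$state\n";
}
```

In nmis.pl the "alarm" outcome corresponds to the notify call and the "ok" outcome to checkEvent.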
Step 5
Update the rrd file.
Match on the same conditions as before: type=nodehealth and nodeType=server.
I chose to save the actual reading, and the min/max values for
display.
The options are standard rrd db syntax.
In /nmis/bin/nmis.pl, add another section at about line #2780 in sub
updateRRDDB
# server temperature
elsif ( $type eq "nodehealth" and $NMIS::systemTable{nodeType} eq "server" ) {
    # set the rrd @options array up.
    @options = (
        "-t", "tempReading:tempMinWarn:tempMaxWarn",
        "N:$snmpTable{tempReading}:$snmpTable{tempMinWarn}:$snmpTable{tempMaxWarn}"
    );
    # standard debug - print the option lines in a sensible format
    if ($debug) {
        @label = split /:/, $options[1];
        @value = split /:/, $options[2];
        print " database=$database\n\t";
        for ( $i=0; $i < @label; $i++ ) {
            print " $label[$i]=$value[$i+1]";
        }
        print "\n";
    }
} # endif - the RRDs:: fragment will actually write it out.
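The template string after "-t" and the "N:..." value string have to list the same variables in the same order, which is easy to break when more vars are added later. One way to keep them in step is to build both from a single list of names, as in this sketch. rrd_update_args is a hypothetical helper, not an NMIS routine; it substitutes "U", the rrdtool marker for an unknown value, when a reading is missing.

```perl
use strict;
use warnings;

# Hypothetical helper: build the "-t" template and the "N:..." value string
# from one ordered list of names, so the two cannot drift apart.
sub rrd_update_args {
    my ( $values, @names ) = @_;
    my $template = join ":", @names;
    my $data     = join ":", "N",
        map { defined $values->{$_} ? $values->{$_} : "U" } @names;
    return ( "-t", $template, $data );
}

my %snmpTable = ( tempReading => 235, tempMinWarn => 100, tempMaxWarn => 450 );
my @options   = rrd_update_args( \%snmpTable, qw(tempReading tempMinWarn tempMaxWarn) );

# Same pairing as the debug loop above: value[0] is "N", so labels are offset by one.
my @label = split /:/, $options[1];
my @value = split /:/, $options[2];
print " $label[$_]=$value[$_+1]" for 0 .. $#label;
print "\n";
```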
Step 6
Create the rrd file if it does not exist, which is very likely the first
time through!
Set the number of points saved and the averaging points, using standard
rrd syntax.
In /nmis/bin/nmis.pl, about line #2950, in sub createRRDDB
# server temperature
elsif ( $NMIS::systemTable{nodeType} eq "server" and $type eq "nodehealth" ) {
    @options = (
        "-b", $START, "-s", 300,
        "DS:tempReading:GAUGE:900:0:900",
        "DS:tempMinWarn:GAUGE:900:0:900",
        "DS:tempMaxWarn:GAUGE:900:0:900",
        "RRA:AVERAGE:0.5:1:2304",
        "RRA:AVERAGE:0.5:6:1536",
        "RRA:AVERAGE:0.5:24:2268",
        "RRA:AVERAGE:0.5:288:1890",
        "RRA:MAX:0.5:1:2304",
        "RRA:MAX:0.5:6:1536",
        "RRA:MAX:0.5:24:2268",
        "RRA:MAX:0.5:288:1890",
        "RRA:MIN:0.5:1:2304",
        "RRA:MIN:0.5:6:1536",
        "RRA:MIN:0.5:24:2268",
        "RRA:MIN:0.5:288:1890"
    );
} # end server temperature
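It can help to see how much history those RRA lines actually keep. Each RRA holds (rows) points, and each point covers (steps per row) x (step) seconds, so with the 300 second step used above the four archives work out to 8, 32, 189 and 1890 days. The sketch below does that arithmetic; rra_days is a hypothetical helper written for illustration.

```perl
use strict;
use warnings;

# Retention in days for one RRA: step seconds * steps per row * rows / 86400.
sub rra_days {
    my ( $step, $steps_per_row, $rows ) = @_;
    return $step * $steps_per_row * $rows / 86400;
}

# The four (steps per row, rows) pairs from the createRRDDB fragment.
for my $rra ( [ 1, 2304 ], [ 6, 1536 ], [ 24, 2268 ], [ 288, 1890 ] ) {
    my ( $steps, $rows ) = @$rra;
    printf "%3d step(s) per row (%6d s per point) x %4d rows = %4d days\n",
        $steps, 300 * $steps, $rows, rra_days( 300, $steps, $rows );
}
```

In each DS line, 900 is the heartbeat in seconds and 0:900 is the accepted min:max range, so a reading above 900 would be stored as unknown.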
NB - at about this point, you may be wondering what to do when you
wish to collect additional server health stats. Just extend the code as shown
earlier, add in the additional SNMP vars, and save them away in the same rrd
file. There is no need to create another rrd file.
Step 7
We are now finished with nmis.pl.
We need to choose a graphtype for use in nmiscgi.pl, so that we have
something to match on in the nmiscgi.pl graph printing routines.
I chose graphtype="degree".
In NMIS.pm, sub getRRDFileName calls sub getGraphType, which sets
the filename and directory based on the metric type. Recall that in nmis.pl,
type='nodehealth' returns 'health', which sets the directory and filename.
In nmiscgi.pl, graphtype='degree' also returns 'health', which sets the same
directory and filename.
Note that in nmis.pl we save all the extra SNMP vars, over and
above the standard MIBII set, in the type='health' sections in sub runHealth.
In the web display code in nmiscgi.pl, we break out these vars,
all from the same rrd, into separate graphs, hence the different
graphtype='degree'.
For example, for the CiscoRouter we save cpu, memory and buffer
stats in the same nodehealth rrd, but create separate graphs for each of these
groups in nmiscgi.pl, using graphtype='cpu', 'mem' and 'buffer' to
control what is drawn.
The key code fragment that ties the rrd files, directories,
collected SNMP vars and graphs together is sub getRRDFileName in
/nmis/lib/NMIS.pm.
If this is not set right, the wrong rrd database file, or none at all, will
be returned.
In /nmis/lib/NMIS.pm, about line #1840, modify sub getGraphType to
match on 'degree' and therefore return 'health' when called for type=nodehealth
or type=degree, which in getRRDFileName will return the desired
"/nmis/database/health/server/myservername-health.rrd".
#
elsif ( $graphtype =~ /nodehealth|cpu|mem|traffic|topo|buffer|pix-conn|a3|degree/ ) {
    $type = "health";
}
#
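The mapping from graph type to rrd file name is the part most likely to cause a "no data" graph, so here is a simplified sketch of the idea. Both subs are hypothetical stand-ins, not the NMIS.pm code: the directory layout is hard-wired to the path quoted in the text, and the match is anchored, which the original pattern is not.

```perl
use strict;
use warnings;

# Simplified stand-in for getGraphType (an assumption, not the NMIS.pm code):
# several graph types share the single 'health' metric type.
sub graph_type_to_metric {
    my ($graphtype) = @_;
    return "health"
        if $graphtype =~ /^(?:nodehealth|cpu|mem|traffic|topo|buffer|pix-conn|a3|degree)$/;
    return;
}

# Simplified stand-in for getRRDFileName, using the layout quoted in the text.
sub rrd_file_for {
    my (%a) = @_;
    my $metric = graph_type_to_metric( $a{graphtype} ) or return;
    return "/nmis/database/$metric/$a{nodeType}/$a{node}-$metric.rrd";
}

# nmis.pl writes with type=nodehealth, nmiscgi.pl reads with graphtype=degree;
# both must land on the same file.
for my $graphtype (qw(nodehealth degree)) {
    print "$graphtype => ",
        rrd_file_for( graphtype => $graphtype, nodeType => "server", node => "myservername" ),
        "\n";
}
```

If 'degree' were left out of the pattern, the lookup for the graph would return nothing, which matches the failure described above.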
Step 8
Set the graph heading in nmiscgi.pl
In /nmis/cgi-bin/nmiscgi.pl, about line #604 in sub graphHeading
### server temperature
elsif ( $graphtype eq "degree" ) { $heading = "$node Server Temperature"; }
#
Step 9
Print the graph as part of the server health web display.
This matches on nodeType=server and prints the Temperature graph,
following the standard "Reachability, Availability and Health" and
"Response Time" graphs.
These lines are complex, so it is best to copy a previous section and
change the relevant display name and graphtype bits, like this:
name="DEGREE", the #TOP link text "Temperature", graphtype=degree,
window.status='Drill into Server Temperature.', alt="Server Temperature",
graph=degree.
Note that "type=drawgraph graph=degree" draws the rrd in the
nodehealth display, and the href tag "type=graph graphtype=degree" fetches the
clickable graph window. (I know that's confusing :-)
In /nmis/cgi-bin/nmiscgi.pl, about line #2150 in sub typeHealth
### server temperature
if ( $NMIS::nodeTable{$node}{collect} eq "true" and
     $NMIS::systemTable{nodeType} eq "server" ) {
    print <<EO_HTML;
<tr>
<td align="center" bgcolor="white"><A name="DEGREE"><b>
<a href="#TOP">Temperature</a><BR>
<a href="$ENV{SCRIPT_NAME}?file=$conf&type=graph&graphtype=degree&glamount=$glamount&glunits=$glunits&node=$node"
target=ViewWindow onMouseOver="window.status='Drill into Server Temperature.';return true" onClick="viewdoc('$tmpurl',$win_width,$win_height)">
<img border=\"0\" alt="Server Temperature"
src="$ENV{SCRIPT_NAME}?file=$conf&type=drawgraph&node=$node&graph=degree&glamount=$glamount&glunits=$glunits&start=0&end=0&width=500&height=100">
</a>
</b>
</td>
</tr>
EO_HTML
}
Step 10
Set up the graph print parameters for the rrd graph print.
Match on type='degree', so the correct pathname/filename is
chosen.
I chose to display the actual reading and the max/min values that
I stored in the rrd, and to print an average and a max value for the reading.
I also used CDEF to divide the values by 10, to set the decimal point right.
In /nmis/cgi-bin/nmiscgi.pl, about line #5000, in sub rrdDraw
### server temperature
elsif ( $type eq "degree" ) {
    @options = (
        "--title", "$node - $length from $datestamp_start to $datestamp_end",
        "--vertical-label", 'Server Temperature',
        "--start", "$start",
        "--end", "$end",
        "--width", "$width",
        "--height", "$height",
        "--imgformat", "PNG",
        "--interlace",
        "DEF:tempReading=$database:tempReading:AVERAGE",
        "DEF:tempMinWarn=$database:tempMinWarn:AVERAGE",
        "DEF:tempMaxWarn=$database:tempMaxWarn:AVERAGE",
        "CDEF:xtempReading=tempReading,10,/",
        "CDEF:xtempMinWarn=tempMinWarn,10,/",
        "CDEF:xtempMaxWarn=tempMaxWarn,10,/",
        "LINE2:xtempReading#00ff00:Avg Temp",
        "LINE1:xtempMinWarn#0000ff:Min Alarm Temp",
        "LINE1:xtempMaxWarn#ff0000:Max Alarm Temp",
        "GPRINT:xtempReading:AVERAGE:Avg Temp %1.2lf",
        "GPRINT:xtempReading:MAX:Max Temp %1.2lf",
    );
}
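The CDEF expressions are written in reverse Polish notation: "tempReading,10,/" pushes the reading, pushes 10, then divides, so a stored value of 450 is graphed as 45. The sketch below is a tiny evaluator for that notation, written only to illustrate the order of evaluation. eval_rpn is hypothetical and handles just numbers, named variables and the four arithmetic operators; real rrdtool CDEFs support many more.

```perl
use strict;
use warnings;

# Minimal RPN evaluator for expressions such as "tempReading,10,/".
# Illustration only: numbers, named variables and + - * / are handled.
sub eval_rpn {
    my ( $expr, %var ) = @_;
    my @stack;
    for my $tok ( split /,/, $expr ) {
        if ( $tok =~ m{^[-+*/]$} ) {
            # The operator applies to the two most recent values on the stack.
            my $rhs = pop @stack;
            my $lhs = pop @stack;
            push @stack,
                  $tok eq '+' ? $lhs + $rhs
                : $tok eq '-' ? $lhs - $rhs
                : $tok eq '*' ? $lhs * $rhs
                :               $lhs / $rhs;
        }
        elsif ( exists $var{$tok} ) { push @stack, $var{$tok} }
        else                        { push @stack, $tok }
    }
    return $stack[0];
}

# prints 45.00
printf "%.2f\n", eval_rpn( "tempMaxWarn,10,/", tempMaxWarn => 450 );
```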
Step 11
There is no Step 11.
Note that we didn't have to add anything to sub typeGraph
(type=graph), as we went for the default health, response time, export and
stats menu. Other implementations may well require routines similar to those
already there, so that the clickable graphs display. The PIX connections code
is a good model for what may be required.
A nice addition would be to add type=degree to the drop-down menu in
sub graphMenu.
Eric Greenwood
3 Feb 2003