Using rrdtool to generate server load & bandwidth graphs

I’ve been using MRTG and routers2.cgi for years to graph the various aspects of a server that warrant monitoring. I’ve long known that they used something called rrdtool to do… well, something, but never had a need or desire to figure out exactly what that was.

But, having just moved my site to a new server, I was curious how the server would handle the load. Rather than setting up some behemoth like Nagios or Zabbix, which are full monitoring/alerting suites, I just wanted graphing. As I said, in the past I’ve used MRTG or routers2.cgi for this, but both were overkill in this case. Since both of them used rrdtool, I figured that was a good place to look.

The two metrics I want to record are server load and in/out bandwidth. The first step is to create the RRDs (round robin databases), which I did with these commands:

# rrdtool create /mrtg/load.rrd --start N DS:load1:GAUGE:600:0:100 DS:load5:GAUGE:600:0:100 DS:load15:GAUGE:600:0:100 RRA:AVERAGE:0.5:2:800

# rrdtool create /mrtg/eth1.rrd --start N DS:in:COUNTER:600:0:10000000000 DS:out:COUNTER:600:0:10000000000 RRA:AVERAGE:0.5:2:800

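A good explanation of what these various fields mean is here. In short, each “DS:” section defines a “column” (for fellow RDBMS users) in the database, in the form DS:name:type:heartbeat:min:max. The first database has 3 “columns,” named load1, load5, and load15, each of which holds GAUGE data (the value is stored as read). The second contains two COUNTER fields, representing the bytes in/out for interface eth1; rrdtool stores a COUNTER as its per-second rate of change, so these end up as bytes per second. The RRA:AVERAGE:0.5:2:800 part keeps 800 rows, each one the average of 2 consecutive samples.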
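To get a feel for how much history that RRA holds, here is a rough bit of arithmetic. It assumes rrdtool's default 300-second step, since no --step was passed to the create commands above; the function name rra_coverage_seconds is just something I made up for illustration.

```bash
#!/bin/bash
# Rough retention arithmetic for a single RRA.
# Assumption: the RRD uses rrdtool's default step of 300 seconds.

# rra_coverage_seconds STEP STEPS_PER_ROW ROWS
# Prints how many seconds of history one RRA can hold.
rra_coverage_seconds() {
    local step=$1 steps_per_row=$2 rows=$3
    echo $(( step * steps_per_row * rows ))
}

# RRA:AVERAGE:0.5:2:800 -> 2 steps per row, 800 rows
seconds=$(rra_coverage_seconds 300 2 800)
echo "row resolution: $(( 300 * 2 )) seconds"
echo "total coverage: ${seconds} seconds (about $(( seconds / 3600 )) hours)"
```

Under that assumption each row covers 10 minutes and the whole archive holds a bit over five days, which is comfortably more than a 30-hour graph needs.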

To actually get the data I poll snmpd via this bash script:

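#!/bin/bash

# Each update is a single N:value:value argument ("N" means "now").
# The trailing backslashes join the lines into one word, so the
# continuation lines must not be indented.

# 1, 5 and 15 minute load averages
rrdupdate /mrtg/load.rrd N:\
`/usr/bin/snmpget -v 2c -c public -Oqv localhost laLoad.1`:\
`/usr/bin/snmpget -v 2c -c public -Oqv localhost laLoad.2`:\
`/usr/bin/snmpget -v 2c -c public -Oqv localhost laLoad.3`

# Byte counters for eth1 (interface index 3 on this server)
rrdupdate /mrtg/eth1.rrd N:\
`/usr/bin/snmpget -v 2c -c public -Oqv localhost ifInOctets.3`:\
`/usr/bin/snmpget -v 2c -c public -Oqv localhost ifOutOctets.3`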
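One weakness of the polling script above is that if snmpget fails, the empty result leaves a hole in the update string, which rrdupdate will most likely reject. A possible guard, sketched below, is to substitute "U", which rrdtool accepts as an unknown sample. The helper name numeric_or_unknown is hypothetical and the readings in the example are made up.

```bash
#!/bin/bash
# numeric_or_unknown VALUE
# Prints VALUE if it looks like a non-negative number, otherwise "U",
# which rrdtool treats as an unknown sample.
numeric_or_unknown() {
    local value=$1
    local re='^[0-9]+(\.[0-9]+)?$'
    if [[ $value =~ $re ]]; then
        echo "$value"
    else
        echo "U"
    fi
}

# Example with made-up readings, the second one empty (a failed poll).
update="N:$(numeric_or_unknown 0.42):$(numeric_or_unknown ''):$(numeric_or_unknown 0.15)"
echo "$update"
```

In the real script the arguments would be the snmpget results, and the resulting string would be passed to rrdupdate as its last argument.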

I run that polling script every 5 minutes via cron, which matches rrdtool’s default 300-second step. Then, to generate the actual graphs, I run this second script via cron:

#!/bin/bash

# Each graph is one long rrdtool command; the trailing backslashes
# continue it across lines. Colons inside legend text are escaped as \:
rrdtool graph /var/www/html/graphs/load.png \
        -N \
        -E \
        --start now-30hours \
        --title "Load Averages" \
        --width 300 \
        --x-grid MINUTE:60:HOUR:2:HOUR:4:0:%H \
        --height 200 \
        -u 1.0 \
        --lower-limit 0 \
        --vertical-label "Load Avg" \
        --full-size-mode \
        -a PNG \
        'DEF:load1=/mrtg/load.rrd:load1:AVERAGE' \
        'VDEF:load1last=load1,LAST' \
        'DEF:load5=/mrtg/load.rrd:load5:AVERAGE' \
        'DEF:load15=/mrtg/load.rrd:load15:AVERAGE' \
        'AREA:load15#33CC33:15 Min Load Avg ' \
        'LINE1:load1#0000ff:1 Min Load Avg ' \
        'GPRINT:load1:AVERAGE:Load1 Avg\:%3.2lf' \
        'GPRINT:load1last:Drawn at %Y-%m-%d, %H\:%M:strftime'
# Disabled for now: 'LINE1:load5#ff00ff:5 Min Load Avg '

 
# The RRD stores bytes per second; the CDEFs multiply by 8 to plot bits.
rrdtool graph /var/www/html/graphs/eth1.png \
        -N \
        -E \
        --start now-30hours \
        --title "eth1 traffic" \
        --width 300 \
        --x-grid MINUTE:60:HOUR:2:HOUR:4:0:%H \
        --height 200 \
        -u 1000000 \
        --lower-limit 0 \
        --vertical-label "bps" \
        --full-size-mode \
        -a PNG \
        'DEF:eth1in=/mrtg/eth1.rrd:in:AVERAGE' \
        'CDEF:eth1inbits=eth1in,8,*' \
        'VDEF:eth1last=eth1in,LAST' \
        'DEF:eth1out=/mrtg/eth1.rrd:out:AVERAGE' \
        'CDEF:eth1outbits=eth1out,8,*' \
        'AREA:eth1inbits#33CC33:eth1 in ' \
        'LINE1:eth1outbits#0000ff:eth1 out' \
        'GPRINT:eth1last:Drawn at %Y-%m-%d, %H\:%M:strftime'
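For anyone wondering how a raw octet counter turns into the "bps" on the graph, here is a back-of-the-envelope sketch of the arithmetic: rrdtool derives a per-second rate from consecutive COUNTER readings, and the CDEF then multiplies by 8. The function name counter_to_bps and the sample readings are invented for illustration, and the sketch ignores counter wrap-around.

```bash
#!/bin/bash
# counter_to_bps OLD_OCTETS NEW_OCTETS INTERVAL_SECONDS
# Prints the average bits per second between two counter readings.
# Integer arithmetic only; counter wrap-around is not handled here.
counter_to_bps() {
    local old=$1 new=$2 interval=$3
    echo $(( (new - old) * 8 / interval ))
}

# Made-up example: 37,500,000 octets moved during one 300-second poll.
counter_to_bps 1000000 38500000 300
```

With those made-up numbers the result is 1000000, which is the same value the graph uses as its upper limit (-u 1000000).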

The final graphs look decent, though not very fancy, but I’ll play around with it a bit more:

[Image: eth1 graph]
[Image: load graph]

Running MRTG cfgmaker across your entire subnet?

I realized recently that I had a bunch of newly-provisioned VMs that weren’t being monitored by MRTG (one of the tools we use to monitor network usage and other fun stats). Rather than manually run cfgmaker against all the new machines, I decided to script my way out of this.
