DaemonCommands = -f /var/log/collectl -r00:01,7 -m -F60 -s+CEYZ
./configure
make
make install
That's all it takes. However, your system must support ipmi, and one way to tell is whether the command dmidecode | grep IPMI produces any output. If it doesn't, your system does not support ipmi and even if you were able to install ipmitool, you won't be able to use it.
The next step is to start the ipmi driver, which is generally done via the command service ipmi start on a RedHat system or something like /etc/init.d/ipmi start on others. On some systems, such as HP blades, you may need to install a custom ipmi driver such as hp-OpenIPMI and start that instead of the standard driver.
At this point you should be able to execute the command ipmitool sdr and see all your sensor data, or the commands ipmitool sdr type fan and ipmitool sdr type temp to see just fan and temperature data:
[root@bl460-63 ipmitool-1.8.9]# ipmitool sdr
UID Light        | 0 unspecified     | ok
Int. Health LED  | 0 unspecified     | ok
VRM 1            | 0 unspecified     | cr
VRM 2            | 0 unspecified     | cr
Temp 1           | 47 degrees C      | ok
Temp 2           | 34 degrees C      | ok
Temp 3           | 30 degrees C      | ok
Temp 4           | 30 degrees C      | ok
Temp 5           | 31 degrees C      | ok
Temp 6           | 30 degrees C      | ok
Temp 7           | 30 degrees C      | ok
Temp 8           | 66 degrees C      | ok
Temp 9           | 20 degrees C      | ok
Virtual Fan      | 37.24 unspecifi   | nc
Enclosure Status | 0 unspecified     | nc
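If you want to post-process this output yourself, the pipe-delimited format is easy to split. The following is a minimal Python sketch and is not part of collectl or ipmitool. The function name parse_sdr and the tuple layout are made up for illustration, and it assumes every data line has exactly three fields separated by |, as in the output above.

```python
def parse_sdr(text):
    """Split `ipmitool sdr` output into (name, reading, status) tuples.

    Hypothetical helper: assumes three '|'-separated fields per data line
    and skips anything else, such as the shell prompt line.
    """
    records = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 3:
            continue
        records.append(tuple(fields))
    return records


# Two lines copied from the sample output above.
sample = """\
Temp 1           | 47 degrees C      | ok
Virtual Fan      | 37.24 unspecifi   | nc
"""

for name, reading, status in parse_sdr(sample):
    print(name, reading, status, sep=" / ")
```

Note that the reading field is kept as text, since ipmitool truncates some units (unspecifi) and not every reading is numeric.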
You can control the way ipmi data is displayed in playback mode using --envopts and one of 3 switches. These allow you to report only fan or only temperature data and, if you are reporting both, which is the default, to request that the 2 types of data be displayed on separate lines. This latter option can be useful if you have a lot of devices on which to report.
The following is an example of time-stamped output on an HP dl380-g5, first without any options:
collectl.pl -sE -i::1 -oT
# ENVIRONMENTAL STATISTICS
#          Fan1   Fan2   Fan3   Fan4   Fan5   Fan6 Fan Temp1 Temp2 Temp3 Temp4 Temp5 Temp6 Temp7
07:00:58 45.080 45.080 41.944 36.064 36.064 36.064   0    47    22    31    31    52    31    31
07:00:59 45.080 45.080 41.944 36.064 36.064 36.064   0    47    22    31    31    52    31    31
07:01:00 45.080 45.080 41.944 36.064 36.064 36.064   0    47    22    31    31    52    31    31
collectl.pl -sE -i::1 -oT --envopts M
### RECORD 1 >>> opteron167 <<< (1218022891.002) (Wed Aug 6 07:41:31 2008) ###
# ENVIRONMENTAL STATISTICS
# CFAN1 CFAN2 CFAN3 CFAN4 CFAN5 CFAN6 CFAN7 CFAN8 CFAN9 CFAN10 SFAN1 SFAN2
   6200  6000  6200  6200  6200  5800  6200  6000  6200   6000  6000  6200
# CTEMP0 CTEMP1 STEMP
      51     48    29
### RECORD 2 >>> opteron167 <<< (1218022892.002) (Wed Aug 6 07:41:32 2008) ###
# ENVIRONMENTAL STATISTICS
# CFAN1 CFAN2 CFAN3 CFAN4 CFAN5 CFAN6 CFAN7 CFAN8 CFAN9 CFAN10 SFAN1 SFAN2
   6200  6000  6200  6200  6200  5800  6200  6000  6200   6000  6000  6200
# CTEMP0 CTEMP1 STEMP
      51     48    29
Fan 1
Fans
CPU FAN1
SYS FAN1
Fan1A (CPU)
FAN CPU0
FAN MOD 1A RPM
Fan Redundancy
On the one hand, collectl could simply report the names exactly as they are reported, but formatting them in a way that provides a compact display is impossible. Since the standard collectl reporting format uses a single header line, multiple-line headers are not an option. While it is tempting to simply determine the widest device name and use that as the header width, on systems that report over a dozen devices you couldn't fit them all on the same line, and that's only for the systems that have been tested.
After looking at all these different names and formats, one common theme did emerge. All devices appear to have optional numbers (I didn't see any with just letters) and those numbers have optional letters. Furthermore, there seems to be some sort of optional type associated with many as well. This led to the idea of a standard naming scheme for these devices as follows:
[type]Fan|Temp[devicenumber[deviceletter]]
in which the type field would be limited to a single character. Applying this scheme to the examples above leads to the following name mapping:
Fan 1            Fan1
Fans             Fan
CPU FAN1         CFAN1
SYS FAN1         SFAN1
Fan1A (CPU)      CFan1A
FAN CPU0         CFAN0
FAN MOD 1A RPM   MFAN1A
Fan Redundancy   RFan
This is admittedly not perfect but seems like a reasonable compromise, and since collectl will report the device names in the same order returned by ipmitool, it is not all that difficult to figure out how collectl chose to map them.
After examining many different types of device name formats, it was determined that most tend to follow a pattern of
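To make the naming scheme concrete, here is a minimal Python sketch that checks whether a name fits [type]Fan|Temp[devicenumber[deviceletter]] and splits it into its parts. This is an illustration only, not collectl's code. The regular expression and the function name split_mapped_name are assumptions, and matching is case-insensitive because the mapped names above mix Fan and FAN.

```python
import re

# [type]Fan|Temp[devicenumber[deviceletter]]
# Assumed details: a single-letter type, one or more digits for the device
# number, and a single device letter that may only follow a number.
NAME_RE = re.compile(r"^([A-Z])?(FAN|TEMP)(?:(\d+)([A-Z])?)?$", re.IGNORECASE)


def split_mapped_name(name):
    """Return (type, kind, number, letter), or None if the name does not fit."""
    match = NAME_RE.match(name)
    return match.groups() if match else None


# The mapped names from the table above all fit the scheme.
for name in ["Fan1", "Fan", "CFAN1", "SFAN1", "CFan1A", "CFAN0", "MFAN1A", "RFan"]:
    print(name, split_mapped_name(name))

# A raw, unmapped name does not fit and yields None.
print(split_mapped_name("FAN MOD 1A RPM"))
```

Parts that are absent from a name come back as None, which makes it easy to tell Fan (no number) from Fan1.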
prefix type instanceNumber suffix
Where things get a little crazy is that sometimes the actual instance number can be part of the prefix OR sometimes the instance contains a letter.
All that said, collectl breaks a device name into these components, assuming a numeric instance. It then applies the minimal set of tests/modifications. Note that there are examples of all these cases in the sample names shown earlier:
Fan CPU0 Tach,3480
Prefix:  Name: Fan  Instance:  Suffix: CPU0 Tach
Fan1A (CPU),EAh,ok,29.3,Performance Met
Prefix:  Name: Fan  Instance: 1  Suffix: A (CPU)
FAN MOD 1A RPM,5775,RPM,ok
Prefix:  Name: FAN  Instance:  Suffix: MOD 1A RPM
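The decomposition shown in these examples can be approximated with a single regular expression. The following is a minimal Python sketch, not collectl's actual perl code. The pattern and the function name split_device_name are assumptions chosen to reproduce the three examples above, treating the first comma-separated field as the device name.

```python
import re

# Assumed pattern: an optional prefix, the word fan or temp (any case),
# optional digits as the instance, and everything else as the suffix.
PARTS_RE = re.compile(r"^(.*?)\s*(fan|temp)\s*(\d*)\s*(.*)$", re.IGNORECASE)


def split_device_name(line):
    """Split the name field of a comma-separated sensor line into components."""
    name = line.split(",")[0]
    match = PARTS_RE.match(name)
    if match is None:
        return None  # not a fan or temperature device
    prefix, kind, instance, suffix = match.groups()
    return {"prefix": prefix, "name": kind, "instance": instance, "suffix": suffix}


for line in [
    "Fan CPU0 Tach,3480",
    "Fan1A (CPU),EAh,ok,29.3,Performance Met",
    "FAN MOD 1A RPM,5775,RPM,ok",
]:
    print(split_device_name(line))
```

Because only digits that directly follow the name count as the instance, FAN MOD 1A RPM ends up with an empty instance and 1A buried in the suffix.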
To use this feature, include a file containing the directives and point collectl to it using --envrules. The file itself contains lines of the following form, noting that spaces and comments (lines preceded with a #) are permitted:
[pre]
/pattern1/replace1/
/pattern2/replace2/
...
/patternN/replaceN/
[post]
/pattern1/replace1/
/pattern2/replace2/
...
/patternN/replaceN/
If you know perl (and you really should if you use this), collectl builds a perl pattern substitution command out of the pattern and replace strings. So looking at the string
FAN MOD 1A RPM
and the processing rules described in the previous section, the MOD suffix will be prepended to FAN and its first letter used to name the device MFAN, losing the instance information, which is 1A.
There are at least 3 options here. The first is to simply remove MOD from each name which we can do with the rule:
/ MOD//
which will result in the instance names being picked up correctly because they will now immediately follow FAN. In fact, if you include --envdebug along with your rules you'll see the results of the replacement:
FAN MOD 1A RPM,5775,RPM,ok
Pre-Remapped 'FAN MOD 1A RPM' to 'FAN 1A RPM'
Prefix:  Name: FAN  Instance: 1  Suffix: A RPM
The second option is to move MOD in front of the device name, which can be done with the rule:
/(.*) MOD (.*)/MOD $1$2/
and results in the following parsing:
FAN MOD 1A RPM,5775,RPM,ok
Pre-Remapped 'FAN MOD 1A RPM' to 'MOD FAN 1A RPM'
Prefix: MOD  Name: FAN  Instance: 1  Suffix: A RPM
Unfortunately, in order to make perl interpret the $1$2 symbols an eval is required, which generates a little extra overhead. While that is not horrible, an even better solution is the third option, which doesn't use any special $ symbols:
/FAN MOD/MOD FAN/
which produces exactly the same results as the previous example except without the eval command.
There is in fact at least one other mechanism for those who are not all that familiar with perl, included here only for completeness, and that is to simply hardcode the replacement of each device with the desired output. In other words
/FAN MOD 1A RPM/MOD FAN1 A/
/FAN MOD 2A RPM/MOD FAN2 A/
/FAN MOD 3A RPM/MOD FAN3 A/
etc
will produce strings that can also be properly parsed without involving $ variables, but this means you need to specify each unique device name to remap. It also means every pattern matching statement is executed for each device, which results in slightly more overhead.
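To experiment with rules before handing them to collectl, the rule format can be mimicked in a few lines. The following is a minimal Python sketch and not collectl's implementation. The function names parse_rules and apply_rules are made up, it handles only rules without $ variables, and it assumes each rule replaces just the first match, like a plain perl s/// without the g modifier.

```python
import re


def parse_rules(text):
    """Parse rules text into {'pre': [(pattern, replace), ...], 'post': [...]}."""
    rules = {"pre": [], "post": []}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are permitted
        if line in ("[pre]", "[post]"):
            section = line[1:-1]
            continue
        # A rule looks like /pattern/replace/, so splitting on '/' must give
        # four pieces, with the first and last empty.
        parts = line.split("/")
        if section is None or len(parts) != 4 or parts[0] or parts[3]:
            raise ValueError("unrecognized rule line: " + raw)
        rules[section].append((parts[1], parts[2]))
    return rules


def apply_rules(name, rule_list):
    """Apply each (pattern, replace) pair in order, first match only."""
    for pattern, replace in rule_list:
        # The lambda keeps the replacement literal (no backslash escapes).
        name = re.sub(pattern, lambda m, r=replace: r, name, count=1)
    return name


first_option = parse_rules("[pre]\n/ MOD//\n")
third_option = parse_rules("[pre]\n/FAN MOD/MOD FAN/\n")
print(apply_rules("FAN MOD 1A RPM", first_option["pre"]))
print(apply_rules("FAN MOD 1A RPM", third_option["pre"]))
```

The two results correspond to the Pre-Remapped strings in the --envdebug output for the first and third options.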
Restrictions
Some systems report what appear to be device codes in the data field and the actual data in the 4th field, and I don't know why. For now, when this occurs collectl reports the 4th column as the data instead. If this breaks other things, it will have to be removed, and invalid data will be reported for systems that do not report data in column 2.