Verbose Data
Data is reported in this form either when --verbose is used or when at least one
type of data requested has no brief form, such as any detail data, ionodes,
processes or slabs. Specifying some of the Lustre output options with --lustopts,
such as B, D and M, will also force verbose format.
CPU, collectl -sc
# CPU SUMMARY (INTR, CTXSW & PROC /sec)
# USER NICE SYS WAIT IRQ SOFT STEAL IDLE INTR CTXSW PROC RUNQ RUN AVG1 AVG5 AVG15
These are the percentages of time the system spent running in each of the modes,
averaged across all CPUs. While the User and Sys modes are self-explanatory, the others
may not be:
User |
Time spent in User mode, not including time spent in "nice" mode. |
Nice |
Time spent in Nice mode, that is, running processes whose priority has been lowered
by the nice command; these processes have the "N" status flag set when examined with "ps". |
Sys |
This is time spent in "pure" system time. |
Wait |
Also known as "iowait", this is the time the CPU was idle during an
outstanding disk I/O request. This is not considered to be part of the total or
system times reported in brief mode. |
Irq |
Time spent processing interrupts and also considered to be part of
the summary system time reported in "brief" mode. |
Soft |
Time spent processing soft interrupts and also considered to be part
of the summary system time reported in "brief" mode. |
Steal |
Time spent in an involuntary wait state while the hypervisor was servicing
another virtual processor. |
The next set of fields applies to processes:
Proc | Process creations/sec. |
Runq | Number of processes in the run queue. |
Run | Number of processes in the run state. |
Avg1, Avg5, Avg15 | Load average over the last 1, 5 and 15 minutes. |
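Because the mode columns are percentages averaged across all CPUs, one way to picture where they come from is to take two samples of the kernel's cumulative per-mode counters and compute each mode's share of the elapsed total. The sketch below does that for the aggregate "cpu" line of /proc/stat. It is a minimal illustration under that assumption, not collectl's own code, and the function names are hypothetical.

```python
# Sketch: CPU-mode percentages from two samples of the aggregate "cpu" line
# of /proc/stat. Assumption: each mode's share of the total jiffies elapsed
# between samples; this is not taken from collectl's own source.

MODES = ("user", "nice", "sys", "idle", "wait", "irq", "soft", "steal")


def parse_cpu_line(line: str) -> dict[str, int]:
    # Layout: "cpu user nice system idle iowait irq softirq steal [guest ...]"
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:1 + len(MODES)]]
    values += [0] * (len(MODES) - len(values))  # older kernels have fewer columns
    return dict(zip(MODES, values))


def cpu_percentages(prev: dict[str, int], curr: dict[str, int]) -> dict[str, float]:
    deltas = {m: curr[m] - prev[m] for m in MODES}
    total = sum(deltas.values())
    if total == 0:
        return {m: 0.0 for m in MODES}
    # The aggregate line sums every CPU, so this is already an all-CPU average.
    return {m: 100.0 * d / total for m, d in deltas.items()}
```

To use it, read the first line of /proc/stat, wait one interval, read it again and pass both parsed samples to cpu_percentages.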
Disks, collectl -sd
# DISK SUMMARY (/sec)
#KBRead RMerged Reads SizeKB KBWrit WMerged Writes SizeKB
KBRead | KB read/sec |
RMerged |
Read requests merged per second when being dequeued.
This statistic is not available on older kernels, which
only record disk statistics in /proc/stat. |
Reads | Number of reads/sec |
SizeKB | Average read size in KB |
KBWrit | KB written/sec |
WMerged |
Write requests merged per second when being dequeued. |
Writes | Number of writes/sec |
SizeKB | Average write size in KB |
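Each rate above is the change in a cumulative kernel counter divided by the sampling interval, and the SizeKB columns follow from dividing KB transferred by the number of requests. The sketch below shows that arithmetic, assuming the counters come from /proc/diskstats and that sectors are 512 bytes; collectl's actual implementation may differ, and the names are hypothetical.

```python
from dataclasses import dataclass

SECTOR_BYTES = 512  # assumption: /proc/diskstats counts 512-byte sectors


@dataclass
class DiskCounters:
    reads: int
    reads_merged: int
    sectors_read: int
    writes: int
    writes_merged: int
    sectors_written: int


def parse_diskstats_line(line: str) -> tuple[str, DiskCounters]:
    # Columns: major minor name reads rmerged rsectors rms writes wmerged wsectors ...
    f = line.split()
    return f[2], DiskCounters(int(f[3]), int(f[4]), int(f[5]),
                              int(f[7]), int(f[8]), int(f[9]))


def disk_summary(prev: DiskCounters, curr: DiskCounters, seconds: float) -> dict[str, float]:
    def rate(before: int, after: int) -> float:
        return (after - before) / seconds

    kb_read = rate(prev.sectors_read, curr.sectors_read) * SECTOR_BYTES / 1024
    kb_writ = rate(prev.sectors_written, curr.sectors_written) * SECTOR_BYTES / 1024
    reads = rate(prev.reads, curr.reads)
    writes = rate(prev.writes, curr.writes)
    return {
        "KBRead": kb_read,
        "RMerged": rate(prev.reads_merged, curr.reads_merged),
        "Reads": reads,
        "ReadSizeKB": kb_read / reads if reads else 0.0,  # KB per read request
        "KBWrit": kb_writ,
        "WMerged": rate(prev.writes_merged, curr.writes_merged),
        "Writes": writes,
        "WriteSizeKB": kb_writ / writes if writes else 0.0,
    }
```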
Inodes/Filesystem, collectl -si
# INODE SUMMARY
# Dentries File Handles Inodes
# Number Unused Alloc % Max Number
40585 39442 576 0.17 38348
DCache |
Number | Number of entries in directory cache |
Unused | Number of unused entries in directory cache |
Handles | Number of allocated file handles |
% Max | Percentage of maximum available file handles |
Inode | Number of used inode handles |
NOTE - as of this writing I'm baffled by the dentry unused field. No matter how
many files and/or directories I create, this number goes up! Shouldn't it go down?
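The "% Max" column relates allocated file handles to the system maximum. The sketch below shows one way to derive the dentry and file-handle numbers, assuming they come from /proc/sys/fs/dentry-state and /proc/sys/fs/file-nr and that "% Max" is simply allocated divided by maximum. Both the sources and the formula are assumptions for illustration, and the function names are hypothetical.

```python
def parse_dentry_state(text: str) -> tuple[int, int]:
    # Assumed layout: the first two fields are total dentries and unused dentries.
    fields = text.split()
    return int(fields[0]), int(fields[1])


def parse_file_nr(text: str) -> tuple[int, float]:
    # Assumed layout: allocated handles, free allocated handles, maximum handles.
    allocated, _free, maximum = (int(v) for v in text.split()[:3])
    return allocated, 100.0 * allocated / maximum
```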
Infiniband, collectl -sx
# INFINIBAND SUMMARY (/sec)
# KBIn PktIn SizeIn KBOut PktOut SizeOut Errors
KBIn | KB received/sec. |
PktIn | Packets received/sec. |
SizeIn | Average incoming packet size in KB |
KBOut | KB transmitted/sec. |
PktOut | Packets transmitted/sec. |
SizeOut | Average outgoing packet size in KB |
Errors | Count of current errors, not a rate. Since errors
are typically infrequent, reporting them as a rate could result
in either not seeing them at all or having round-off hide their values. |
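The round-off problem described for the Errors column is easy to see with a little arithmetic. The sketch below assumes a hypothetical 10-second interval and a rate column printed as a whole number; both are illustrative choices, not collectl's actual formatting rules.

```python
def as_rate_column(count: int, interval_secs: float) -> str:
    # Hypothetical whole-number rate column: a few errors per interval vanish.
    return f"{count / interval_secs:.0f}"


def as_count_column(count: int) -> str:
    # Reporting the raw count keeps every error visible.
    return str(count)
```

With 3 errors in a 10-second interval the rate column shows "0" while the count column shows "3"; with 6 errors the rate rounds up to "1", which hides the real value in the other direction.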
Lustre
Lustre Client, collectl -sl
There are several formats here, controlled by the --lustopts switch, and detail data
is available for them as well. Specifying -sL results in data broken out by file
system, and --lustopts O further breaks it out by OST. Also note that the average
read/write sizes are only reported when --lustopts is not specified.
# LUSTRE CLIENT SUMMARY
# KBRead Reads SizeKB KBWrite Writes SizeKB
KBRead | KB/sec delivered to the client. |
Reads | Reads/sec delivered to the client,
not necessarily from the Lustre storage servers. |
SizeKB | Average read size in KB |
KBWrite | KB written/sec delivered to the storage servers. |
Writes | Writes/sec delivered to the storage servers. |
SizeKB | Average write size in KB |
# LUSTRE CLIENT SUMMARY: METADATA
# KBRead Reads KBWrite Writes Open Close GAttr SAttr Seek Fsynk DrtHit DrtMis
KBRead | KB/sec delivered to the client. |
Reads | Reads/sec delivered to the client,
not necessarily from the Lustre storage servers. |
KBWrite | KB written/sec delivered to the storage servers. |
Writes | Writes/sec delivered to the storage servers. |
Open | File opens/sec |
Close | File closes/sec |
GAttr | getattrs/sec |
SAttr | setattrs/sec |
Seek | seeks/sec |
Fsync | fsyncs/sec |
DrtHit | dirty hits/sec |
DrtMis | dirty misses/sec |
# LUSTRE CLIENT SUMMARY: READAHEAD
# KBRead Reads KBWrite Writes Pend Hits Misses NotCon MisWin FalGrb LckFal Discrd ZFile ZerWin RA2Eof HitMax Wrong
KBRead | KB/sec delivered to the client. |
Reads | Reads/sec delivered to the client,
not necessarily from the Lustre storage servers. |
KBWrite | KB written/sec delivered to the storage servers. |
Writes | Writes/sec delivered to the storage servers. |
Pend | Pending issued pages |
Hits | prefetch cache hits |
Misses | prefetch cache misses |
NotCon | Pages read that were not consecutive with the previous ones. |
MisWin | Miss inside window. The pages that were expected to be in the
prefetch cache but weren't. They were probably
reclaimed due to memory pressure. |
LckFal | Failed grab_cache_pages. Tried to prefetch page but it was locked. |
Discrd | Read but discarded. Prefetched pages (not read by the application)
have been discarded, either because of memory pressure or lock
revocation. |
ZFile | Zero length file. |
ZerWin | Zero size window. |
RA2Eof | Read ahead to end of file |
HitMax | Hit maximum readahead issue. The read-ahead window has grown to the
maximum specified by max_read_ahead_mb. |
# LUSTRE CLIENT SUMMARY: RPC-BUFFERS (pages)
#RdK Rds 1K 2K ... WrtK Wrts 1K 2K ...
This display shows the distribution of RPC buffer sizes in buckets measured in K-pages.
You can find the page size for your system in the header (collectl --showheader).
RdK | KBs read/sec |
Rds | Reads/sec |
nK | Number of pages of this size read |
WrtK | KBs written/sec |
Wrts | Writes/sec |
nK | Number of pages of this size written |
Lustre Meta-Data Server, collectl -sl
As of Lustre 1.6.5, the data reported for the MDS has changed, breaking the Reint
data out into 5 individual buckets, which are the last 5 fields described below. For earlier
versions, those 5 fields are replaced by a single one named Reint.
# LUSTRE MDS SUMMARY
#Getattr GttrLck StatFS Sync Gxattr Sxattr Connect Disconn Create Link Setattr Rename Unlink
Getattr | Number of getattr calls, for example from lfs osts.
Note that this counter is not incremented as the result of ls; see Gxattr. |
GttrLck | These are getattrs that also return a lock on the file |
StatFS | Number of statfs calls, for example from df or lfs df.
Note that Lustre caches data for up to a second, so many calls within a second may only show
up as a single statfs. |
Sync | Number of sync calls |
Gxattr | Extended attribute get operations, for example getfattr,
getfacl or even ls. Note that the MDS must have been mounted with -o acl
for this counter to be enabled. |
Sxattr | Extended attribute set operations, for example setfattr or setfacl |
Connect | Client mount operations |
Disconn | Client umount operations |
Create | Count of mknod and mkdir operations, also used by NFS servers internally when creating files |
Link | Hard and symbolic links, for example ln |
Setattr | All operations that modify inode attributes, including chmod, chown, touch, etc. |
Rename | File and directory renames, for example mv |
Unlink | File/directory removals, for example rm or rmdir |
The following display is very similar to the RPC buffers display in that counts of
I/O requests of different sizes are reported. In this case these are requests sent to the
disk driver. Note that this report is only available for HP's SFS.
# LUSTRE DISK BLOCK LEVEL SUMMARY
#Rds RdK 0.5K 1K ... Wrts WrtK 0.5K 1K ...
Rds | Reads/sec |
RdK | KBs read/sec |
nK | Number of blocks of this size read |
Wrts | Writes/sec |
WrtK | KBs written/sec |
nK | Number of blocks of this size written |
Lustre Object Storage Server, collectl -sl
# LUSTRE OST SUMMARY
# KBRead Reads SizeKB KBWrite Writes SizeKB
KBRead | KB/sec read |
Reads | Reads/sec |
SizeKB | Average read size in KB |
KBWrite | KB/sec written |
Writes | Writes/sec |
SizeKB | Average write size in KB |
Lustre Object Storage Server, collectl -sl --lustopts B
As with client data, you only get read/write average sizes when
--lustopts is not specified.
# LUSTRE OST SUMMARY
#<--------reads-----------|----writes-----------------
#RdK Rds 1K 2K ... WrtK Wrts 1K 2K ....
RdK | KBs read/sec |
Rds | Reads/sec |
nK | Number of pages of this size read |
WrtK | KBs written/sec |
Wrts | Writes/sec |
nK | Number of pages of this size written |
Lustre Object Storage Server, collectl -sl --lustopts D
# LUSTRE DISK BLOCK LEVEL SUMMARY
#RdK Rds 0.5K 1K ... WrtK Wrts 0.5K 1K ...
RdK | KBs read/sec |
Rds | Reads/sec |
nK | Number of blocks of this size read |
WrtK | KBs written/sec |
Wrts | Writes/sec |
nK | Number of blocks of this size written |
Memory, collectl -sm
# MEMORY STATISTICS
#<------------------------Physical Memory-----------------------><-----------Swap----------><-Inactive->
# TOTAL USED FREE BUFF CACHED SLAB MAPPED COMMIT TOTAL USED FREE TOTAL IN OUT
Total |
Total physical memory |
Used |
Used physical memory. This does not include memory used by the kernel itself. |
Commit |
According to Red Hat: "An estimate of how much RAM you would need to make a 99.99% guarantee
that there never is OOM (out of memory) for this workload." |
Swap Total |
Total Swap |
Swap Used |
Used Swap |
Swap Free |
Free Swap |
Inactive |
Inactive pages. On earlier kernels this number is the sum of the clean, dirty
and laundry pages. |
Pages/Sec In |
Total number of pages read by block devices |
Pages/Sec Out |
Total number of pages written by block devices |
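Several of these columns correspond to fields in /proc/meminfo. The sketch below parses that file's text and picks out a few of them, assuming that COMMIT maps to Committed_AS (the field the Red Hat quote describes) and that swap used is SwapTotal minus SwapFree. These mappings are assumptions, not taken from collectl's source; all values are in kB and the function names are hypothetical.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    values: dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # Most lines end in "kB"; a few (e.g. HugePages_Total) have no unit.
            values[key.strip()] = int(parts[0])
    return values


def memory_summary(meminfo: dict[str, int]) -> dict[str, int]:
    swap_total = meminfo.get("SwapTotal", 0)
    swap_free = meminfo.get("SwapFree", 0)
    return {
        "Total": meminfo["MemTotal"],
        "Free": meminfo["MemFree"],
        "Commit": meminfo.get("Committed_AS", 0),  # assumed source of COMMIT
        "SwapTotal": swap_total,
        "SwapUsed": swap_total - swap_free,
        "SwapFree": swap_free,
    }
```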
Network, collectl -sn
The error count entries are actually the totals of several types of errors.
To get individual error counts, you must report details for individual
interfaces in plot format by specifying -P. Transmission errors are categorized
as errors, dropped, fifo, collisions and carrier.
Receive errors are broken out into errors, dropped, fifo and fragments.
# NETWORK SUMMARY (/sec)
# KBIn PktIn SizeIn MultI CmpI ErrIn KBOut PktOut SizeO CmpO ErrOut
KBIn |
Incoming KB/sec |
PktIn |
Incoming packets/sec |
SizeIn |
Average incoming packet size in bytes |
MultI |
Incoming multicast packets/sec |
CmpI |
Incoming compressed packets/sec |
ErrIn |
Incoming errors/sec |
KBOut |
Outgoing KB/sec |
PktOut |
Outgoing packets/sec |
SizeO |
Average outgoing packet size in bytes |
CmpO |
Outgoing compressed packets/sec |
ErrOut |
Outgoing errors/sec |
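Since ErrIn and ErrOut are totals over several error types, the sketch below shows how such totals could be formed from one interface line of /proc/net/dev. It assumes the standard eight receive and eight transmit columns of that file, and it treats the "fragments" category as the "frame" receive column, which is an assumption. The counters are cumulative, so the per-second values come from the difference between two samples divided by the interval. The function names are hypothetical.

```python
RX_COLS = ("bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast")
TX_COLS = ("bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed")


def parse_net_dev_line(line: str) -> tuple[str, dict[str, int], dict[str, int]]:
    # Layout: "<iface>: <8 receive counters> <8 transmit counters>"
    name, _, rest = line.partition(":")
    nums = [int(v) for v in rest.split()]
    return name.strip(), dict(zip(RX_COLS, nums[:8])), dict(zip(TX_COLS, nums[8:16]))


def total_errors(rx: dict[str, int], tx: dict[str, int]) -> tuple[int, int]:
    # Assumed mapping: receive errors = errs + drop + fifo + frame,
    # transmit errors = errs + drop + fifo + colls + carrier.
    err_in = sum(rx[k] for k in ("errs", "drop", "fifo", "frame"))
    err_out = sum(tx[k] for k in ("errs", "drop", "fifo", "colls", "carrier"))
    return err_in, err_out
```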
NFS, collectl -sf
These statistics will be reported for V3 servers by default but you can
choose a different version and/or client data via --nfsopts. They correspond
to the net, rpc and protocol specific sections of the nfsstat utility.
# NFS SERVER (/sec)
#<----------Network-------><----------RPC---------><---NFS V3--->
#PKTS UDP TCP TCPCONN CALLS BADAUTH BADCLNT READ WRITE
Pkts | Total network packets, which is the sum of UDP and TCP |
UDP | Number of UDP packets/sec |
TCP | Number of TCP packets/sec |
TCPConn | Number of TCP connections/sec |
Calls | Number of RPC calls/sec |
BadAuth | Number of authentication failures/sec |
BadClnt | Number of unknown clients/sec |
Read | Number of reads/sec |
Write | Number of writes/sec |
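The server columns line up with the net, rpc and NFS V3 counters that nfsstat also reads. The sketch below computes per-second values from two samples of /proc/net/rpc/nfsd, assuming the Linux layouts "net packets udp tcp tcpconn", "rpc calls badcalls badfmt badauth badclnt" and "proc3 count null getattr setattr lookup access readlink read write ...". Those layouts and the function names are assumptions for illustration, not collectl's implementation.

```python
def parse_nfsd(text: str) -> dict[str, int]:
    rows: dict[str, list[int]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Only these lines are needed; others (e.g. "th") may contain floats.
        if fields and fields[0] in ("net", "rpc", "proc3"):
            rows[fields[0]] = [int(v) for v in fields[1:]]
    net, rpc, proc3 = rows["net"], rows["rpc"], rows["proc3"]
    return {
        "Pkts": net[0], "UDP": net[1], "TCP": net[2], "TCPConn": net[3],
        "Calls": rpc[0], "BadAuth": rpc[3], "BadClnt": rpc[4],
        # proc3[0] is the number of procedure counters; READ and WRITE are
        # NFSv3 procedures 6 and 7, i.e. list positions 7 and 8.
        "Read": proc3[7], "Write": proc3[8],
    }


def nfs_rates(prev: dict[str, int], curr: dict[str, int], seconds: float) -> dict[str, float]:
    return {k: (curr[k] - prev[k]) / seconds for k in curr}
```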
NFS, collectl -sf --nfsopts C
The data reported for clients is slightly different, specifically the
retrans and authref fields.
# NFS CLIENT (/sec)
#<----------RPC---------><---NFS V3--->
#CALLS RETRANS AUTHREF READ WRITE
Calls | Number of RPC calls/sec |
Retrans | Retransmitted calls |
Authref | Authentication failed |
Read | Number of reads/sec |
Write | Number of writes/sec |
Slabs, collectl -sy
As of the 2.6.22 kernel, there is a new slab allocator, called SLUB, and since
there is not a 1:1 mapping between what it reports and the older slab allocator,
the format of this listing will depend on which allocator is being used. The following
format is for the older allocator.
# SLAB SUMMARY
#<------------Objects------------><--------Slab Allocation-------><--Caches--->
# InUse Bytes Alloc Bytes InUse Bytes Total Bytes InUse Total
Objects |
InUse |
Total number of objects that are currently in use. |
Bytes |
Total size of all the objects in use. |
Alloc |
Total number of objects that have been allocated but not necessarily in use. |
Bytes |
Total size of all the allocated objects whether in use or not. |
Slab Allocation |
InUse |
Number of slabs that have at least one active object in them. |
Bytes |
Total size of all the slabs. |
Total |
Total number of slabs that have been allocated whether in use or not. |
Bytes |
Total size of all the slabs that have been allocated, whether in use or not. |
Caches |
InUse |
Not all caches are actually in use. This includes only those with non-zero
counts. |
Total |
This is the count of all caches, whether currently in use or not. |
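For the older allocator, each of these totals can be pictured as a sum over the per-cache lines of /proc/slabinfo. The sketch below assumes the slabinfo 2.x line layout ("name active_objs num_objs objsize objperslab pagesperslab : tunables ... : slabdata active_slabs num_slabs sharedavail") and a 4096-byte page size; both are assumptions to check on your own system, and the function and key names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: check your system's actual page size

KEYS = ("ObjInUse", "ObjInUseBytes", "ObjAlloc", "ObjAllocBytes",
        "SlabInUse", "SlabInUseBytes", "SlabTotal", "SlabTotalBytes",
        "CachesInUse", "CachesTotal")


def slab_summary(text: str, page_size: int = PAGE_SIZE) -> dict[str, int]:
    totals = dict.fromkeys(KEYS, 0)
    for line in text.splitlines():
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue  # skip the version line and the column legend
        fields = line.split()
        active_objs, num_objs, objsize = (int(v) for v in fields[1:4])
        slab_bytes = int(fields[5]) * page_size  # pagesperslab * page size
        i = fields.index("slabdata")
        active_slabs, num_slabs = int(fields[i + 1]), int(fields[i + 2])
        totals["ObjInUse"] += active_objs
        totals["ObjInUseBytes"] += active_objs * objsize
        totals["ObjAlloc"] += num_objs
        totals["ObjAllocBytes"] += num_objs * objsize
        totals["SlabInUse"] += active_slabs
        totals["SlabInUseBytes"] += active_slabs * slab_bytes
        totals["SlabTotal"] += num_slabs
        totals["SlabTotalBytes"] += num_slabs * slab_bytes
        totals["CachesTotal"] += 1
        if active_objs:  # only caches with non-zero counts are "in use"
            totals["CachesInUse"] += 1
    return totals
```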
This is the format for the new SLUB allocator:
# SLAB SUMMARY
#<---Objects---><-Slabs-><-----memory----->
# In Use Avail Number Used Total
Note that this report summarizes only the slabs being monitored. In general
this means all slabs, but if filtering is being used, these numbers will only
apply to the slabs that matched the filter.
Objects |
InUse |
The total number of objects that have been allocated to processes. |
Avail |
The total number of objects that are available in the currently allocated slabs.
This includes those that have already been allocated to processes. |
Slabs |
Number |
This is the number of individual slabs that have been allocated and are
taking physical memory. |
Memory |
Used |
Used memory corresponds to those objects that have been allocated to
processes. |
Total |
Total physical memory allocated to processes. When there is no filtering
in effect, this number will be equal to the Slabs field reported by -sm. |
Sockets, collectl -ss
# SOCKET STATISTICS
# <-------------Tcp-------------> Udp Raw <---Frag-->
#Used Inuse Orphan Tw Alloc Mem Inuse Inuse Inuse Mem
Used | Total number of sockets allocated, which can include additional types such as Unix domain sockets. |
Tcp |
Inuse | Number of TCP connections in use |
Orphan | Number of TCP orphaned connections |
Tw | Number of connections in TIME_WAIT |
Alloc | TCP sockets allocated |
Mem | |
Udp |
Inuse | Number of UDP connections in use |
Raw |
Inuse | Number of RAW connections in use |
Frag |
Inuse | |
Mem | |
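The socket columns have the same names as the fields of /proc/net/sockstat, whose lines consist of a protocol label followed by key/value pairs. The sketch below parses that text into a nested dictionary, assuming that the columns map onto the "sockets", "TCP", "UDP", "RAW" and "FRAG" lines. The mapping and the function name are assumptions, not collectl's code.

```python
def parse_sockstat(text: str) -> dict[str, dict[str, int]]:
    result: dict[str, dict[str, int]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        proto, _, rest = line.partition(":")
        fields = rest.split()
        # After the label, fields alternate: key value key value ...
        result[proto.strip()] = {k: int(v) for k, v in zip(fields[::2], fields[1::2])}
    return result
```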
TCP, collectl -st
# TCP SUMMARY (/sec)
# PureAcks HPAcks Loss FTrans
PureAcks | ACKs/sec that only contain acks (i.e., no data). |
HPAcks | Fast-path acks/sec. |
Loss | Packets/sec TCP thinks have been lost coming in. |
FTrans | Fast retransmissions/sec. |
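Linux exposes TCP counters with similar names in the TcpExt lines of /proc/net/netstat, where one line holds the counter names and the next holds their values. The sketch below turns two samples into per-second rates, assuming that PureAcks, HPAcks and FTrans correspond to the kernel's TCPPureAcks, TCPHPAcks and TCPFastRetrans counters. Loss is left out because its source counter is not identified here. The mapping and function names are assumptions for illustration.

```python
# Assumed mapping from report column to /proc/net/netstat TcpExt counter.
COLUMNS = {"PureAcks": "TCPPureAcks", "HPAcks": "TCPHPAcks", "FTrans": "TCPFastRetrans"}


def parse_tcpext(text: str) -> dict[str, int]:
    lines = [ln for ln in text.splitlines() if ln.startswith("TcpExt:")]
    # The first TcpExt line holds the counter names, the second the values.
    names = lines[0].split()[1:]
    values = [int(v) for v in lines[1].split()[1:]]
    return dict(zip(names, values))


def tcp_rates(prev: dict[str, int], curr: dict[str, int], seconds: float) -> dict[str, float]:
    return {col: (curr[key] - prev[key]) / seconds for col, key in COLUMNS.items()}
```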