2 Host

For each host metric, this chapter provides a description, collection statistics, data source, multiple thresholds (where applicable), and user action information.

2.1 Aggregate Resource Usage Statistics (By Project)

This metric provides data on aggregate resource usage on a per-project basis.

This metric is available only on Solaris version 9 and later.

The following table lists the metrics and their descriptions.

Note:

For all target versions, the collection frequency for each metric is every 15 minutes.

The data source for these metrics is Solaris CIM Object Manager.

Table 2-1 Aggregate Resource Usage Statistics (By Project)

Metric Description

Cumulative CPU Wait Time (Seconds)

Cumulative number of seconds that this process has spent Waiting for CPU over its lifetime

Cumulative Data Page Fault Sleep Time (Seconds)

Cumulative number of seconds that this process has spent sleeping in Data Page Faults over its lifetime

Cumulative Major Page Faults

Cumulative number of Major Page Faults engendered by the process over its lifetime

Cumulative Minor Page Faults

Cumulative number of Minor Page Faults engendered by the process over its lifetime

Cumulative Number Character IO (bytes) Read and Written

Cumulative number of character I/O bytes Read and Written by the process over its lifetime

Cumulative Number of Blocks Read

Cumulative number of blocks Read by the process over its lifetime

Cumulative Number of Blocks Written

Cumulative number of blocks Written by the process over its lifetime

Cumulative Number of Involuntary Context Switches

Cumulative number of Involuntary Context Switches made by the process over its lifetime

Cumulative Number of Messages Received

Cumulative number of Messages Received by the process over its lifetime

Cumulative Number of Messages Sent

Cumulative number of Messages Sent by the process over its lifetime

Cumulative Number of Signals Received

Cumulative number of Signals taken by the process over its lifetime

Cumulative Number of System Calls Made

Cumulative number of system calls made by the process over its lifetime

Cumulative Number of Voluntary Context Switches

Cumulative number of Voluntary Context Switches made by the process over its lifetime

Cumulative Project Lock-Wait Sleep Time (Seconds)

Cumulative number of seconds that this process has spent sleeping on User Lock Waits over its lifetime

Cumulative Project Other Sleep Time (Seconds)

Cumulative number of seconds that this process has spent sleeping in all other ways over its lifetime

Cumulative Stop Time (Seconds)

Cumulative number of seconds that this process has spent Stopped over its lifetime

Cumulative Swap Operations

Cumulative number of swap operations engendered by the process over its lifetime

Cumulative System Mode Time (Seconds)

Cumulative number of seconds that this process has spent in System mode over its lifetime

Cumulative System Page Fault Sleep Time (Seconds)

Cumulative number of seconds that this process has spent sleeping in System Page Faults over its lifetime

Cumulative System Trap Time (Seconds)

Cumulative number of seconds that this process has spent in System Traps over its lifetime

Cumulative Text Page Fault Sleep Time (Seconds)

Cumulative number of seconds that this process has spent sleeping in Text Page Faults over its lifetime

Cumulative User Mode Time (Seconds)

Cumulative number of seconds that this process has spent in User mode over its lifetime

Number of Processes Owned by Project

Number of processes owned by the project measured in the aggregate

Project CPU Time (%)

Percent CPU time used by the process

Project Process Memory Size (%)

Ratio of the process resident set size to physical memory

Project's Total Process Heap Size (KiloBytes)

Total number of kilobytes of memory consumed by the process heap at the time that it is sampled

Project's Total Process Resident Set Size (KiloBytes)

Resident set size of the process in kilobytes

Project's Total Process Virtual Memory Size (KiloBytes)

Size of the process virtual address space in kilobytes

Total Number of Threads in Project's Processes

Number of threads active in the current Process
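
The descriptions in Table 2-1 are worded per process, while the metric itself is reported per project. The following is a minimal sketch of such a roll-up, assuming hypothetical per-process samples. The tuple layout, project names, and values are illustrative only and are not the Solaris CIM Object Manager schema.

```python
from collections import defaultdict

# Hypothetical per-process samples: (project, resident_set_kb, thread_count).
samples = [
    ("default", 2048, 4),
    ("default", 1024, 1),
    ("batch", 4096, 8),
]

def aggregate_by_project(rows):
    """Sum per-process values into one record per project."""
    totals = defaultdict(lambda: {"processes": 0, "rss_kb": 0, "threads": 0})
    for project, rss_kb, threads in rows:
        entry = totals[project]
        entry["processes"] += 1
        entry["rss_kb"] += rss_kb
        entry["threads"] += threads
    return dict(totals)

totals = aggregate_by_project(samples)
```

The same grouping applies to the per-user variant of this metric if the key is the owning user rather than the project.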


2.2 Aggregate Resource Usage Statistics (By User)

This metric provides data on aggregate resource usage on a per-user basis.

This metric is available only on Solaris version 9 and later.

The following table lists the metrics and their descriptions.

Note:

For all target versions, the collection frequency for each metric is every 15 minutes.

The data source for these metrics is Solaris CIM Object Manager.

Table 2-2 Aggregate Resource Usage Statistics (By User)

Metric Description

Cumulative CPU Wait Time (Seconds)

Cumulative number of seconds that this process has spent Waiting for CPU over its lifetime

Cumulative Data Page Fault Sleep Time (Seconds)

Cumulative number of seconds that this process has spent sleeping in Data Page Faults over its lifetime

Cumulative Major Page Faults

Cumulative number of Major Page Faults engendered by the process over its lifetime

Cumulative Minor Page Faults

Cumulative number of Minor Page Faults engendered by the process over its lifetime

Cumulative Number Character IO (Bytes) Read and Written

Cumulative number of character I/O bytes Read and Written by the process over its lifetime

Cumulative Number of Blocks Read

Cumulative number of blocks Read by the process over its lifetime

Cumulative Number of Blocks Written

Cumulative number of blocks Written by the process over its lifetime

Cumulative Number of Involuntary Context Switches

Cumulative number of Involuntary Context Switches made by the process over its lifetime

Cumulative Number of Messages Received

Cumulative number of Messages Received by the process over its lifetime

Cumulative Number of Messages Sent

Cumulative number of Messages Sent by the process over its lifetime

Cumulative Number of Signals Received

Cumulative number of Signals taken by the process over its lifetime

Cumulative Number of System Calls Made

Cumulative number of system calls made by the process over its lifetime

Cumulative Number of Voluntary Context Switches

Cumulative number of Voluntary Context Switches made by the process over its lifetime

Cumulative Stop Time (Seconds)

Cumulative number of seconds that this process has spent Stopped over its lifetime

Cumulative Swap Operations

Cumulative number of Swap Operations engendered by the process over its lifetime

Cumulative System Mode Time (Seconds)

Cumulative number of seconds that this process has spent in System mode over its lifetime

Cumulative System Page Fault Sleep Time (Seconds)

Cumulative number of seconds that this process has spent sleeping in System Page Faults over its lifetime

Cumulative System Trap Time (Seconds)

Cumulative number of seconds that this process has spent in System Traps over its lifetime

Cumulative Text Page Fault Sleep Time (Seconds)

Cumulative number of seconds that this process has spent sleeping in Text Page Faults over its lifetime

Cumulative User Lock-Wait Sleep Time (Seconds)

Cumulative number of seconds that this process has spent sleeping on User Lock Waits over its lifetime

Cumulative User Mode Time (Seconds)

Cumulative number of seconds that this process has spent in User mode over its lifetime

Cumulative User Other Sleep Time (Seconds)

Cumulative number of seconds that this process has spent sleeping in all other ways over its lifetime

Number of Processes Owned by User

Number of processes owned by the user measured in the aggregate

Total Number of Threads in User's Processes

Number of threads active in the current process

User CPU Time (%)

Percent CPU time used by the process

User Process Memory Size (%)

Ratio of the process resident set size to physical memory

User's Total Process Heap Size (KiloBytes)

Total number of kilobytes of memory consumed by the process heap at the time that it is sampled

User's Total Process Resident Set Size (KiloBytes)

Resident set size of the process in kilobytes

User's Total Process Virtual Memory Size (KiloBytes)

Size of the process virtual address space in kilobytes


2.3 Buffer Activity

The Buffer Activity metric provides information about OS memory buffer usage. This metric reports buffer activity for transfers, accesses, and cache (kernel block buffer cache) hit ratios per second.

The data sources for this metric category include the following:

Host Data Source
Solaris sar command
HP sar command
Linux not available
HP Tru64 table() system call
IBM AIX sar command
Windows not available

The following table lists the metrics and their descriptions.

Table 2-3 Buffer Activity Metrics

Metric Description

Buffer Cache Read Hit Ratio (%)

Percentage of buffer reads satisfied from the buffer cache without a read from a block device

Buffer Cache Reads (per second)

Number of reads performed on the buffer cache per second. Note: This metric is not available on HP Tru64.

Buffer Cache Write Hit Ratio (%)

Percentage of buffer writes satisfied by the buffer cache without a write to a block device

Buffer Cache Writes (per second)

Number of writes performed on the buffer cache per second. Note: This metric is not available on HP Tru64.

Physical I/O Reads (per second)

Number of reads per second from character devices using physical I/O mechanisms

Physical I/O Writes (per second)

Number of writes per second to character devices using physical I/O mechanisms

Physical Reads (per second)

Number of reads performed per second from block devices to the system buffer cache

Physical Writes (per second)

Number of writes performed per second from the system buffer cache to block devices
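
On hosts where the data source is the sar command, the Solaris sar -b report documents the cache hit ratios as (1 - bread/lread) and (1 - bwrit/lwrit), expressed as percentages. The following sketch shows that arithmetic. The function name and the sample rates are assumptions made for illustration.

```python
def cache_hit_ratio(physical_per_sec, logical_per_sec):
    """Return the hit ratio as a percentage: (1 - physical/logical) * 100."""
    if logical_per_sec <= 0:
        return None  # no buffer accesses in the interval, so the ratio is undefined
    return (1.0 - physical_per_sec / logical_per_sec) * 100.0

# Illustrative rates: 50 physical reads and 200 buffer cache reads per second.
read_hit = cache_hit_ratio(physical_per_sec=50.0, logical_per_sec=200.0)
```

A ratio close to 100 means that most buffer accesses were served from the cache rather than from a block device.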


2.4 CPU Usage

The CPU Usage metric provides information about the percentage of time the CPU was in various states, for example, idle state and wait state. The metric also provides information about the percentage of CPU time spent in user and system mode. All data is per-CPU in a multi-CPU system.

On HP Tru64, this information is available only as a cumulative total for all CPUs, not for each CPU, and that total is monitored in the Load metric. Hence, this metric is not available on HP Tru64.

Note:

For all target versions, the collection frequency for each metric is every 15 minutes.

The data sources for this metric category include the following:

Host Data Source
Solaris kernel statistics (class cpu_stat)
HP pstat_getprocessor() system call
Linux /proc/stat
HP Tru64 not available
IBM AIX oracle_kstat() system call
Windows performance data counters

The following table lists the metrics and their descriptions.

Table 2-4 CPU Usage Metrics

Metric Description

CPU Idle Time (%)

Represents the percentage of time that the CPU was idle and the system did not have an outstanding disk I/O request. This metric checks the percentage of processor time in idle mode for the CPU(s) specified by the Host CPU parameter, such as cpu_stat0, CPU0, or * (for all CPUs on the system).

CPU Interrupt Time (%)

See Section 2.4.1, "CPU Interrupt Time (%)". Note: This metric is available only on Windows.

CPU System Time (%)

Represents the percentage of time that the CPU is running in system mode (kernel). This metric checks the percentage of processor time in system mode for the CPU(s) specified by the Host CPU parameter, such as cpu_stat0, CPU0, or * (for all CPUs on the system).

CPU User Time (%)

Represents the portion of processor time running in user mode. This metric checks the percentage of processor time in user mode for the CPU(s) specified by the Host CPU parameter, such as cpu_stat0, CPU0, or * (for all CPUs on the system).

CPU Wait Time (%)

Represents the percentage of time that the CPU was idle during which the system had an outstanding disk I/O request. This metric checks the percentage of processor time in wait mode for the CPU(s) specified by the Host CPU parameter, such as cpu_stat0, CPU0, or * (for all CPUs on the system). Note: This metric is not available on Solaris and HP Tru64.
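
For Linux, where the data source is /proc/stat, the percentages in Table 2-4 can be understood as shares of the CPU ticks that elapsed between two samples. The following is a minimal sketch, not the agent's implementation. It assumes the standard leading field order of a per-CPU /proc/stat line (user, nice, system, idle, iowait, irq, softirq) and ignores any later fields. The sample lines are made up.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")

def parse_cpu_line(line):
    """Parse one per-CPU line of /proc/stat into (name, {field: ticks})."""
    parts = line.split()
    ticks = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return parts[0], dict(zip(FIELDS, ticks))

def cpu_percentages(before, after):
    """Percentage of elapsed ticks spent in each state between two samples."""
    delta = {f: after[f] - before[f] for f in FIELDS}
    total = sum(delta.values())
    if total == 0:
        return {f: 0.0 for f in FIELDS}
    return {f: 100.0 * delta[f] / total for f in FIELDS}

# Two illustrative samples of the same CPU line (values are invented).
name, first = parse_cpu_line("cpu0 1000 0 500 8000 500 0 0")
_, second = parse_cpu_line("cpu0 1100 0 550 8800 550 0 0")
pct = cpu_percentages(first, second)
```

In this sketch, pct["idle"], pct["user"], pct["system"], and pct["iowait"] correspond to the CPU Idle Time, CPU User Time, CPU System Time, and CPU Wait Time metrics for the CPU named in the first field.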


2.4.1 CPU Interrupt Time (%)

Represents the percentage of time that the CPU spends receiving and servicing hardware interrupts during sample intervals. This metric checks the percentage of processor time in interrupt mode for the CPU(s) specified by the Host CPU parameter, such as cpu_stat0, CPU0, or * (for all CPUs on the system).

This metric is available only on Windows.

Multiple Thresholds

For this metric you can set different warning and critical threshold values for each "CPU Number" object.

If warning or critical threshold values are currently set for any "CPU Number" object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each "CPU Number" object, use the Edit Thresholds page. See the Editing Thresholds topic in the Enterprise Manager online help for information on accessing the Edit Thresholds page.

Data Source

The data sources for this metric are Performance Data counters.

2.5 CRS Alert Log

This metric collects certain Cluster Ready Services (CRS) error messages and issues either WARNING or CRITICAL alerts based on the error codes.

2.5.1 Alert Log Name

Shows the name and full path of the Cluster Ready Services (CRS) alert log.

Metric Summary

The following table shows how often the metric's value is collected.

Target Version Collection Frequency
All Versions Every 5 Minutes

2.5.2 Clusterware Service Alert Log Error

Collects CRS-1012, CRS-1201, CRS-1202, CRS-1401, CRS-1402, CRS-1602, and CRS-1603 messages in the Cluster Ready Services (CRS) alert log at the host level.

CRS-1201, CRS-1401, CRS-1012 alert log messages trigger warning alerts.

CRS-1202, CRS-1402, CRS-1602 and CRS-1603 alert log messages trigger critical alerts.
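
The split between warning and critical codes can be illustrated with the two default threshold patterns listed in Table 2-5. The following sketch assumes that the MATCH operator behaves like a regular expression search, which this document does not state. The log lines are invented, and only the error codes in them matter.

```python
import re

# Default threshold patterns from Table 2-5.
WARNING_PATTERN = re.compile(r"CRS-(1201|1401|1012)")
CRITICAL_PATTERN = re.compile(r"CRS-(1202|1402|1602|1603)")

def classify(line):
    """Return 'CRITICAL', 'WARNING', or None for one alert log line."""
    if CRITICAL_PATTERN.search(line):
        return "CRITICAL"
    if WARNING_PATTERN.search(line):
        return "WARNING"
    return None

# Illustrative lines; the message text is a placeholder.
lines = [
    "[cssd]CRS-1602: example message text",
    "[crsd]CRS-1201: example message text",
    "[crsd]CRS-1005: example message text",
]
severities = [classify(line) for line in lines]
```

Lines whose code is in neither pattern, such as the third sample line, raise no alert under this metric.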

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-5 Metric Summary Table

Target Version: All Versions
Evaluation and Collection Frequency: Every 5 Minutes
Upload Frequency: After Every Sample
Operator: MATCH
Default Warning Threshold: CRS-(1201|1401|1012)
Default Critical Threshold: CRS-(1202|1402|1602|1603)
Consecutive Number of Occurrences Preceding Notification: 1*
Alert Text: %clusterwareErrStack% See %alertLogName% for details.


* Once an alert is triggered for this metric, it must be manually cleared.

Multiple Thresholds

For this metric you can set different warning and critical threshold values for each "Time/Line Number" object.

If warning or critical threshold values are currently set for any "Time/Line Number" object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each "Time/Line Number" object, use the Edit Thresholds page.

2.5.3 CRS Resource Alert Log Error

Collects CRS-1203, CRS-1205 and CRS-1206 messages in the Cluster Ready Services (CRS) alert log at the host level and issues 'CRS Resource Alert Log Error' alerts at critical level.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-6 Metric Summary Table

Target Version: All Versions
Evaluation and Collection Frequency: Every 5 Minutes
Upload Frequency: After Every Sample
Operator: MATCH
Default Warning Threshold: Not Defined
Default Critical Threshold: CRS-120(3|5|6)
Consecutive Number of Occurrences Preceding Notification: 1*
Alert Text: %resourceErrStack% See %alertLogName% for details.


* Once an alert is triggered for this metric, it must be manually cleared.

Multiple Thresholds

For this metric you can set different warning and critical threshold values for each "Time/Line Number" object.

If warning or critical threshold values are currently set for any "Time/Line Number" object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each "Time/Line Number" object, use the Edit Thresholds page.

2.5.4 OCR Alert Log Error

Collects CRS-1009 messages in the Cluster Ready Services (CRS) alert log at the host level and issues 'OCR Alert Log Error' type alerts. OCR refers to Oracle Cluster Registry.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-7 Metric Summary Table

Target Version: All Versions
Evaluation and Collection Frequency: Every 5 Minutes
Upload Frequency: After Every Sample
Operator: MATCH
Default Warning Threshold: Not Defined
Default Critical Threshold: CRS-1009
Consecutive Number of Occurrences Preceding Notification: 1*
Alert Text: %ocrErrStack% See %alertLogName% for details.


* Once an alert is triggered for this metric, it must be manually cleared.

Multiple Thresholds

For this metric you can set different warning and critical threshold values for each "Time/Line Number" object.

If warning or critical threshold values are currently set for any "Time/Line Number" object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each "Time/Line Number" object, use the Edit Thresholds page.

2.6 CRS Nodeapp Status

This metric monitors the status of the following: Node Applications (nodeapps), Virtual Internet Protocol (IP), Global Services Daemon (GSD), and Oracle Notification System (ONS).

2.6.1 Nodeapp Status

Monitors the status of the following: Node Applications (nodeapps), Virtual Internet Protocol (IP), Global Services Daemon (GSD), and Oracle Notification System (ONS). A critical alert is raised for the nodeapp if its status is 'OFFLINE NOT RESTARTING'. A warning alert is raised for the nodeapp if its status is either 'UNKNOWN' or 'OFFLINE'.
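
Because the string 'OFFLINE NOT RESTARTING' also contains 'OFFLINE', the order in which the two conditions are tested matters in any pattern-based check. The following sketch assumes regular expression matching and assumes that the critical condition takes precedence. Neither detail is stated in this document, and the function name is hypothetical.

```python
import re

CRITICAL = re.compile(r"OFFLINE NOT RESTARTING")
WARNING = re.compile(r"UNKNOWN|OFFLINE")

def nodeapp_severity(status):
    """Map a nodeapp status string to an alert severity."""
    # Test the critical pattern first: 'OFFLINE NOT RESTARTING' also matches 'OFFLINE'.
    if CRITICAL.search(status):
        return "CRITICAL"
    if WARNING.search(status):
        return "WARNING"
    return "CLEAR"

results = {s: nodeapp_severity(s)
           for s in ("ONLINE", "UNKNOWN", "OFFLINE", "OFFLINE NOT RESTARTING")}
```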

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-8 Metric Summary Table

Target Version: All Versions
Evaluation and Collection Frequency: Every 5 Minutes
Upload Frequency: After Every Sample
Operator: MATCH
Default Warning Threshold: UNKNOWN|OFFLINE
Default Critical Threshold: OFFLINE NOT RESTARTING
Consecutive Number of Occurrences Preceding Notification: 1
Alert Text: CRS resource %nodeapps% is %status%


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each "Nodeapp" object.

If warning or critical threshold values are currently set for any "Nodeapp" object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each "Nodeapp" object, use the Edit Thresholds page.

User Action

Refer to the Real Application Clusters Administration and Deployment Guide for Node Applications startup and troubleshooting information.

2.7 CRS Virtual IP Relocation Status

This metric monitors whether there is a Virtual Internet Protocol (IP) relocation taking place. When a Virtual IP is relocated from the host (node) on which it was originally configured, a critical alert is generated.

2.7.1 Current Node

Shows the current host (node) on which the Virtual Internet Protocol (IP) is configured.

Metric Summary

The following table shows how often the metric's value is collected.

Target Version Collection Frequency
All Versions Every 5 Minutes

2.7.2 Virtual IP Relocated

Shows whether the Virtual Internet Protocol (IP) has relocated from the host (node) where it was originally configured. The value is TRUE if relocation has occurred; otherwise, it is FALSE. When the value is TRUE, a critical alert is raised.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-9 Metric Summary Table

Target Version: All Versions
Evaluation and Collection Frequency: Every 5 Minutes
Upload Frequency: After Every Sample
Operator: =
Default Warning Threshold: Not Defined
Default Critical Threshold: TRUE
Consecutive Number of Occurrences Preceding Notification: 1
Alert Text: CRS resource %vip% was relocated to %current_node%


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each "Virtual IP Name" object.

If warning or critical threshold values are currently set for any "Virtual IP Name" object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each "Virtual IP Name" object, use the Edit Thresholds page.

2.8 Disk Activity

The Disk Activity metric monitors the hard disk activity on the target being monitored. For each device on the system, this metric provides information about access to the device, including the device name, disk utilization, write statistics, and read statistics.

Note:

For all target versions, the collection frequency for each metric is every 15 minutes.

The data sources for this metric category include the following:

Host Data Source
Solaris kernel statistics (class kstat_io)
HP pstat_getdisk system call
Linux iostat command
HP Tru64 table() system call
IBM AIX oracle_kstat() system call
Windows performance data counters

The following table lists the metrics and their descriptions.

Table 2-10 Disk Activity Metrics

Metric Description

Average Disk I/O Service Time (ms)

See Section 2.8.1, "Average Disk I/O Service Time (ms)"

Average Disk I/O Wait Time (ms)

See Section 2.8.2, "Average Disk I/O Wait Time (ms)". Note: This metric is not available on Linux.

Average Outstanding Disk I/O Requests

Represents the average number of commands waiting for service (queue length). Note: This metric is not available on Linux.

Average Run Time (ms)

Represents the average time spent by the command on the active queue waiting for its execution to be completed. Note: This metric is not available on Linux.

Disk Block Writes (per second)

Represents the number of blocks (512 bytes) written per second. Note: This metric is not available on HP.

Disk Block Reads (per second)

Represents the number of blocks (512 bytes) read per second. Note: On HPUNIX, this metric is named Disk Blocks Transferred (per second).

Disk Device Busy (%)

See Section 2.8.3, "Disk Device Busy (%)". Note: On HPUNIX, this metric is named Device Busy (%).

Disk Reads (per second)

Represents the disk reads per second for the specified disk device. Note: This metric is not available on HP.

Disk Writes (per second)

Represents the disk writes per second for the specified disk device. Note: This metric is not available on HP.


2.8.1 Average Disk I/O Service Time (ms)

Represents the sum of average wait time and average run time.
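
As a worked example of this sum, the following sketch adds the two averages for each device. The device names and millisecond values are invented for illustration.

```python
def average_service_time_ms(avg_wait_ms, avg_run_ms):
    """Service time is the average wait time plus the average run time."""
    return avg_wait_ms + avg_run_ms

# Illustrative per-device averages in milliseconds: (wait, run).
devices = {"sd0": (2.0, 6.5), "sd1": (0.5, 3.0)}
service = {name: average_service_time_ms(wait, run)
           for name, (wait, run) in devices.items()}
```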

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-11 Metric Summary Table

Target Version: All Versions
Evaluation and Collection Frequency: Every 15 Minutes
Upload Frequency: After Every Sample
Operator: >
Default Warning Threshold: Not Defined
Default Critical Threshold: Not Defined
Consecutive Number of Occurrences Preceding Notification: 6
Alert Text: Average service time for disk %keyvalue% is %value% ms, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each "Disk Device" object.

If warning or critical threshold values are currently set for any "Disk Device" object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each "Disk Device" object, use the Edit Thresholds page.

User Action

This number should be low. A high number can indicate a disk that is slow due to excessive load or hardware issues. See also the CPU in IO-Wait (%) metric.

2.8.2 Average Disk I/O Wait Time (ms)

Represents the average time spent by the command waiting in the queue to be executed.

Multiple Thresholds

For this metric you can set different warning and critical threshold values for each "Disk Device" object.

If warning or critical threshold values are currently set for any "Disk Device" object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each "Disk Device" object, use the Edit Thresholds page.

User Action

A high figure indicates a slow disk. Use the OS iostat -xn command to check wait time and service time for local disks and NFS mounted file systems. See also the CPU in IO-Wait (%) metric.

2.8.3 Disk Device Busy (%)

Represents the percentage of time that the disk device is busy servicing I/O requests.

Note: On HPUNIX, this metric is named Device Busy (%).

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-12 Metric Summary Table

Target Version: All Versions
Evaluation and Collection Frequency: Every 15 Minutes
Upload Frequency: After Every Sample
Operator: >
Default Warning Threshold: 80
Default Critical Threshold: 95
Consecutive Number of Occurrences Preceding Notification: 6
Alert Text: Disk Device %keyValue% is %value%%% busy.
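
The effect of the consecutive-occurrences setting can be sketched with the defaults for this metric: operator >, warning threshold 80, critical threshold 95, and 6 consecutive occurrences. The following is a simplified model and not the Enterprise Manager evaluation engine. In particular, it assumes that severity returns to clear as soon as a run is broken, which this document does not specify.

```python
def evaluate(samples, warning=80, critical=95, occurrences=6):
    """Return the severity after each sample under a consecutive-occurrence rule."""
    warn_run = crit_run = 0
    results = []
    for value in samples:
        # A run grows while the comparison holds and resets when it fails.
        crit_run = crit_run + 1 if value > critical else 0
        warn_run = warn_run + 1 if value > warning else 0
        if crit_run >= occurrences:
            results.append("CRITICAL")
        elif warn_run >= occurrences:
            results.append("WARNING")
        else:
            results.append("CLEAR")
    return results

# Illustrative busy percentages from eight consecutive collections.
history = evaluate([85, 90, 88, 97, 86, 91, 70, 99])
```

With a 15-minute collection frequency, six consecutive occurrences means that the device must stay above a threshold for roughly 90 minutes before an alert is generated, so short bursts such as the single value of 97 do not trigger a critical alert.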


2.9 Disk Device Errors

The Disk Device Errors metric provides the number of errors on the disk device.

These metrics are available only on Solaris.

Note:

For all target versions, the collection frequency for each metric is every 72 hours.

The data source for these metrics is Solaris iostat -e command.

Table 2-13 Disk Device Errors Metrics

Metric Description

Hard Errors

Represents the error count of hard errors encountered while accessing the disk. Hard errors are considered serious and may be traced to misconfigured or bad disk devices.

Soft Errors

Represents the error count of soft errors encountered while accessing the disk. Soft errors are synonymous with warnings.

Total

Represents the sum of all errors on the particular device.

Transport Errors

Represents the error count of network errors encountered. This generally indicates a problem with the network layer.
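
The four values in Table 2-13 map onto the s/w, h/w, trn, and tot columns of the iostat -e report. The following parsing sketch assumes output in the layout of the embedded sample string. The device names and counts are invented, and the function name is hypothetical.

```python
# Assumed layout of an iostat -e report; the values are illustrative.
SAMPLE = """\
          ---- errors ---
device  s/w h/w trn tot
sd0       0   0   0   0
sd1       2   1   0   3
"""

def parse_error_counts(text):
    """Map device name to its soft, hard, transport, and total error counts."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have a device name followed by four integer columns.
        if len(parts) == 5 and all(p.isdigit() for p in parts[1:]):
            soft, hard, transport, total = (int(p) for p in parts[1:])
            counts[parts[0]] = {"soft": soft, "hard": hard,
                                "transport": transport, "total": total}
    return counts

errors = parse_error_counts(SAMPLE)
```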


2.10 Fans

The Fans metric monitors the status of various fans present in the system.

This metric is available only on Dell PowerEdge Linux systems.

2.10.1 Fan Status

Represents the status of the fan.

This metric is available only on Dell PowerEdge Linux systems.

The following table lists the possible values for this metric and their meaning.

Metric Value Meaning (per SNMP MIB)
1 Other (not one of the following)
2 Unknown
3 Normal
4 Warning
5 Critical
6 Non-Recoverable

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-14 Metric Summary Table

Target Version: All Versions
Evaluation and Collection Frequency: Every 15 Minutes
Upload Frequency: Not Uploaded
Operator: >=
Default Warning Threshold: 4
Default Critical Threshold: 5
Consecutive Number of Occurrences Preceding Notification: 1
Alert Text: Status of Fan at device %FanIndex% in chassis %ChassisIndex% is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.
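
Because the operator is >= and the status values are ordered by severity, the default thresholds of 4 and 5 separate the normal, warning, and critical ranges. The following sketch combines the value table with those defaults. The function name is hypothetical.

```python
# Status values and meanings from the table of possible values for this metric.
STATUS_MEANING = {
    1: "Other", 2: "Unknown", 3: "Normal",
    4: "Warning", 5: "Critical", 6: "Non-Recoverable",
}

def fan_severity(value, warning=4, critical=5):
    """Apply the >= operator with the default warning and critical thresholds."""
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "CLEAR"

report = {value: (meaning, fan_severity(value))
          for value, meaning in STATUS_MEANING.items()}
```

Under these defaults, the values 1 (Other) and 2 (Unknown) do not raise an alert, and 6 (Non-Recoverable) is reported at the critical level.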


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each unique combination of "Chassis Index" and "Fan Index" objects.

If warning or critical threshold values are currently set for any unique combination of "Chassis Index" and "Fan Index" objects, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each unique combination of "Chassis Index" and "Fan Index" objects, use the Edit Thresholds page.

Data Source

SNMP MIB object: coolingDeviceStatus (1.3.6.1.4.1.674.10892.1.700.12.1.5)

2.10.2 Location

Provides a description of the location of the fan. Example values are "CPU Fan", "PCI Fan", and "Memory Fan".

This metric is available only on Dell PowerEdge Linux systems.

Metric Summary

The following table shows how often the metric's value is collected.

Target Version Collection Frequency
All Versions Every 15 Minutes

Data Source

SNMP MIB object: coolingDeviceLocationName (1.3.6.1.4.1.674.10892.1.700.12.1.8)

2.11 File Access System Calls

The File Access System Calls metric provides information about the usage of file access system calls.

This metric is available on Solaris, HP, and IBM AIX.

2.11.1 Blocks Read by Directory Search Routine (per second)

Represents the number of file system blocks read per second while performing directory lookups.

Data Source

The data sources for this metric include the following:

Host Data Source
Solaris sar command
HP sar command
IBM AIX sar command

The OS sar command is used to sample cumulative activity counters maintained by the OS. The data is obtained by sampling system counters once in a five-second interval. The results are essentially the number of blocks read by the directory search routine over this five-second period divided by five.

2.11.2 iget() Calls (per second)

Represents the number of system iget() calls made per second. iget is a file access system routine.

Data Source

The data sources for this metric include the following:

Host Data Source
Solaris kernel memory structure (class cpu_vminfo)
HP sar command
IBM AIX kernel memory structure (class cpu_vminfo)

User Action

This data is obtained using the OS sar command, which is used to sample cumulative activity counters maintained by the OS. The data is obtained by sampling system counters once in a five-second interval. The results are essentially the number of iget() calls made over this five-second period divided by five.

2.11.3 lookuppn() Calls (per second)

Represents the number of file system lookuppn() (pathname translation) calls made per second.

Data Source

The data sources for this metric include the following:

Host Data Source
Solaris sar command
HP sar command
IBM AIX sar command

The OS sar command is used to sample cumulative activity counters maintained by the OS. The data is obtained by sampling system counters once in a five-second interval. The results are essentially the number of lookuppn() calls made over this five-second period divided by five.
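
The division described above converts a cumulative counter into a per-second rate. The following is a minimal sketch of that calculation. The counter values are invented and the function name is hypothetical.

```python
def per_second_rate(counter_start, counter_end, interval_seconds=5):
    """Convert two readings of a cumulative counter into a per-second rate."""
    return (counter_end - counter_start) / interval_seconds

# Illustrative cumulative lookuppn() counts read five seconds apart.
rate = per_second_rate(120_450, 120_700)
```

The same calculation applies to the other per-second metrics in this category, with the relevant counter substituted.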

2.12 File and Directory Monitoring

The File and Directory Monitoring metric monitors various attributes of specific files and directories. Setting key-value-specific thresholds triggers monitoring of the files or directories named in the given key value. The operator must specify key-value-specific thresholds to monitor any file or directory.

The data sources for this metric include the following:

Host Data Source
Solaris perl stat command for files; df for directories that are file system mount points; du for directories that are not file system mount points
HP perl stat command for files; df for directories that are file system mount points; du for directories that are not file system mount points
Linux perl stat command for files; df for directories that are file system mount points; du for directories that are not file system mount points
HP Tru64 not available
IBM AIX perl stat command for files; df for directories that are file system mount points; du for directories that are not file system mount points
Windows not available

2.12.1 File or Directory Attribute Not Found

Reports issues encountered in fetching the attributes of the file or directory. Errors encountered in monitoring the files and directories specified by the key value based thresholds are reported.

Note: This metric is not available on IBM AIX.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-15 Metric Summary Table

Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text
All Versions | Every 15 Minutes | After Every Sample | != | Not Defined | 0 | 1 | %file_attribute_not_found% .


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each "File or Directory Name" object.

If warning or critical threshold values are currently set for any "File or Directory Name" object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each "File or Directory Name" object, use the Edit Thresholds page.

2.12.2 File or Directory Permissions

Fetches the octal value of file permissions on the different variations of UNIX operating systems including Linux. Setting a key value specific warning or critical threshold value against this metric would result in the monitoring of a critical file or directory. For example, to monitor the file permissions for file name /etc/passwd, you should set a threshold for /etc/passwd.
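
As an illustration of what an octal permission check looks like, the following sketch reads the permission bits of a file and compares them with an expected value using the != operator that this metric uses. It is a minimal sketch in Python rather than the perl stat call the agent relies on; the function names octal_permissions and permissions_differ are hypothetical.

```python
import os
import stat


def octal_permissions(path: str) -> str:
    """Return the permission bits of path as an octal string such as '644'."""
    mode = os.stat(path).st_mode
    # S_IMODE keeps only the permission bits (including setuid/setgid/sticky).
    return format(stat.S_IMODE(mode), "o")


def permissions_differ(path: str, expected_octal: str) -> bool:
    """Return True when the current permissions differ from the expected octal value."""
    return int(octal_permissions(path), 8) != int(expected_octal, 8)
```

For example, permissions_differ("/etc/passwd", "644") would return True on a host where the file mode has been changed from 644.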

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-16 Metric Summary Table

Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text
All Versions | Every 15 Minutes | After Every Sample | != | Not Defined | Not Defined | 1 | Current permissions for %file_name% are %file_permissions%, different from warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each "File or Directory Name" object.

If warning or critical threshold values are currently set for any "File or Directory Name" object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each "File or Directory Name" object, use the Edit Thresholds page.

2.12.3 File or Directory Size (MB)

Fetches the current size of the given file or directory in megabytes. Setting a key value specific warning or critical threshold value against this metric would result in monitoring of a critical file or directory. For example, to monitor the size of the directory /absolute_directory_path, you should set a threshold for /absolute_directory_path.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-17 Metric Summary Table

Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text
All Versions | Every 15 Minutes | After Every Sample | > | Not Defined | Not Defined | 1 | Size of %file_name% is %file_size% MB, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each "File or Directory Name" object.

If warning or critical threshold values are currently set for any "File or Directory Name" object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each "File or Directory Name" object, use the Edit Thresholds page.

2.12.4 File or Directory Size Change Rate (KB/minute)

Provides the value for the rate at which the file's size is changing. Setting a key value specific warning or critical threshold value against this metric would result in monitoring of the critical file or directory. For example, to monitor the file change rate for the file name /absolute_file_path, the operator should set a threshold for /absolute_file_path.
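
A size change rate of this kind can be derived from two size samples and the time between them. The sketch below shows the arithmetic only; the function name is hypothetical, and the use of 1 KB = 1024 bytes is an assumption, since the text does not state how the agent converts units.

```python
def size_change_rate_kb_per_minute(previous_bytes: int, current_bytes: int,
                                   elapsed_seconds: float) -> float:
    """Return the size change between two samples, in KB per minute."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # Assumption: 1 KB is 1024 bytes.
    changed_kb = (current_bytes - previous_bytes) / 1024
    elapsed_minutes = elapsed_seconds / 60
    return changed_kb / elapsed_minutes
```

With a 15-minute collection interval, a file that grows by 1,536,000 bytes between samples has a rate of 100 KB per minute. A shrinking file yields a negative rate.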

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-18 Metric Summary Table

Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text
All Versions | Every 15 Minutes | After Every Sample | > | Not Defined | Not Defined | 1 | %file_name% is growing at the rate of %file_sizechangerate% (KB/hour), crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each "File or Directory Name" object.

If warning or critical threshold values are currently set for any "File or Directory Name" object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each "File or Directory Name" object, use the Edit Thresholds page.

2.13 Filesystems

The Filesystems metrics provide information about local file systems on the computer.

2.13.1 Filesystem

Represents the name of the disk device resource.

Metric Summary

The following table shows how often the metric's value is collected.

Target Version Collection Frequency
All Versions Every 15 Minutes

Data Source

The data sources for this metric include the following:

Host Data Source
Solaris /etc/mnttab file entries
HP bdf command
Linux df command
HP Tru64 df command
IBM AIX /etc/mnttab file entries
Windows not available

2.13.2 Filesystem Size (MB)

Represents the total space (in megabytes) allocated in the file system.

Metric Summary

The following table shows how often the metric's value is collected.

Target Version Collection Frequency
All Versions Every 15 Minutes

Data Source

The data sources for this metric include the following:

Host Data Source
Solaris vminfo system
HP bdf command
Linux df command
HP Tru64 df command
IBM AIX statvfs() system call
Windows not available

2.13.3 Filesystem Space Available (%)

Represents the percentage of free space available in the file system.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-19 Metric Summary Table

Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text
All Versions | Every 15 Minutes | After Every 24 Samples | < | 20 | 5 | 1 | Filesystem %keyValue% has %value%%% available space, fallen below warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each "Mount Point" object.

If warning or critical threshold values are currently set for any "Mount Point" object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each "Mount Point" object, use the Edit Thresholds page.

Data Source

The data sources for this metric include the following:

Host Data Source
Solaris statvfs() system call
HP bdf command
Linux df command
HP Tru64 df command
IBM AIX statvfs() system call
Windows Windows API
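
For UNIX-based hosts, a percentage of available space can be computed from the values returned by the POSIX statvfs() call, which Python exposes as os.statvfs. The sketch below is one plausible calculation, not necessarily the one the agent performs: the function name is hypothetical, and using f_bavail (blocks available to unprivileged users) over f_blocks (total blocks) is an assumption. Tools such as df may compute the percentage differently.

```python
import os


def space_available_percent(mount_point: str) -> float:
    """Return free space available to unprivileged users as a percentage of total space."""
    st = os.statvfs(mount_point)
    if st.f_blocks == 0:
        # Pseudo file systems can report zero total blocks.
        return 0.0
    return 100.0 * st.f_bavail / st.f_blocks
```

Against the default thresholds, a result below 20 would correspond to a warning and a result below 5 to a critical alert.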

User Action

Use the OS du -k command to check which directories are taking up the most space (du -k|sort -rn).
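
Where the du and sort commands are not convenient, a similar ranking can be produced with a short script. The following is a rough Python sketch of the same idea, with a hypothetical function name. It sums apparent file sizes rather than allocated disk blocks, so its figures can differ from those reported by du -k.

```python
import os


def directory_sizes_kb(root: str) -> list:
    """Return (size_in_kb, path) for each immediate subdirectory of root, largest first."""
    results = []
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total_bytes = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                try:
                    total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    # The file disappeared or is unreadable; skip it.
                    continue
        results.append((total_bytes // 1024, entry.path))
    # Sorting tuples in reverse order puts the largest directories first.
    return sorted(results, reverse=True)
```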

2.13.4 Filesystem Utilization (MB)

Represents the total space, expressed in megabytes, allocated in the file system.

This metric is available only on Windows.

Data Source

The data source for this metric is GetDiskFreeSpaceEx.

2.14 Inventory

The Inventory metric is used for periodic collection of host configuration information. By default, host configuration is collected every 24 hours.

2.15 Kernel Memory

The Kernel Memory metric provides information on kernel memory allocation (KMA) activities.

This metric is available only on Solaris. The data source is the sar command. The data is obtained by sampling system counters once in a five-second interval.

The following table lists the metrics and their descriptions.

Table 2-20 Kernel Memory Metrics

Metric Description

Failed Requests for Large Kernel Memory

Number of requests for large memory that failed, that is, requests that were not satisfied

Failed Requests for Oversize Kernel Memory

Number of oversized requests made that could not be satisfied. Oversized memory requests are allocated dynamically so there is no pool for such requests

Failed Requests for Small Kernel Memory

Number of requests for small memory that failed, that is, requests that were not satisfied

KMA Available for Large Memory Requests (Bytes)

Amount of memory, in bytes, the kernel memory allocation (KMA) has for the large pool; the pool used for allocating and reserving large memory requests.

KMA for Oversize Memory Requests (Bytes)

Amount of memory allocated for oversized memory requests

KMA for Small Memory Requests

Amount of memory, in bytes, the Kernel Memory Allocation has for the small pool; the pool used for allocating and reserving small memory requests

Memory Allocated for Large Memory Requests (Bytes)

Amount of memory, in bytes, the kernel allocated to satisfy large memory requests

Memory Allocated for Small Memory Requests (Bytes)

Amount of memory, in bytes, the kernel allocated to satisfy small memory requests


2.16 Load

The Load metric provides information about the number of runnable processes on the system run queue. If this is greater than the number of CPUs on the system, then excess load exists.
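
The excess-load rule stated above can be expressed as a one-line comparison. The sketch below is illustrative only: the function name has_excess_load is hypothetical, and the use of the one-minute load average from os.getloadavg() as the run queue length is an assumption (on some platforms the load average also counts processes that are not runnable).

```python
import os


def has_excess_load(run_queue_length: float, cpu_count: int) -> bool:
    """Return True when there are more runnable processes than CPUs."""
    return run_queue_length > cpu_count


if __name__ == "__main__":
    try:
        one_minute, _five_minute, _fifteen_minute = os.getloadavg()  # UNIX only
    except (AttributeError, OSError):
        one_minute = 0.0  # load average not obtainable on this platform
    cpus = os.cpu_count() or 1
    print(has_excess_load(one_minute, cpus))
```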

Note:

For all target versions, the collection frequency for each metric is every 5 minutes.

The data sources for this metric category include the following:

Host Data Source
Solaris kernel statistics
HP pstat_getdynamic(), pstat_getprocessor(), pstat_getproc(), pstat_getstatic(), getutent(), pstat_getvminfo() system calls
Linux uptime, free, getconf, ps, iostat, sar, w OS commands; /proc/stat
HP Tru64 table() system call, uptime, vmstat, psrinfo, ps, who, swapon OS commands
IBM AIX oracle_kstat(), getutent(), getproc(), sysconf() system calls
Windows performance data counters (unless otherwise noted)

The following table lists the metrics and their descriptions.

Table 2-21 Load Metrics

Metric Description

CPU in IO-Wait (%)

See Section 2.16.1, "CPU in IO-Wait (%)"

CPU in System Mode (%)

For UNIX-based platforms, this metric represents the amount of CPU being used in SYSTEM mode as a percentage of total CPU processing power.

For Windows, this metric represents the percentage of time the process threads spent executing code in privileged mode.

CPU in User Mode (%)

For UNIX-based platforms, this metric represents the amount of CPU being used in USER mode as a percentage of total CPU processing power.

For Windows, this metric represents the percentage of time the processor spends in the user mode. This metric displays the average busy time as a percentage of the sample time.

CPU Interrupt Time (%)

See Section 2.16.2, "CPU Interrupt Time (%)". Note: This metric is available only on Windows.

CPU Queue Length

See Section 2.16.3, "CPU Queue Length". Note: This metric is available only on Windows.

CPU Utilization (%)

See Section 2.16.4, "CPU Utilization (%)"

Free Memory (%)

Amount of free memory as a percentage of total memory. The data source for Windows host is Windows API.

Longest Service Time (ms)

Maximum of the average service time of all disks. Units are represented in milliseconds. Note: This metric is not available on Windows.

Memory Page Scan Rate (per second)

See Section 2.16.5, "Memory Page Scan Rate (per second)"

Memory Utilization (%)

See Section 2.16.6, "Memory Utilization (%)"

Page Transfers Rate

See Section 2.16.7, "Page Transfers Rate". Note: This metric is available only on Windows.

Run Queue Length (1 minute average)

See Section 2.16.8, "Run Queue Length (1 minute average)". Note: This metric is not available on Windows.

Run Queue Length (5 minute average)

See Section 2.16.10, "Run Queue Length (5 minute average)". Note: This metric is not available on Windows.

Run Queue Length (15 minute average)

See Section 2.16.9, "Run Queue Length (15 minute average)". Note: This metric is not available on Windows.

Swap Utilization (%)

See Section 2.16.11, "Swap Utilization (%)"

Total Disk I/O Per Second

Rate of I/O (read and write) operations, calculated from all disks. Note: This metric is not available on Windows.

Total Processes

Total number of processes currently running on the system.

Total Swap, Kilobytes

Total amount of page file space available to be allocated by processes. Paging files are shared by all processes and the lack of space in paging files can prevent processes from allocating memory. Note: This metric is available only on Windows. The data sources for this metric are Performance Data counters and Windows API GlobalMemoryStatusEx.

Total Users

Represents the total number of users currently logged into the system. This metric checks the number of users running on the system. Note: This metric is not available on Windows.

Used Swap, Kilobytes

Size in kilobytes of the page file instance used. Note: This metric is available only on Windows. The data sources for this metric are Performance Data counters and Windows API GlobalMemoryStatusEx.


2.16.1 CPU in IO-Wait (%)

Represents the average number of jobs waiting for I/O in the last interval.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-22 Metric Summary Table

Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text
All Versions | Every 5 Minutes | After Every Sample | > | 40 | 80 | 6 | CPU I/O Wait is %value%%%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.
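
The interaction between the operator, the thresholds, and the consecutive-occurrences setting can be sketched as follows, using the defaults from Table 2-22 (operator >, warning 40, critical 80, 6 consecutive occurrences). This is one reading of the description, not the agent's actual logic: the function name alert_states and the rule that any non-matching sample resets the count are assumptions.

```python
from typing import Iterable, List


def alert_states(samples: Iterable[float], warning: float, critical: float,
                 occurrences: int) -> List[str]:
    """Return 'clear', 'warning', or 'critical' for each sample in order."""
    states = []
    warning_run = 0   # consecutive samples above the warning threshold
    critical_run = 0  # consecutive samples above the critical threshold
    for value in samples:
        warning_run = warning_run + 1 if value > warning else 0
        critical_run = critical_run + 1 if value > critical else 0
        if critical_run >= occurrences:
            states.append("critical")
        elif warning_run >= occurrences:
            states.append("warning")
        else:
            states.append("clear")
    return states
```

With a 5-minute collection frequency and 6 required occurrences, a value would have to stay above the threshold for six consecutive collections (about 30 minutes of samples) before an alert is raised under this reading.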


User Action

A high percentage of I/O wait can indicate a hardware problem, a slow NFS server, or poor load-balancing among local file systems and disks. Check the system messages log for any hardware errors. Use the iostat -xn command or the nfsstat -c (NFS client-side statistics) command or both to determine which disks or file systems are slow to respond. Check to see if the problem is with one or more swap partitions, as lack of swap or poor disk load balancing can cause these to become overloaded. Depending on the specific problem, fixes may include: NFS client or server tuning, hardware replacement, moving applications to other file systems, adding swap space, or restructuring a file system for better performance.

2.16.2 CPU Interrupt Time (%)

Represents the percentage of time the processor spends receiving and servicing hardware interrupts during sample intervals. This value is an indirect indicator of the activity of devices that generate interrupts, such as the system clock, the mouse, disk drivers, data communication lines, network interface cards, and other peripheral devices. These devices normally interrupt the processor when they have completed a task or require attention. Normal thread execution is suspended during interrupts. Most system clocks interrupt the processor every 10 milliseconds, creating a background of interrupt activity.

This metric is available only on Windows.

Data Source

The data sources for this metric are Performance Data counters.

2.16.3 CPU Queue Length

Processor Queue Length is the number of ready threads in the processor queue. There is a single queue for processor time even on computers with multiple processors. A sustained processor queue of less than 10 threads per processor is normally acceptable, dependent on the workload.

This metric is available only on Windows.

Data Source

The data sources for this metric are Performance Data counters.

User Action

A consistently high value indicates a number of CPU-bound tasks. This information should be correlated with other metrics such as Page Transfers Rate. Tuning the system, accompanied with additional memory, should help.

2.16.4 CPU Utilization (%)

For UNIX-based platforms, this metric represents the amount of CPU utilization as a percentage of total CPU processing power available.

For Windows, this metric represents the percentage of time the CPU spends executing a non-idle thread. CPU Utilization (%) is the primary indicator of processor activity.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-23 Metric Summary Table

Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text
All Versions | Every 5 Minutes | After Every Sample | > | 80 | 95 | 6 | CPU Utilization is %value%%%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


2.16.5 Memory Page Scan Rate (per second)

For UNIX-based systems, this metric represents the number of pages per second scanned by the page stealing daemon.

For Windows, this metric represents the rate at which pages are read from or written to disk to resolve hard page faults. The metric is a primary indicator of the kinds of faults that cause system-wide delays.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-24 Metric Summary Table

Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text
All Versions | Every 5 Minutes | After Every Sample | > | Not Defined | Not Defined | 6 | Page scan rate is %value% /sec, crossed warning (%warning_threshold% /sec) or critical (%critical_threshold% /sec) threshold.


User Action

If this number is zero or close to zero, then you can be sure the system has sufficient memory. If scan rate is always high, then adding memory will definitely help.

2.16.6 Memory Utilization (%)

Represents the amount of memory in use as a percentage of total memory.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-25 Metric Summary Table

Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text
All Versions | Every 5 Minutes | After Every Sample | > | 99 | Not Defined | 6 | Memory Utilization is %value%%%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


Data Source

For the Windows host, the data source is the Windows API.

2.16.7 Page Transfers Rate

Indicates the rate at which pages are read from or written to disk to resolve hard page faults. It is a primary indicator of the kinds of faults that cause systemwide delays. It is counted in numbers of pages. It includes pages retrieved to satisfy faults in the file system cache (usually requested by applications) and non-cached mapped memory files.

This metric is available only on Windows.

Data Source

The data sources for this metric are Windows Performance counters.

User Action

High transfer rates indicate memory contention. Adding memory would help.

2.16.8 Run Queue Length (1 minute average)

Represents the average number of processes in memory and subject to be run in the last interval. This metric checks the run queue.

This metric is not available on Windows.

User Action

Check the load on the system using the UNIX uptime or top commands. Also, check for processes using too much CPU time by using the top and ps -ef commands. Note that the issue may be a large number of instances of one or more processes, rather than a few processes each taking up a large amount of CPU time. Kill processes using excessive CPU time.

2.16.9 Run Queue Length (15 minute average)

Represents the average number of processes in memory and subject to be run in the last interval. This metric checks the run queue.

This metric is not available on Windows.

User Action

Check the load on the system using the UNIX uptime or top commands. Also, check for processes using too much CPU time by using the top and ps -ef commands. Note that the issue may be a large number of instances of one or more processes, rather than a few processes each taking up a large amount of CPU time. Kill processes using excessive CPU time.

2.16.10 Run Queue Length (5 minute average)

Represents the average number of processes in memory and subject to be run in the last interval. This metric checks the run queue.

This metric is not available on Windows.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-26 Metric Summary Table

Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text
All Versions | Every 5 Minutes | After Every Sample | > | 10 | 20 | 6 | CPU Load is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


User Action

Check the load on the system using the UNIX uptime or top commands. Also, check for processes using too much CPU time by using the top and ps -ef commands. Note that the issue may be a large number of instances of one or more processes, rather than a few processes each taking up a large amount of CPU time. Kill processes using excessive CPU time.

2.16.11 Swap Utilization (%)

For UNIX-based platforms, this metric represents the percentage of swapped memory in use for the last interval.

For Windows, this metric represents the percentage of page file instance used.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-27 Metric Summary Table

Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text
All Versions | Every 5 Minutes | After Every Sample | > | 80 | 95 | 6 | Swap Utilization is %value%%%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


Data Source

The data sources for the Windows host are Windows API and performance data counters.

User Action

For UNIX-based platforms, check the swap usage using the UNIX top command or the Solaris swap -l command. Additional swap can be added to an existing file system by creating a swap file and then adding the file to the system swap pool. (See documentation for your UNIX OS). If swap is mounted on /tmp, space can be freed by removing any junk files in /tmp. If it is not possible to add file system swap or free up enough space, additional swap will have to be added by adding a raw disk partition to the swap pool. See UNIX documentation for procedures.

For Windows, check the page file usage and add an additional page file if current limits are insufficient.

2.17 Log File Monitoring

The Log File Monitoring metric allows the operator to monitor one or more log files for the occurrence of one or more perl patterns in the content. In addition, the operator can specify a perl pattern to be ignored for the log file. Periodic scanning is performed against new content added since the last scan. Lines matching the ignore pattern are discarded first; then lines matching the specified match patterns result in one record being uploaded to the repository for each pattern. The user can set a threshold against the number of lines matching the given pattern. File rotation will be handled within the given file.
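
The scanning order described above (read only new content, drop lines that match the ignore pattern, then count lines per match pattern) can be sketched as follows. This is an illustrative Python sketch, not the Oracle-supplied perl program: the function names are hypothetical, Python regular expressions stand in for perl patterns, and file rotation is not handled.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple


def read_new_lines(path: str, last_offset: int) -> Tuple[List[str], int]:
    """Return the lines added after last_offset and the new end-of-file offset."""
    with open(path, "rb") as handle:
        handle.seek(last_offset)
        data = handle.read()
        new_offset = handle.tell()
    return data.decode("utf-8", errors="replace").splitlines(), new_offset


def count_matches(new_lines: Iterable[str], match_patterns: List[str],
                  ignore_pattern: Optional[str] = None) -> Dict[str, int]:
    """Count, per match pattern, the lines that match after ignored lines are dropped."""
    ignore_re = re.compile(ignore_pattern) if ignore_pattern else None
    compiled = {pattern: re.compile(pattern) for pattern in match_patterns}
    counts = {pattern: 0 for pattern in match_patterns}
    for line in new_lines:
        if ignore_re and ignore_re.search(line):
            continue  # the ignore pattern is applied first
        for pattern, regex in compiled.items():
            if regex.search(line):
                counts[pattern] += 1
    return counts
```

A caller would keep the offset returned by read_new_lines between scans so that each scan sees only the content appended since the previous one.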

2.17.1 Log File Pattern Matched Content

Returns the actual content if the given file has been specifically registered for content uploading, else it will return the count of lines that matched the pattern specified.

The operator can list the names of files or directories that must never be monitored in the <EMDROOT>/sysman/config/lfm_efiles file. The operator can list the names of the files or directories whose contents can be uploaded into Oracle Management Repository in the <EMDROOT>/sysman/config/lfm_ifiles file.

Metric Summary

The following table shows how often the metric's value is collected.

Target Version Collection Frequency
All Versions Every 15 Minutes

Data Source

An Oracle-provided perl program that scans files for the occurrence of user-specified perl patterns.

2.17.2 Log File Pattern Matched Line Count

Returns the number of lines matching the pattern specified in the given file. Setting warning or critical thresholds against this column for a specific {log file name, match pattern in perl, ignore pattern in perl} triggers the monitoring of specified criteria against the given log file.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-28 Metric Summary Table

Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text
All Versions | Every 15 Minutes | After Every Sample | > | 0 | Not Defined | 1* | %log_file_message% Crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


* Once an alert is triggered for this metric, it must be manually cleared.

Multiple Thresholds

For this metric you can set different warning and critical threshold values for each unique combination of "Log File Name", "Match Pattern in Perl", "Ignore Pattern in Perl", and "Time Stamp" objects.

If warning or critical threshold values are currently set for any unique combination of "Log File Name", "Match Pattern in Perl", "Ignore Pattern in Perl", and "Time Stamp" objects, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each unique combination of "Log File Name", "Match Pattern in Perl", "Ignore Pattern in Perl", and "Time Stamp" objects, use the Edit Thresholds page.

Data Source

An Oracle-supplied perl program that monitors the log files for user-specified criteria.

2.18 Memory Devices

The Memory Devices metric monitors the status of memory devices configured in the system.

This metric is available only on Dell PowerEdge Linux Systems.

The following table lists the metrics, descriptions, and data sources.

Note:

For all target versions, the collection frequency for each metric is every 15 minutes.

Table 2-29 Memory Devices Metrics

Metric Description Data Source (SNMP MIB Object)

Bank Location

Bank location name of the memory device, when applicable

memoryDeviceBankLocationName (1.3.6.1.4.1.674.10892.1.1100.50.1.10)

Location

Location name of the memory device, for example, "DIMM A".

memoryDeviceLocationName (1.3.6.1.4.1.674.10892.1.1100.50.1.8)

Memory

See Section 2.18.1, "Memory Status"

Section 2.18.1, "Memory Status"

Size (MB)

Size, in kilobytes, of the memory device

memoryDeviceSize (1.3.6.1.4.1.674.10892.1.1100.50.1.14)

Type

Type of the memory device

memoryDeviceType (1.3.6.1.4.1.674.10892.1.1100.50.1.7)


2.18.1 Memory Status

Represents the status of the memory device.

This metric is available only on Dell PowerEdge Linux Systems.

The following table lists the possible values for this metric and their meaning.

Metric Value Meaning (per SNMP MIB)
1 Other (not one of the following)
2 Unknown
3 Normal
4 Warning
5 Critical
6 Non-Recoverable

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-30 Metric Summary Table

Target Version Evaluation and Collection Frequency Upload Frequency Operator Default Warning Threshold Default Critical Threshold Consecutive Number of Occurrences Preceding Notification Alert Text

All Versions

Every 15 Minutes

Not Uploaded

>=

4

5

1

Status of Memory at bank location %MemoryBankLocation% and location %MemoryLocation% is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.
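The status values and the default thresholds in Table 2-30 can be illustrated with a minimal sketch. It assumes only what the tables state: the operator is >=, the default warning threshold is 4, and the default critical threshold is 5. The names STATUS_MEANINGS and classify_status are hypothetical and are not part of the product.

```python
# Status codes and meanings copied from the value table in this section.
STATUS_MEANINGS = {
    1: "Other",
    2: "Unknown",
    3: "Normal",
    4: "Warning",
    5: "Critical",
    6: "Non-Recoverable",
}


def classify_status(value, warning=4, critical=5):
    """Return 'critical', 'warning', or 'clear' using the >= operator.

    The default thresholds mirror Table 2-30; this helper is illustrative only.
    """
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "clear"
```

Under these defaults, a status of 3 (Normal) raises no alert, 4 crosses the warning threshold, and 5 or 6 crosses the critical threshold.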


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each unique combination of "Chassis" and "Index" objects.

If warning or critical threshold values are currently set for any unique combination of "Chassis" and "Index" objects, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each unique combination of "Chassis" and "Index" objects, use the Edit Thresholds page.

Data Source

SNMP MIB object: memoryDeviceStatus (1.3.6.1.4.1.674.10892.1.1100.50.1.5)

2.19 Message and Semaphore Activity

The Message and Semaphore Activity metric provides information about the message and semaphore activity of the host system being monitored.

The data sources for this metric include the following:

Host Data Source
Solaris sar command
HP sar command
Linux not available
HP Tru64 ipcs command
IBM AIX sar command
Windows not available

The following table lists the metrics and their descriptions.

Table 2-31 Message and Semaphore Activity

Metric Description

msgrcv() System Calls (per second)

Number of msgrcv system calls made per second. The msgrcv system call reads a message from a message queue into a user-supplied buffer.

semop() System Calls (per second)

Number of semop system calls made per second. The semop system call is used to perform semaphore operations on a set of semaphores.


2.20 Network Interfaces

The Network Interfaces metric includes input errors and interface collisions on the network interface. The following network interfaces are supported: le, hme, qfe, ge, and fddi.

Note:

For all target versions, the collection frequency for each metric is every 5 minutes.

Data Source

The data sources for the metrics in this category include the following:

Host Data Source
Solaris kernel memory structures (kstat)
HP netstat, lanscan, and lanadmin commands
Linux netstat command and /proc/net/dev
HP Tru64 netstat command
IBM AIX oracle_kstat() system call
Windows not available

User Action

Use the OS netstat -i command to check the performance of the interface. Also, check the system messages file for messages relating to duplex setting by using the OS grep -i command and searching for the word 'duplex'.
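The search for duplex-related messages can also be scripted. The following minimal sketch mimics a case-insensitive grep over lines that have already been read from the system messages file. The function name and the sample lines are hypothetical, and the location of the messages file varies by operating system.

```python
import re

# Case-insensitive match, equivalent in spirit to: grep -i duplex
DUPLEX_PATTERN = re.compile(r"duplex", re.IGNORECASE)


def find_duplex_messages(lines):
    """Return the lines that mention 'duplex', ignoring case."""
    return [line.rstrip("\n") for line in lines if DUPLEX_PATTERN.search(line)]


# Hypothetical sample log lines, invented for illustration only.
sample = [
    "Jan  1 00:00:01 host1 kernel: eth0: link up, 100Mbps, Half Duplex\n",
    "Jan  1 00:00:02 host1 sshd[42]: session opened\n",
]
matches = find_duplex_messages(sample)
```

To apply the sketch to a real host, pass an open file object for the system messages file in place of the sample list.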

Metrics and Descriptions

The following table lists the metrics and their descriptions.

Table 2-32 Network Interfaces Metrics

Metric Description

Network Interface Input Errors (%)

Number of input errors, per second, encountered on the device for unsuccessful reception due to hardware/network errors. This metric checks the rate of input errors on the network interface specified by the network device names parameter, such as le0 or * (for all network interfaces).

Network Interface Collisions (%)

Number of collisions per second. This metric checks the rate of collisions on the network interface specified by the network device names parameter, such as le0 or * (for all network interfaces).

Network Interface Combined Utilization (%)

See Section 2.20.1, "Network Interface Combined Utilization (%)"

Network Interface Output Errors (%)

Number of output errors per second. This metric checks the rate of output errors on the network interface specified by the network device names parameter, such as le0 or * (for all network interfaces).

Network Interface Read (MB/s)

Number of megabytes per second read from the specified interface

Network Interface Read Utilization (%)

Amount of network bandwidth being used for reading from the network as a percentage of total read capacity

Network Interface Total Error Rate (%)

See Section 2.20.2, "Network Interface Total Error Rate (%)"

Network Interface Total I/O Rate (MB/sec)

See Section 2.20.3, "Network Interface Total I/O Rate (MB/sec)"

Network Interface Write (MB/s)

Number of megabytes per second written to the specified interface

Network Interface Write Utilization (%)

Amount of network bandwidth being used for writing to the network as a percentage of total write capacity


2.20.1 Network Interface Combined Utilization (%)

Represents the percentage of network bandwidth being used by reading and writing from and to the network for full-duplex network connections.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-33 Metric Summary Table

Target Version Evaluation and Collection Frequency Upload Frequency Operator Default Warning Threshold Default Critical Threshold Consecutive Number of Occurrences Preceding Notification Alert Text

All Versions

Every 5 Minutes

After Every Sample

>

Not Defined

Not Defined

6

Network utilization for %keyvalue% is %value%%%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.
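The 'Consecutive Number of Occurrences Preceding Notification' value of 6 can be pictured with the following minimal sketch. It assumes that a sample that does not cross the threshold resets the count, which this section does not state explicitly. Because the default thresholds are not defined, the threshold of 80 used in the usage note is a hypothetical user-supplied value, and the class name is illustrative.

```python
class ConsecutiveOccurrenceTracker:
    """Track how many samples in a row have exceeded a threshold (operator >)."""

    def __init__(self, threshold, occurrences):
        self.threshold = threshold      # user-defined; no default exists
        self.occurrences = occurrences  # for example, 6 in Table 2-33
        self.streak = 0

    def observe(self, value):
        """Record one sample; return True once the streak is long enough."""
        if value > self.threshold:
            self.streak += 1
        else:
            # Assumption: a sample within the threshold resets the streak.
            self.streak = 0
        return self.streak >= self.occurrences
```

With ConsecutiveOccurrenceTracker(80, 6) and a collection every 5 minutes, the comparison would have to hold for six successive samples, roughly 30 minutes, before an alert condition is reported.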


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each "Network Interface Name" object.

If warning or critical threshold values are currently set for any "Network Interface Name" object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each "Network Interface Name" object, use the Edit Thresholds page.

2.20.2 Network Interface Total Error Rate (%)

Represents the total number of errors per second encountered on the network interface. It is the combined rate of read and write errors on the interface.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-34 Metric Summary Table

Target Version Evaluation and Collection Frequency Upload Frequency Operator Default Warning Threshold Default Critical Threshold Consecutive Number of Occurrences Preceding Notification Alert Text

All Versions

Every 5 Minutes

After Every Sample

>

Not Defined

Not Defined

6

Network Error Rate for %keyvalue% is %value%%%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each "Network Interface Name" object.

If warning or critical threshold values are currently set for any "Network Interface Name" object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each "Network Interface Name" object, use the Edit Thresholds page.

Data Source

It is computed as the sum of Network Interface Input Errors (%) and Network Interface Output Errors (%).

2.20.3 Network Interface Total I/O Rate (MB/sec)

Represents the total I/O rate on the network interface. It is measured as the sum of Network Interface Read (MB/s) and Network Interface Write (MB/s).

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-35 Metric Summary Table

Target Version Evaluation and Collection Frequency Upload Frequency Operator Default Warning Threshold Default Critical Threshold Consecutive Number of Occurrences Preceding Notification Alert Text

All Versions

Every 5 Minutes

After Every Sample

>

Not Defined

Not Defined

6

Network I/O Rate for %keyvalue% is %value%MB/Sec, crossed warning (%warning_threshold%MB/Sec) or critical (%critical_threshold%MB/Sec) threshold.


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each "Network Interface Name" object.

If warning or critical threshold values are currently set for any "Network Interface Name" object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each "Network Interface Name" object, use the Edit Thresholds page.

Data Source

It is computed as the sum of Network Interface Read (MB/s) and Network Interface Write (MB/s).
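The sum described above can be sketched for a Linux host, where /proc/net/dev is one of the listed data sources. The sketch takes two samples of the file contents and derives read, write, and total rates. It assumes that 1 MB is 1,048,576 bytes and that the counters do not wrap between samples; neither point is stated in this section. The function names are hypothetical.

```python
def parse_proc_net_dev(text):
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, fields = line.split(":", 1)
        values = fields.split()
        # Field 0 is received bytes; field 8 is transmitted bytes.
        counters[name.strip()] = (int(values[0]), int(values[8]))
    return counters


def io_rates_mb_per_sec(before, after, interval_seconds):
    """Return interface -> (read, write, total) in MB/s between two samples."""
    mb = 1024 * 1024  # assumption: binary megabytes
    rates = {}
    for name, (rx1, tx1) in after.items():
        if name not in before:
            continue  # interface appeared between samples; skip it
        rx0, tx0 = before[name]
        read = (rx1 - rx0) / mb / interval_seconds
        write = (tx1 - tx0) / mb / interval_seconds
        # Total I/O rate is the sum of the read and write rates.
        rates[name] = (read, write, read + write)
    return rates
```

The third element of each tuple corresponds to the sum of the read and write rates for that interface.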

2.21 Paging Activity

The Paging Activity metric provides the amount of paging activity on the system.

Note:

For all target versions, the collection frequency for each metric is every 15 minutes.

Data Source

The data sources for this metric category include the following:

Host Data Source
Solaris kernel statistics (class misc cpu_stat)
HP pstat_getvminfo() system call
Linux sar command
HP Tru64 table() system call and vmstat command
IBM AIX oracle_kstat() system call
Windows performance data counters

Metrics and Descriptions

The following table lists the metrics and their descriptions:

Table 2-36 Paging Activity Metrics

Metric Description

Address Translation Page Faults (per second)

Minor page faults by way of hat_fault() per second. This metric checks the number of faults for the CPU(s) specified by the Host CPU(s) parameter, such as cpu_stat0 or * (for all CPUs on the system). Note: This metric is not available on Linux and Windows.

Cache Faults

Rate at which faults occur when a page sought in the file system cache is not found and must be retrieved from elsewhere in memory (a soft fault) or from disk (a hard fault). The file system cache is an area of physical memory that stores recently used pages of data for applications. Cache activity is a reliable indicator of most application I/O operations. This metric shows the number of faults, without regard for the number of pages faulted in each operation. Note: This metric is available only on Windows.

Copy-on-write Faults (per second)

Rate at which page faults are caused by attempts to write that have been satisfied by copying the page from elsewhere in physical memory. This is an economical way of sharing data because pages are copied only when they are written to; otherwise, the page is shared. This metric shows the number of copies, without regard for the number of pages copied in each operation. Note: This metric is available only on Windows.

Demand Zero Faults (per second)

Rate at which a zeroed page is required to satisfy the fault. Zeroed pages, which are pages emptied of previously stored data and filled with zeros, are a security feature of Windows that prevents processes from seeing data stored by earlier processes that used the memory space. Windows maintains a list of zeroed pages to accelerate this process. This metric shows the number of faults, without regard to the number of pages retrieved to satisfy the fault. Note: This metric is available only on Windows.

igets with Page Flushes (%)

Represents the percentage of UFS inodes taken off the freelist by iget which had reusable pages associated with them. These pages are flushed and cannot be reclaimed by processes. Note: This metric is available on Solaris, HP, and IBM AIX.

Page Faults (per second)

Average number of pages faulted per second. Because only one page is faulted in each fault operation, this value also equals the number of page fault operations per second. This metric includes both hard faults (those that require disk access) and soft faults (where the faulted page is found elsewhere in physical memory). Most processors can handle large numbers of soft faults without significant consequence. However, hard faults, which require disk access, can cause significant delays. Note: This metric is available only on Windows.

Page Faults from Software Lock Requests

Represents the number of protection faults per second. These faults occur when a program attempts to access memory it should not access, receives a segmentation violation signal, and dumps a core file. This metric checks the number of faults for the CPU(s) specified by the Host CPU(s) parameter, such as cpu_stat0 or * (for all CPUs on the system). Note: This metric is not available on Linux or Windows.

Page-in Requests (per second)

For UNIX-based systems, represents the number of page read ins per second (read from disk to resolve fault memory references) by the virtual memory manager. Along with Page Outs, this statistic represents the amount of real I/O initiated by the virtual memory manager. This metric checks the number of page read ins for the CPU(s) specified by the Host CPU(s) parameter, such as cpu_stat0 or * (for all CPUs on the system).

For Windows, this metric is the rate at which the disk was read to resolve hard page faults. It shows the number of read operations, without regard to the number of pages retrieved in each operation. Hard page faults occur when a process references a page in virtual memory that is not in its working set or elsewhere in physical memory, and must be retrieved from disk. This metric is a primary indicator of the kinds of faults that cause systemwide delays. It includes read operations to satisfy faults in the file system cache (usually requested by applications) and in non-cached mapped memory files.

Note: This metric is not available on Linux.

Page-out Requests (per second)

For UNIX-based systems, represents the number of page write outs to disk per second. This metric checks the number of page write outs for the CPU(s) specified by the Host CPU(s) parameter, such as cpu_stat0 or * (for all CPUs on the system).

For Windows, this metric is the rate at which pages are written to disk to free up space in physical memory. Pages are written to disk only if they are changed while in physical memory, so they are likely to hold data, not code. This metric shows write operations, without regard to the number of pages written in each operation.

Note: This metric is not available on Linux.

Pages Paged-in (per second)

For UNIX-based systems, represents the number of pages paged in (read from disk to resolve fault memory references) per second. This metric checks the number of pages paged in for the CPU(s) specified by the Host CPU(s) parameter, such as cpu_stat0 or * (for all CPUs on the system).

For Windows, this metric is the rate at which pages are read from disk to resolve hard page faults. Hard page faults occur when a process refers to a page in virtual memory that is not in its working set or elsewhere in physical memory, and must be retrieved from disk. When a page is faulted, the system tries to read multiple contiguous pages into memory to maximize the benefit of the read operation.

Pages Paged-out (per second)

For UNIX-based systems, represents the number of pages written out (per second) by the virtual memory manager. Along with Page Outs, this statistic represents the amount of real I/O initiated by the virtual memory manager. This metric checks the number of pages paged out for the CPU(s) specified by the Host CPU(s) parameter, such as cpu_stat0 or * (for all CPUs on the system).

For Windows, this metric is the rate at which pages are written to disk to free up space in physical memory. Pages are written back to disk only if they are changed in physical memory, so they are likely to hold data, not code. A high rate of pages output might indicate a memory shortage. Windows writes more pages back to disk to free up space when physical memory is in short supply.

Pages Put on Freelist by Page Stealing Daemon (per second)

Number of pages per second that the pageout daemon (also called the page stealing daemon) determines to be unused and puts on the list of free pages. Note: This metric is not available on Linux or Windows.

Pages Scanned by Page Stealing Daemon (per second)

Represents the scan rate, that is, the number of pages per second scanned by the page stealing daemon.

If this number is zero or close to zero, the system most likely has sufficient memory. If the number is consistently high, adding memory is likely to help. Note: This metric is not available on Linux or Windows.

Transition Faults (per second)

Rate at which page faults are resolved by recovering pages that were being used by another process sharing the page, or were on the modified page list or the standby list, or were being written to disk at the time of the page fault. The pages were recovered without additional disk activity. Transition faults are counted in numbers of faults; because only one page is faulted in each operation, it is also equal to the number of pages faulted. Note: This metric is available only on Windows.


2.22 PCI Devices

The Peripheral Component Interconnect (PCI) Devices metric monitors the status of PCI devices.

This metric is available only on Dell PowerEdge Linux systems.

Note:

For all target versions, the collection frequency for each metric is every 15 minutes.

The following table lists the metrics, descriptions, and data sources.

Table 2-37 PCI Devices Metrics

Metric Description Data Source (SNMP MIB Object)

Description

Descriptive name of the Dell Peripheral Component Interconnect (PCI) Device

pCIDeviceDescriptionName (1.3.6.1.4.1.674.10892.1.1100.80.1.9)

Manufacturer

Name of the Dell Peripheral Component Interconnect (PCI) Device manufacturer

pCIDeviceManufacturerName (1.3.6.1.4.1.674.10892.1.1100.80.1.8)

PCI Device Status

See Section 2.22.1, "PCI Device Status"

See Section 2.22.1, "PCI Device Status"


2.22.1 PCI Device Status

Represents the status of the Dell Peripheral Component Interconnect (PCI) Device.

This metric is available only on Dell PowerEdge Linux systems.

The following table lists the possible values for this metric and their meaning.

Metric Value Meaning (per SNMP MIB)
1 Other (not one of the following)
2 Unknown
3 Normal
4 Warning
5 Critical
6 Non-Recoverable

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-38 Metric Summary Table

Target Version Evaluation and Collection Frequency Upload Frequency Operator Default Warning Threshold Default Critical Threshold Consecutive Number of Occurrences Preceding Notification Alert Text

All Versions

Every 15 Minutes

Not Uploaded

>=

4

5

1

Status of PCIDevice %PCIDeviceIndex% in chassis %ChassisIndex% is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each unique combination of "Chassis Index", "PCI Device Index", and "System Slot Index" objects.

If warning or critical threshold values are currently set for any unique combination of "Chassis Index", "PCI Device Index", and "System Slot Index" objects, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each unique combination of "Chassis Index", "PCI Device Index", and "System Slot Index" objects, use the Edit Thresholds page.

Data Source

SNMP MIB object: pCIDeviceStatus (1.3.6.1.4.1.674.10892.1.1100.80.1.5)

2.23 Power Supplies

The Power Supplies metric monitors the status of various power supplies present in the host system.

This metric is available only on Dell PowerEdge Linux systems.

Note:

For all target versions, the collection frequency for each metric is every 15 minutes.

The following table lists the metrics, descriptions, and data sources.

Table 2-39 Power Supplies Metrics

Metric Description Data Source (SNMP MIB Object)

Location

Location name of the power supply

powerSupplyLocationName (1.3.6.1.4.1.674.10892.1.600.12.1.8)

Output (Tenths of Watts)

Maximum sustained output wattage of the power supply, in tenths of watts

powerSupplyOutputWatts (1.3.6.1.4.1.674.10892.1.600.12.1.6)

Power Supply Status

See Section 2.23.1, "Power Supply Status"

See Section 2.23.1, "Power Supply Status"


2.23.1 Power Supply Status

Represents the status of the power supply.

This metric is available only on Dell PowerEdge Linux systems.

The following table lists the possible values for this metric and their meaning.

Metric Value Meaning (per SNMP MIB)
1 Other (not one of the following)
2 Unknown
3 Normal
4 Warning
5 Critical
6 Non-Recoverable

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-40 Metric Summary Table

Target Version Evaluation and Collection Frequency Upload Frequency Operator Default Warning Threshold Default Critical Threshold Consecutive Number of Occurrences Preceding Notification Alert Text

All Versions

Every 15 Minutes

Not Uploaded

>=

4

5

1

Status of Power Supply %PSIndex% in chassis %ChassisIndex% is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.
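The alert text contains placeholders delimited by percent signs, such as %PSIndex% and %value%. The following minimal sketch shows one way such a template could be expanded. It is an illustration only and does not describe how the product performs the substitution. The function name is hypothetical, and treating a doubled %% as a literal percent sign is an assumption based on alert texts elsewhere in this chapter that contain %value%%%.

```python
import re

# Matches either a %name% placeholder or a doubled %% (assumed literal percent).
_TOKEN = re.compile(r"%(\w+)%|%%")


def expand_alert_text(template, values):
    """Replace %name% placeholders with entries from the values mapping."""
    def substitute(match):
        name = match.group(1)
        if name is None:
            return "%"  # the token was a doubled percent sign
        # Leave unknown placeholders untouched rather than guessing a value.
        return str(values.get(name, match.group(0)))

    return _TOKEN.sub(substitute, template)
```

For example, supplying PSIndex, ChassisIndex, value, warning_threshold, and critical_threshold entries for the template in Table 2-40 yields a complete alert sentence.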


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each unique combination of "Chassis Index" and "Power Supply Index" objects.

If warning or critical threshold values are currently set for any unique combination of "Chassis Index" and "Power Supply Index" objects, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each unique combination of "Chassis Index" and "Power Supply Index" objects, use the Edit Thresholds page.

Data Source

SNMP MIB object: powerSupplyStatus (1.3.6.1.4.1.674.10892.1.600.12.1.5)

2.24 Process, Inode, File Tables Statistics

The Process, Inode, File Tables Statistics metric provides information about the status of the process, inode, and file tables.

Data Source

The data sources for this metric category include the following:

Host Data Source
Solaris sar command
HP sar command
Linux sar command, for example, sar -v
HP Tru64 table() system call
IBM AIX sar command
Windows not available

The OS sar command is used to sample cumulative activity counters maintained by the OS. The data is obtained by sampling system counters once in a five-second interval.

Metrics and Descriptions

The following table lists the metrics and their descriptions.

Table 2-41 Process, Inode, File Tables Statistics Metrics

Metric Description

File Table Overflow Occurrences

Number of times the system file table overflowed, that is, the number of times that the OS could not find any available entries in the table in the sampling period chosen to collect the data. Note: This metric is not available on Linux or Windows.

Inode Table Overflow Occurrences

Number of times the inode table overflowed, that is, the number of times the OS could not find any available inode table entries. Note: This metric is not available on Linux or Windows.

Maximum Size of Inode Table

Maximum size of the inode table. Note: This metric is not available on Linux or Windows.

Maximum Size of Process Table

Maximum size of the process table. Note: This metric is not available on Linux or Windows.

Maximum Size of System File Table

Maximum size of the system file table. Note: This metric is not available on Linux or Windows.

Number of Allocated Disk Quota Entries

Number of allocated disk quota entries. Note: This metric is available only on Linux.

Number of Queued RT Signals

Number of queued RT signals. Note: This metric is available only on Linux.

Number of Super Block Handlers Allocated

Number of allocated super block handlers. Note: This metric is available only on Linux.

Number of Used File Handles

Current size of the system file table.

Percentage of Allocated Disk Quota Entries

Percentage of allocated disk quota entries relative to the maximum number of cached disk quota entries that can be allocated. Note: This metric is available only on Linux.

Percentage of Allocated Super Block Handlers

Percentage of allocated super block handlers relative to the maximum number of super block handlers that Linux can allocate. Note: This metric is available only on Linux.

Percentage of Queued RT Signals

Percentage of queued RT signals. Note: This metric is available only on Linux.

Percentage of Used File Handles

Percentage of used file handles against the maximum number of file handles that the Linux kernel can allocate. Note: This metric is available only on Linux.

Process Table Overflow Occurrences

Number of times the process table overflowed, that is, the number of times the OS could not find any available process table entries in a five-second interval. Note: This metric is not available on Linux or Windows.

Size of Inode Table

Current size of the inode table.

Size of Process Table

Current size of the process table. Note: This metric is not available on Linux or Windows.
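The Percentage of Used File Handles metric in the preceding table relates used handles to the kernel maximum. As a rough illustration, the following sketch derives a comparable percentage from the contents of /proc/sys/fs/file-nr on Linux. This is an assumption made for the example: the documented Linux data source for this category is the sar command, not this file. The sketch relies on the three fields of file-nr being the allocated handles, the allocated but unused handles, and the maximum. The function name is hypothetical.

```python
def used_file_handle_percentage(file_nr_text):
    """Compute used handles as a percentage of the maximum from file-nr text.

    Expects the three whitespace-separated fields of /proc/sys/fs/file-nr:
    allocated handles, allocated-but-unused handles, and the maximum.
    """
    allocated, unused, maximum = (int(v) for v in file_nr_text.split()[:3])
    used = allocated - unused
    return 100.0 * used / maximum
```

The result can differ from the value that the agent reports, because the agent samples through sar rather than reading this file directly.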


2.25 Processors

The Processors metric monitors the state of each CPU in the host.

This metric is available only on Dell PowerEdge Linux systems.

Note:

For all target versions, the collection frequency for each metric is every 15 minutes.

The following table lists the metrics, descriptions, and data sources.

Table 2-42 Processors Metrics

Metric Description Data Source (SNMP MIB Object)

Family

Family of the Dell processor device

processorDeviceFamily (1.3.6.1.4.1.674.10892.1.1100.30.1.10)

Manufacturer

Name of the manufacturer of the Dell processor

processorDeviceManufacturerName (1.3.6.1.4.1.674.10892.1.1100.30.1.8)

Processor Status

See Section 2.25.1, "Processor Status"

See Section 2.25.1, "Processor Status"

Speed (MHz)

Current speed of the Dell processor device in megahertz (MHz). A value of zero indicates that the speed is unknown.

processorDeviceCurrentSpeed (1.3.6.1.4.1.674.10892.1.1100.30.1.12)

Version

Version of the Dell processor

processorDeviceVersionName (1.3.6.1.4.1.674.10892.1.1100.30.1.16)


2.25.1 Processor Status

Represents the status of the Dell processor device.

This metric is available only on Dell PowerEdge Linux systems.

The following table lists the possible values for this metric and their meaning.

Metric Value Meaning (per SNMP MIB)
1 Other (not one of the following)
2 Unknown
3 Normal
4 Warning
5 Critical
6 Non-Recoverable

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-43 Metric Summary Table

Target Version Evaluation and Collection Frequency Upload Frequency Operator Default Warning Threshold Default Critical Threshold Consecutive Number of Occurrences Preceding Notification Alert Text

All Versions

Every 15 Minutes

Not Uploaded

>=

4

5

1

Status of Processor %ProcessorIndex% in chassis %ChassisIndex% is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each unique combination of "Chassis Index" and "Processor Index" objects.

If warning or critical threshold values are currently set for any unique combination of "Chassis Index" and "Processor Index" objects, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each unique combination of "Chassis Index" and "Processor Index" objects, use the Edit Thresholds page.

Data Source

SNMP MIB object: processorDeviceStatus (1.3.6.1.4.1.674.10892.1.1100.30.1.5)

2.26 Program Resource Utilization

The Program Resource Utilization metric provides flexible resource monitoring functionality. The operator must specify the criteria for the programs to be monitored by specifying key value specific thresholds. Values for the key value columns {program name, owner} define the unique criteria to be monitored for resource utilization in the system.

By default, no programs will be tracked by this metric. Key Values entered as part of a key value specific threshold setting define the criteria for monitoring and tracking.

Note:

For all target versions, the collection frequency for each metric is every 5 minutes.

The data sources for this metric category include the following:

Host Data Source
Solaris ps command
HP ps command
Linux ps command
HP Tru64 ps command
IBM AIX ps command
Windows performance data counters

The following table lists the metrics and their descriptions.

Table 2-44 Program Resource Utilization Metrics

Metric Description

List of PIDs

This metric is only available on Solaris.

Program's Max CPU Time Accumulated (Minutes)

See Section 2.26.1, "Program's Max CPU Time Accumulated (Minutes)"

Program's Max CPU Time Accumulated PID

Identifier of the process that has accumulated the most CPU time matching the {program name, owner} key value criteria

Program's Max CPU Utilization (%)

See Section 2.26.2, "Program's Max CPU Utilization (%)"

Program's Max CPU Utilization PID

Identifier of the process with the maximum percentage of CPU utilized matching the {program name, owner} key value criteria since last scan

Program's Max Process Count

See Section 2.26.3, "Program's Max Process Count"

Program's Max Resident Memory (MB)

See Section 2.26.4, "Program's Max Resident Memory (MB)"

Program's Max Resident Memory PID

Identifier of the process with the maximum resident memory occupied by a single process matching the {program name, owner} key value criteria

Program's Min Process Count

See Section 2.26.5, "Program's Min Process Count"

Program's Total CPU Time Accumulated (Minutes)

See Section 2.26.6, "Program's Total CPU Time Accumulated (Minutes)"

Program's Total CPU Utilization (%)

See Section 2.26.7, "Program's Total CPU Utilization (%)"
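The per-key metrics in Table 2-44 can be pictured with the following minimal sketch, which summarizes a list of process records for one {program name, owner} key value. The Process record type, its field names, and the summarize_program function are hypothetical. The sketch assumes that the records were already gathered from a source such as the ps command, and it covers only the maximum and count metrics described in the table.

```python
from collections import namedtuple

# Hypothetical record type; the field names are illustrative only.
Process = namedtuple("Process", "pid program owner cpu_pct cpu_minutes rss_mb")


def summarize_program(processes, program, owner):
    """Summarize the processes that match one {program name, owner} key value."""
    matching = [p for p in processes if p.program == program and p.owner == owner]
    if not matching:
        return {"process_count": 0}
    by_time = max(matching, key=lambda p: p.cpu_minutes)
    by_util = max(matching, key=lambda p: p.cpu_pct)
    by_mem = max(matching, key=lambda p: p.rss_mb)
    return {
        "process_count": len(matching),
        "max_cpu_time_minutes": by_time.cpu_minutes,
        "max_cpu_time_pid": by_time.pid,
        "max_cpu_util_pct": by_util.cpu_pct,
        "max_cpu_util_pid": by_util.pid,
        "max_resident_mb": by_mem.rss_mb,
        "max_resident_pid": by_mem.pid,
    }
```

Each maximum is reported together with the identifier of the process that produced it, which mirrors the pairing of the Max metrics with their PID metrics in the table.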


2.26.1 Program's Max CPU Time Accumulated (Minutes)

Represents the maximum CPU time accumulated by the most active process matching the {program name, owner} key value criteria.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-45 Metric Summary Table

Target Version Evaluation and Collection Frequency Upload Frequency Operator Default Warning Threshold Default Critical Threshold Consecutive Number of Occurrences Preceding Notification Alert Text

All Versions

Every 5 Minutes

After Every Sample

>

Not Defined

Not Defined

3

%prog_max_cpu_time_pid% process running program %prog_name% has accumulated %prog_max_cpu_time% minutes of cpu time. This duration crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each unique combination of "Program Name" and "Owner" objects.

If warning or critical threshold values are currently set for any unique combination of "Program Name" and "Owner" objects, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each unique combination of "Program Name" and "Owner" objects, use the Edit Thresholds page.

2.26.2 Program's Max CPU Utilization (%)

Represents the maximum percentage of CPU utilized by a single process matching the {program name, owner} key value criteria since last scan.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-46 Metric Summary Table

Target Version Evaluation and Collection Frequency Upload Frequency Operator Default Warning Threshold Default Critical Threshold Consecutive Number of Occurrences Preceding Notification Alert Text

All Versions

Every 5 Minutes

After Every Sample

>

Not Defined

Not Defined

3

Process %prog_max_cpu_util_pid% running program %prog_name% is utilizing %prog_max_cpu_util%%% cpu. This percentage crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each unique combination of "Program Name" and "Owner" objects.

If warning or critical threshold values are currently set for any unique combination of "Program Name" and "Owner" objects, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each unique combination of "Program Name" and "Owner" objects, use the Edit Thresholds page.
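The Alert Text column uses %name% placeholders that are filled in when an alert is generated. The sketch below shows one way such a template could be expanded. It assumes that a doubled percent sign (%%) stands for a literal percent sign, which the source does not state; the function name and sample values are hypothetical.

```python
import re

# Matches either a %name% placeholder or an escaped percent sign (%%).
_PLACEHOLDER = re.compile(r"%(\w+)%|%%")


def render_alert(template, values):
    """Expand %name% placeholders; unknown names are left untouched."""
    def substitute(match):
        name = match.group(1)
        if name is None:          # matched "%%": assumed to mean a literal "%"
            return "%"
        return str(values.get(name, match.group(0)))
    return _PLACEHOLDER.sub(substitute, template)


template = ("Process %prog_max_cpu_util_pid% running program %prog_name% "
            "is utilizing %prog_max_cpu_util%%% cpu.")
message = render_alert(template, {
    "prog_max_cpu_util_pid": 4242,   # hypothetical process ID
    "prog_name": "java",             # hypothetical program name
    "prog_max_cpu_util": 97.5,       # hypothetical utilization
})
```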

2.26.3 Program's Max Process Count

Fetches the current number of processes matching the {program name, owner} key value criteria. Use this metric to set warning or critical thresholds that alert when the number of processes matching a given {program name, owner} key exceeds a maximum.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-47 Metric Summary Table

Target Version Evaluation and Collection Frequency Upload Frequency Operator Default Warning Threshold Default Critical Threshold Consecutive Number of Occurrences Preceding Notification Alert Text

All Versions

Every 5 Minutes

After Every Sample

>

Not Defined

Not Defined

3

%prog_max_process_count% processes are running program %prog_name% owned by [%owner%], crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each unique combination of "Program Name" and "Owner" objects.

If warning or critical threshold values are currently set for any unique combination of "Program Name" and "Owner" objects, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each unique combination of "Program Name" and "Owner" objects, use the Edit Thresholds page.

2.26.4 Program's Max Resident Memory (MB)

Represents the maximum resident memory occupied by a single process matching the {program name, owner} key value criteria. Use this metric to set warning or critical thresholds that alert when this value exceeds a maximum for a given {program name, owner} key.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-48 Metric Summary Table

Target Version Evaluation and Collection Frequency Upload Frequency Operator Default Warning Threshold Default Critical Threshold Consecutive Number of Occurrences Preceding Notification Alert Text

All Versions

Every 5 Minutes

After Every Sample

>

Not Defined

Not Defined

3

%prog_max_rss_pid% process running program %prog_name% is utilizing %prog_max_rss% (MB) of resident memory. This percentage crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each unique combination of "Program Name" and "Owner" objects.

If warning or critical threshold values are currently set for any unique combination of "Program Name" and "Owner" objects, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each unique combination of "Program Name" and "Owner" objects, use the Edit Thresholds page.

2.26.5 Program's Min Process Count

Fetches the current number of processes matching the {program name, owner} key value criteria. Use this metric to set warning or critical thresholds that alert when the number of processes matching a given {program name, owner} key falls below a required minimum.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-49 Metric Summary Table

Target Version Evaluation and Collection Frequency Upload Frequency Operator Default Warning Threshold Default Critical Threshold Consecutive Number of Occurrences Preceding Notification Alert Text

All Versions

Every 5 Minutes

After Every Sample

<

Not Defined

Not Defined

3

%prog_max_process_count% processes are running program %prog_name% owned by [%owner%], fallen below warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each unique combination of "Program Name" and "Owner" objects.

If warning or critical threshold values are currently set for any unique combination of "Program Name" and "Owner" objects, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each unique combination of "Program Name" and "Owner" objects, use the Edit Thresholds page.
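Program's Max Process Count and Program's Min Process Count report the same value, the current number of matching processes, and differ only in the comparison operator (> versus <). A minimal sketch of that idea follows. It assumes exact matching on the program name and owner; the function names and the process list are hypothetical.

```python
def count_matching(processes, program_name, owner):
    """Count processes whose (program name, owner) pair equals the key."""
    return sum(1 for name, user in processes
               if name == program_name and user == owner)


def check_process_count(count, max_threshold=None, min_threshold=None):
    """Return the limits violated by the current process count."""
    violations = []
    if max_threshold is not None and count > max_threshold:   # Max metric uses ">"
        violations.append("above maximum")
    if min_threshold is not None and count < min_threshold:   # Min metric uses "<"
        violations.append("below minimum")
    return violations


# Hypothetical snapshot of (program name, owner) pairs.
processes = [("httpd", "apache"), ("httpd", "apache"),
             ("httpd", "root"), ("sshd", "root")]
count = count_matching(processes, "httpd", "apache")
```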

2.26.6 Program's Total CPU Time Accumulated (Minutes)

Represents the total CPU time accumulated by all active processes matching the {program name, owner} key value criteria.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-50 Metric Summary Table

Target Version Evaluation and Collection Frequency Upload Frequency Operator Default Warning Threshold Default Critical Threshold Consecutive Number of Occurrences Preceding Notification Alert Text

All Versions

Every 5 Minutes

After Every Sample

>

Not Defined

Not Defined

3

%prog_max_count% processes running program %prog_name% owned by [%owner%] have accumulated %prog_total_cpu_time% minutes of cpu time. This duration crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each unique combination of "Program Name" and "Owner" objects.

If warning or critical threshold values are currently set for any unique combination of "Program Name" and "Owner" objects, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each unique combination of "Program Name" and "Owner" objects, use the Edit Thresholds page.

2.26.7 Program's Total CPU Utilization (%)

Represents the percentage of CPU time utilized by all active processes matching the {program name, owner} key value criteria since the last collection.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-51 Metric Summary Table

Target Version Evaluation and Collection Frequency Upload Frequency Operator Default Warning Threshold Default Critical Threshold Consecutive Number of Occurrences Preceding Notification Alert Text

All Versions

Every 5 Minutes

After Every Sample

>

Not Defined

Not Defined

3

%prog_max_count% processes running program %prog_name% owned by [%owner%] are utilizing %prog_total_cpu_util%%% cpu. This percentage crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each unique combination of "Program Name" and "Owner" objects.

If warning or critical threshold values are currently set for any unique combination of "Program Name" and "Owner" objects, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each unique combination of "Program Name" and "Owner" objects, use the Edit Thresholds page.

2.27 Remote Access Card

The Remote Access Card metric monitors the status of the Remote Access Card.

This metric is available only on Dell PowerEdge Linux systems.

Note:

For all target versions, the collection frequency for each metric is every 15 minutes.

The following table lists the metrics, their descriptions, and data sources.

Table 2-52 Remote Access Card Metrics

Metric Description Data Source (SNMP MIB Object)

DHCP Settings

Determines whether the dynamic host configuration protocol (DHCP) was used to obtain the network interface card (NIC) information.

remoteAccessNICCurrentInfoFromDHCP (1.3.6.1.4.1.674.10892.1.1700.10.1.33)

Gateway Address

Represents the IP address for the gateway currently being used by the onboard network interface card (NIC) provided by the remote access (RAC) hardware.

remoteAccessNICCurrentGatewayAddress (1.3.6.1.4.1.674.10892.1.1700.10.1.32)

IP Address

Provides the internet protocol (IP) address currently being used by the onboard network interface card (NIC) provided by the remote access (RAC) hardware

remoteAccessNICCurrentIPAddress (1.3.6.1.4.1.674.10892.1.1700.10.1.30)

LAN Settings

Represents the local area network (LAN) settings of the remote access hardware.

remoteAccessLANSettings (1.3.6.1.4.1.674.10892.1.1700.10.1.15)

Network Mask Address

Represents the subnet mask currently being used by the onboard network interface card (NIC) provided by the remote access (RAC) hardware.

remoteAccessNICCurrentNetmaskAddress (1.3.6.1.4.1.674.10892.1.1700.10.1.31)

Product Name

Represents the name of the product providing the remote access (RAC) functionality

remoteAccessProductInfoName (1.3.6.1.4.1.674.10892.1.1700.10.1.7)

Remote Access Card State

Represents the state of the remote access (RAC) hardware.

remoteAccessStateSettings (1.3.6.1.4.1.674.10892.1.1700.10.1.5)

Remote Access Card Status

See Section 2.27.1, "Remote Access Card Status"

See Section 2.27.1, "Remote Access Card Status"

Version

Represents the version of the product providing the remote access (RAC) functionality.

remoteAccessVersionInfoName (1.3.6.1.4.1.674.10892.1.1700.10.1.9)


2.27.1 Remote Access Card Status

Represents the status of the remote access (RAC) hardware.

This metric is available only on Dell PowerEdge Linux systems.

The following table lists the possible values for this metric and their meaning.

Metric Value Meaning (per SNMP MIB)
1 Other (not one of the following)
2 Unknown
3 Normal
4 Warning
5 Critical
6 Non-Recoverable

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-53 Metric Summary Table

Target Version Evaluation and Collection Frequency Upload Frequency Operator Default Warning Threshold Default Critical Threshold Consecutive Number of Occurrences Preceding Notification Alert Text

All Versions

Every 15 Minutes

Not Uploaded

>=

4

5

1

Status of Remote Access Card is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


Data Source

SNMP MIB object: remoteAccessStatus (1.3.6.1.4.1.674.10892.1.1700.10.1.6)
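The status values and the default thresholds in the Metric Summary (operator >=, warning 4, critical 5) combine as shown in the following sketch. The function and dictionary names are illustrative assumptions; only the values and thresholds come from the tables above.

```python
# Status values and meanings as listed in the table above.
STATUS_MEANING = {
    1: "Other",
    2: "Unknown",
    3: "Normal",
    4: "Warning",
    5: "Critical",
    6: "Non-Recoverable",
}


def severity(status, warning=4, critical=5):
    """Apply the >= operator with the default warning and critical thresholds."""
    if status >= critical:
        return "critical"
    if status >= warning:
        return "warning"
    return "clear"


# Severity for every documented status value.
summary = {value: severity(value) for value in STATUS_MEANING}
```

Note that with these defaults a status of 6 (Non-Recoverable) is reported as critical, while 1 (Other) and 2 (Unknown) do not raise an alert.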

2.28 Response

This metric provides the status of the host, that is, whether it is up or down.

2.28.1 Status

This metric indicates whether the host is reachable. A host can be unreachable for various reasons: for example, the network is down, or the Management Agent on the host is down (possibly because the host itself is shut down).

2.29 Storage Summary Metrics

The Storage Summary metrics collectively represent the summary of storage data on a host target. These metrics are derived from the various metrics collected and uploaded into the Oracle Management Repository by the Management Agent. They are computed every time the Management Agent populates the Management Repository with storage data. This collection is also triggered automatically whenever the user manually refreshes the host storage data from the Storage Details page.

These metrics are available on the Linux and Solaris hosts.

Note:

For target versions 3.0 and higher, the collection frequency for each metric is every 24 hours or when the user manually refreshes storage data from the Storage Details page.

For more details on how these metrics are computed see the "About Storage Computation Formulas" topic in the Enterprise Manager online help. The online help also provides information about ASM, databases, disks, file systems, volumes, and storage details.

The following table lists the metrics and their descriptions.

Table 2-54 Storage Summary Metrics

Metric Description

ASM Storage Allocated (GB)

Total storage allocated to Oracle databases from Automatic Storage Management (ASM) instances on the host

ASM Storage Metric Collection Errors

Number of metric collection errors attributed to the storage related metrics of the Automatic Storage Management (ASM) targets on the host

ASM Storage Overhead (GB)

Storage overhead of Automatic Storage Management (ASM) targets on the host

ASM Storage Unallocated (GB)

Storage available in Automatic Storage Management (ASM) targets on the host for allocating to databases

Databases Storage Free (GB)

Total free storage available in the databases on the host

Databases Storage Metric Collection Errors

Metric collection errors of storage related metrics of databases on the host

Databases Storage Used (GB)

Total storage used in the databases on the host

Disk Storage Allocated (GB)

Storage allocated from the total disk storage available on the host

Disk Storage Unallocated (GB)

Storage that is available for allocation in disks on the host.

Host Storage Metric Collection Errors

Total number of storage related metric collection errors of the host target

Hosts Summarized

The possible values for this metric are:

  • 1 (one) if this host storage was computed successfully (sometimes with partial errors)

  • 0 (zero) if the storage computation did not proceed at all due to some reasons (for example, failure to collect critical storage metric data).

Local File Systems Storage Free (GB)

Total free storage in all distinct local file systems on the host

Local File Systems Storage Used (GB)

Total used space in all distinct local file systems on the host

Number of ASM Instances Summarized

Total number of Automatic Storage Management (ASM) instances, the storage data of which was used in computing storage summary of this host

Number of Databases Summarized

Total number of databases, the storage data of which was used in computing storage summary of this host

Other Mapping Errors

Storage metric mapping issues on the host excluding the unmonitored server mapping errors

Total Number of ASM Instances

Total number of Automatic Storage Management (ASM) instances on the host

Total Number of Databases

Total number of databases on the host

Total Storage Allocated (GB)

Total storage allocated from the host-visible storage available on the host

Total Storage Free (GB)

Free storage available from the total allocated storage on the host

Total Storage Overhead (GB)

Overhead associated with storage on the host

Total Storage Unallocated (GB)

Total unallocated storage on the host

Total Storage Used (GB)

Total storage used in the file systems and databases on the host

Unmonitored NFS Server Mapping Errors

Total number of storage mapping issues that result from unmonitored Network File Systems (NFS) servers

Volumes Storage Allocated (GB)

Total storage allocated from the volumes available on the host

Volumes Storage Overhead (GB)

Storage overhead in the volumes on the host

Volumes Storage Unallocated (GB)

Storage available for allocation in the volumes on the host

Writeable NFS Storage Free (GB)

Total free space available in all distinct writeable NFS mounts on the host

Writeable NFS Storage Used (GB)

Storage used in all writeable NFS mounts on the host


2.30 Swap Area Status

The Swap Area Status metric provides the status of the swap memory on the system.

The data sources for this metric category include the following:

Host Data Source
Solaris swap
HP swapinfo
Linux /proc/swaps
HP Tru64 swapon
IBM AIX lsps
Windows not available

2.30.1 Swap Free

Represents the number of 1K blocks in the swap area that are not allocated.

Metric Summary

The following table shows how often the metric's value is collected.

Target Version Collection Frequency
All Versions Every 24 Hours

User Action

Check swap usage using the UNIX top command or the Solaris swap -l command. You can add swap to an existing file system by creating a swap file and adding it to the system swap pool (see the documentation for your UNIX OS). If swap is mounted on /tmp, you can free space by removing unneeded files from /tmp. If you cannot add file system swap or free enough space, add a raw disk partition to the swap pool. See your UNIX documentation for the procedures.

2.30.2 Swap Size

Represents the size of the swap file.

Metric Summary

The following table shows how often the metric's value is collected.

Target Version Collection Frequency
All Versions Every 24 Hours
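On Linux, where the data source is /proc/swaps, the size and free values for each swap area can be derived as in the following sketch. It parses a sample of the file's text rather than reading the live file, and it assumes that free space is simply Size minus Used; the function name and sample contents are hypothetical.

```python
def parse_proc_swaps(text):
    """Parse /proc/swaps-style text into one record per swap area.

    Columns are: Filename, Type, Size, Used, Priority. Size and Used are
    expressed in 1K blocks.
    """
    areas = []
    for line in text.strip().splitlines()[1:]:    # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue                              # ignore malformed rows
        size_kb, used_kb = int(fields[2]), int(fields[3])
        areas.append({
            "name": fields[0],
            "type": fields[1],
            "size_kb": size_kb,
            "free_kb": size_kb - used_kb,         # assumption: free = size - used
        })
    return areas


# Hypothetical file contents.
SAMPLE = """\
Filename        Type        Size     Used  Priority
/dev/sda2       partition   2097148  1024  -2
/swapfile       file        1048572  0     -3
"""
areas = parse_proc_swaps(SAMPLE)
```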

2.31 Switch/Swap Activity

The Switch/Swap Activity metric reports on system switching and swapping activity.

Data Source

The data sources for this metric category, unless otherwise stated, include the following:

Host Data Source
Solaris sar command
HP sar command
Linux sar command
HP Tru64 not available
IBM AIX sar command
Windows not available

The OS sar command is used to sample cumulative activity counters maintained by the OS. The data is obtained by sampling system counters once in a five-second interval. For example, the swapin rate is essentially the number of processes swapped in over this five-second period divided by five.
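The per-second values in this category follow from the sampling method described above: the change in a cumulative counter over the interval, divided by the interval length. A minimal sketch, with a hypothetical function name and counter values:

```python
def per_second_rate(counter_start, counter_end, interval_seconds=5.0):
    """Convert two readings of a cumulative counter into a per-second rate."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return (counter_end - counter_start) / interval_seconds


# Hypothetical swapin counter readings taken five seconds apart.
swapins_per_second = per_second_rate(1000, 1040)
```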

Metrics and Descriptions

The following table lists the metrics and their descriptions.

Table 2-55 Switch/Swap Activity Metrics

Metric Description

Process Context Switches (per second)

Number of process context switches per second. Note: This metric is available on Solaris, HP, and IBM AIX.

Swapins Transfers (per second)

Number of 512-byte units transferred for swapins per second. Note: This metric is not available on HP Tru64.

Swapout Transfers (per second)

Number of 512-byte units transferred for swapouts per second. Note: This metric is not available on HP Tru64.

System Swapins (per second)

Number of process swapins per second. Note: This metric is not available on HP Tru64.

System Swapouts (per second)

Number of process swapouts per second. Note: This metric is not available on HP Tru64.


2.32 System BIOS

The System BIOS (Basic Input/Output System) metric monitors the BIOS status for Dell PowerEdge Linux systems.

This metric is available only on Dell PowerEdge Linux systems.

Note:

For all target versions, the collection frequency for each metric is every 15 minutes.

The following table lists the metrics, their descriptions, and data sources.

Table 2-56 System BIOS Metrics

Metric Description Data Source (SNMP MIB Object)

Manufacturer

Manufacturer's name of the System BIOS (Basic Input/Output System)

systemBIOSManufacturerName (1.3.6.1.4.1.674.10892.1.300.50.1.11)

Size

Image size of the System BIOS (Basic Input/Output System) in kilobytes. A value of zero indicates that the size is unknown.

systemBIOSSize (1.3.6.1.4.1.674.10892.1.300.50.1.6)

System BIOS Status

See Section 2.32.1, "System BIOS Status"

See Section 2.32.1, "System BIOS Status"

Version

Version name of the System BIOS (Basic Input/Output System)

systemBIOSVersionName (1.3.6.1.4.1.674.10892.1.300.50.1.8)


2.32.1 System BIOS Status

Represents the status of the System BIOS (Basic Input/Output System) in this chassis.

This metric is available only on Dell PowerEdge Linux systems.

The following table lists the possible values for this metric and their meaning.

Metric Value Meaning (per SNMP MIB)
1 Other (not one of the following)
2 Unknown
3 Normal
4 Warning
5 Critical
6 Non-Recoverable

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-57 Metric Summary Table

Target Version Evaluation and Collection Frequency Upload Frequency Operator Default Warning Threshold Default Critical Threshold Consecutive Number of Occurrences Preceding Notification Alert Text

All Versions

Every 15 Minutes

Not Uploaded

>=

4

5

1

Status of BIOS %BiosIndex% in chassis %ChassisIndex% is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each unique combination of "Chassis Index" and "System BIOS Index" objects.

If warning or critical threshold values are currently set for any unique combination of "Chassis Index" and "System BIOS Index" objects, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each unique combination of "Chassis Index" and "System BIOS Index" objects, use the Edit Thresholds page.

Data Source

SNMP MIB object: systemBIOSStatus (1.3.6.1.4.1.674.10892.1.300.50.1.5)

2.33 System Calls

The System Calls metric provides statistics about the system calls made over a five-second interval.

Data Source

The data sources for this metric category, unless otherwise stated, include the following:

Host Data Source
Solaris sar command
HP sar command
Linux not available
HP Tru64 table() system call
IBM AIX sar command
Windows not available

The OS sar command is used to sample cumulative activity counters maintained by the OS. The data is obtained by sampling system counters once in a five-second interval. The results are essentially the number of system calls made over this period divided by the period.

Metrics and Descriptions

The following table lists the metrics and their descriptions.

Table 2-58 System Calls Metrics

Metric Description

Characters Transferred by Read System Calls (per second)

Number of characters transferred by read system calls (block devices only) per second

Characters Transferred by Write System Calls (per second)

Number of characters transferred by write system calls (block devices only) per second

exec() System Calls (per second)

Number of exec() system calls made per second

fork() System Calls (per second)

Number of fork() system calls made per second

read() System Calls (per second)

Number of read() system calls made per second

System Calls (per second)

Number of system calls made per second. This includes system calls of all types.

write() System Calls (per second)

Number of write() system calls made per second


2.34 Temperature

The Temperature metric monitors the temperature reported by the temperature probe.

This metric is available only on Dell PowerEdge Linux systems.

Note:

For all target versions, the collection frequency for each metric is every 15 minutes.

The following table lists the metrics, their descriptions, and data sources.

Table 2-59 Temperature Metrics

Metric Description Data Source (SNMP MIB Object)

Current Temperature

Current reading of the temperature probe. The value represents the temperature in tenths of a degree Centigrade.

temperatureProbeReading (1.3.6.1.4.1.674.10892.1.700.20.1.6)

Location

Description of the location name of the temperature probe. Examples of values are: "CPU Temp" and "System Temp".

temperatureProbeLocationName (1.3.6.1.4.1.674.10892.1.700.20.1.8)

Temperature Probe Status

See Section 2.34.1, "Temperature Probe Status"

See Section 2.34.1, "Temperature Probe Status"
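Because Current Temperature is reported in tenths of a degree Centigrade, the raw reading must be scaled before it is displayed. A minimal sketch, assuming the raw reading is an integer; the function name is illustrative and is not part of the product.

```python
def tenths_to_degrees(reading_in_tenths):
    """Convert a probe reading in tenths of a degree Centigrade to degrees."""
    return reading_in_tenths / 10.0


# Hypothetical raw reading of 235 from the temperature probe.
degrees = tenths_to_degrees(235)
```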


2.34.1 Temperature Probe Status

Represents the status of the temperature probe.

This metric is available only on Dell PowerEdge Linux systems.

The following table lists the possible values for this metric and their meaning.

Metric Value Meaning (per SNMP MIB)
1 Other (not one of the following)
2 Unknown
3 Normal
4 Warning
5 Critical
6 Non-Recoverable

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-60 Metric Summary Table

Target Version Evaluation and Collection Frequency Upload Frequency Operator Default Warning Threshold Default Critical Threshold Consecutive Number of Occurrences Preceding Notification Alert Text

All Versions

Every 15 Minutes

Not Uploaded

>=

4

5

1

Temperature at probe %ProbeIndex% in chassis %ChassisIndex% is %TemperatureReading% (C). Status is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


Multiple Thresholds

For this metric you can set different warning and critical threshold values for each unique combination of "Chassis Index" and "Temperature Probe Index" objects.

If warning or critical threshold values are currently set for any unique combination of "Chassis Index" and "Temperature Probe Index" objects, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each unique combination of "Chassis Index" and "Temperature Probe Index" objects, use the Edit Thresholds page.

Data Source

SNMP MIB object: temperatureProbeStatus (1.3.6.1.4.1.674.10892.1.700.20.1.5)

2.35 Top Processes

The Top Processes metric lists up to 20 processes: the 10 processes consuming the largest percentage of memory and the 10 processes consuming the largest percentage of CPU time. The processes are listed in order of memory consumption.
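The selection described above can be sketched as follows. The sketch assumes that a process appearing in both top-10 groups is listed only once, which is why the result can contain fewer than 20 entries; the function name, field names, and sample data are hypothetical.

```python
def top_processes(processes, n=10):
    """Return the union of the top-n memory and top-n CPU consumers.

    The result is ordered by memory consumption, highest first.
    """
    by_memory = sorted(processes, key=lambda p: p["mem_pct"], reverse=True)[:n]
    by_cpu = sorted(processes, key=lambda p: p["cpu_pct"], reverse=True)[:n]
    selected = {p["pid"]: p for p in by_memory + by_cpu}   # de-duplicate by PID
    return sorted(selected.values(), key=lambda p: p["mem_pct"], reverse=True)


# Hypothetical process snapshot; n=2 keeps the example small.
snapshot = [
    {"pid": 1, "mem_pct": 50.0, "cpu_pct": 1.0},
    {"pid": 2, "mem_pct": 40.0, "cpu_pct": 2.0},
    {"pid": 3, "mem_pct": 5.0, "cpu_pct": 90.0},
    {"pid": 4, "mem_pct": 3.0, "cpu_pct": 80.0},
    {"pid": 5, "mem_pct": 1.0, "cpu_pct": 0.5},
]
listing = top_processes(snapshot, n=2)
```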

The data sources for this metric category include the following:

Host Data Source
Solaris ps command
HP ps command
Linux ps command
HP Tru64 ps command
IBM AIX ps command
Windows performance data counters

The following table lists the metrics and descriptions.

Table 2-61 Top Processes Metrics

Metric Description

Command and Arguments

Command and all its arguments

CPU Time for Top Processes

CPU utilization time in seconds

CPU Utilization for Top Processes (%)

Percentage of CPU time consumed by the process. For UNIX-based platforms, check the load on the system using the UNIX uptime or top commands. Also, check for processes using too much CPU time by using the top and ps -ef commands. Note that the issue may be a large number of instances of one or more processes, rather than a few processes each taking up a large amount of CPU time. Kill processes using excessive CPU time.

Memory Utilization for Top Processes (%)

Percentage of memory consumed by the process

Physical Memory Utilization (KB)

Number of kilobytes of physical memory being used. For Solaris and IBM AIX hosts, the data source is kernel memory structure (struct vminfo).

Process User ID

User name that owns the process, that is, the user ID of the process being reported on. For the Windows host, the data source is the Windows API.

Virtual Memory Utilization (KB)

Total size of the process in virtual memory in kilobytes (KB). For the Windows host, the data source is the Windows API.


2.36 TTY Activity

This metric reports tty device activity.

The data sources for this metric include the following:

Host Data Source
Solaris sar command
HP sar command
Linux not available
HP Tru64 table() system call
IBM AIX sar command
Windows not available

The OS sar command is used to sample cumulative activity counters maintained by the OS. The data is obtained by sampling system counters once in a five-second interval.

The following table lists the metrics and their descriptions.

Table 2-62 TTY Activity Metrics

Metric Description

Incoming Character Interrupts (per second)

Number of received incoming character interrupts per second

Input Characters Processed by canon()

Input characters processed by canon() per second

Modem Interrupt Rate (per second)

Modem interrupt rate

Outgoing Character Interrupts (per second)

Number of transmit outgoing character interrupts per second

TTY Output Characters (per second)

Number of output characters per second

TTY Raw Input (chars/s)

Raw input characters per second


2.37 User Defined Metrics

User-defined metrics (UDMs) allow you to execute your own scripts. The data returned by these scripts can be compared against thresholds and can generate severity alerts similar to the alerts in predefined metrics. UDM is similar to the Oracle9i Management Agent's UDE functionality.

The data source for these metrics is the User Defined Script.

The following table lists the metrics and their descriptions.

Table 2-63 User Defined Metrics

Metric Description

User Defined Numeric Metric

Contains a value if the value type is NUMBER. If the value type is STRING, the value is an empty string ("").

User Defined String Metric

Contains a value if the value type is STRING. If the value type is NUMBER, the value is an empty string ("").
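The relationship between the two metrics in the table can be sketched as a single value that populates one of two columns depending on its type. The function name and the dictionary keys are assumptions for illustration; the source states only that the unused metric is "".

```python
def udm_columns(value, value_type):
    """Place a script result in the numeric or string metric; the other is ""."""
    if value_type == "NUMBER":
        return {"numeric": float(value), "string": ""}
    if value_type == "STRING":
        return {"numeric": "", "string": str(value)}
    raise ValueError("value_type must be NUMBER or STRING")


# Hypothetical script results.
numeric_result = udm_columns("42", "NUMBER")
string_result = udm_columns("OK", "STRING")
```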


2.38 Users

The Users metric provides information about the users currently on the system being monitored.

2.38.1 Number of Logons

Represents the number of times a user with a certain user name is logged on to the host target.

Data Source

For Solaris, HP, Linux, HP Tru64, and IBM AIX, the number of times a user is logged on is obtained from the OS w command.

For Windows, the source of information is Windows API.
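On the UNIX-based hosts, counting logons per user from w command output can be sketched as below. The sketch assumes that header lines have already been removed and that the user name is the first whitespace-separated field of each line; the sample output is hypothetical.

```python
from collections import Counter


def count_logons(w_output):
    """Count sessions per user name from header-less w command output."""
    users = (line.split()[0] for line in w_output.splitlines() if line.strip())
    return Counter(users)


# Hypothetical w output with the header lines removed.
SAMPLE = """\
oracle   pts/0    10.0.0.5    09:12    1:02   0.10s  0.02s -bash
oracle   pts/1    10.0.0.5    09:40    0.00s  0.05s  0.01s w
root     tty1     -           08:01    3:15m  0.02s  0.02s -bash
"""
logons = count_logons(SAMPLE)
```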

2.39 Windows Events Log

This metric collects entries of type Error or Warning from all available Windows NT event log files. A critical or warning alert is raised only for System and Security event log file entries.

Note: Because log files continue to grow, this metric outputs only the log events written to the log file after the last collection time, that is, only those records whose timeGenerated (the time when the event was generated) is later than the last collection time, up to the last record in the log file. If this metric is being collected for the first time, only the events generated on the current date are output.
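The incremental collection rule in the note can be sketched as a filter over log records. This is an illustration under stated assumptions, not the agent's code: records are modeled as dictionaries with hypothetical time_generated and type keys, and a last collection time of None stands for the first collection.

```python
from datetime import datetime


def events_to_report(records, last_collection, now):
    """Select Error and Warning records generated since the last collection.

    On the first collection (last_collection is None), only records
    generated on the current date are selected.
    """
    if last_collection is None:
        start_of_day = now.replace(hour=0, minute=0, second=0, microsecond=0)
        recent = [r for r in records if r["time_generated"] >= start_of_day]
    else:
        recent = [r for r in records if r["time_generated"] > last_collection]
    return [r for r in recent if r["type"] in ("Error", "Warning")]


# Hypothetical log records.
now = datetime(2024, 5, 2, 12, 0)
records = [
    {"time_generated": datetime(2024, 5, 1, 23, 0), "type": "Error"},
    {"time_generated": datetime(2024, 5, 2, 9, 0), "type": "Warning"},
    {"time_generated": datetime(2024, 5, 2, 11, 0), "type": "Information"},
    {"time_generated": datetime(2024, 5, 2, 11, 30), "type": "Error"},
]
first_run = events_to_report(records, None, now)
later_run = events_to_report(records, datetime(2024, 5, 2, 10, 0), now)
```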

This metric is available only on Windows.

Note:

For all target versions, the collection frequency for each metric is every 15 minutes.

The data source for these metrics is WMI Operating System Classes.

The following table lists the metrics and their descriptions.

Table 2-64 Windows Events Log Metrics

Metric Description

Category

Subcategory for this event. This subcategory is source-specific.

Date-Time

Date and time when the Source generated the event.

Description

Event message as it appears in the Windows event log.

Event ID

Identifier of the event

Log Name

Name of the Windows event log file

Record Number

Identifies the event within the Windows event log file

Source

Name of the source (application, service, driver, subsystem) that generated the entry

User

Name of the logged-on user when the event occurred. If the user name cannot be determined, the user name is NULL.

Windows Event Severity

See Section 2.39.1, "Windows Event Severity"


2.39.1 Windows Event Severity

The seriousness of the event. Possible values are: Warning and Error.

This metric is available only on Windows.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-65 Metric Summary Table

Target Version: All Versions

Key: logfile: "system"

Evaluation and Collection Frequency: Every 15 Minutes

Upload Frequency: After Every Sample

Operator: =

Default Warning Threshold: warning

Default Critical Threshold: error

Consecutive Number of Occurrences Preceding Notification: 1*

Alert Text: X1User[%user%]:Category[%categorystring%]:Description[%message%]


* Once an alert is triggered for this metric, it must be manually cleared.

Multiple Thresholds

For this metric you can set different warning and critical threshold values for each unique combination of "Log Name", "Source", and "Event ID" objects.

If warning or critical threshold values are currently set for any unique combination of "Log Name", "Source", and "Event ID" objects, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each unique combination of "Log Name", "Source", and "Event ID" objects, use the Edit Thresholds page.
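The per-combination thresholds can be pictured as a lookup keyed by the "Log Name", "Source", and "Event ID" values. The following Python sketch is illustrative only. It assumes that a combination without its own thresholds uses the default values (warning and error, compared with the = operator) and that the comparison is case-insensitive; neither behavior is confirmed by this document, and all names and sample values are hypothetical.

```python
DEFAULT_THRESHOLDS = {"warning": "warning", "critical": "error"}


def alert_level(event, thresholds_by_key):
    """Return "critical", "warning", or None for one event log entry."""
    key = (event["log_name"], event["source"], event["event_id"])
    # Assumption: fall back to the defaults when no override is set.
    thresholds = thresholds_by_key.get(key, DEFAULT_THRESHOLDS)
    severity = event["severity"].lower()
    # The operator for this metric is "=", so compare for equality.
    if severity == thresholds["critical"]:
        return "critical"
    if severity == thresholds["warning"]:
        return "warning"
    return None


# Hypothetical override: treat Warning events from one source as critical.
OVERRIDES = {("system", "disk", 51): {"warning": None, "critical": "warning"}}

disk_event = {"log_name": "system", "source": "disk", "event_id": 51, "severity": "Warning"}
other_event = {"log_name": "system", "source": "Tcpip", "event_id": 4199, "severity": "Warning"}
print(alert_level(disk_event, OVERRIDES), alert_level(other_event, OVERRIDES))
```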

Data Source

WMI Operating System Classes

2.40 Zombie Processes

The Zombie Processes metric monitors zombie (defunct) processes, that is, processes that have terminated but have not yet been reaped by their parent process, on the supported UNIX variants.

2.40.1 Processes in Zombie State (%)

Represents the percentage of all processes running on the system that are currently in zombie state.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 2-66 Metric Summary Table

Target Version: All Versions

Evaluation and Collection Frequency: Every 15 Minutes

Upload Frequency: After Every 60 Samples

Operator: >

Default Warning Threshold: 35

Default Critical Threshold: 50

Consecutive Number of Occurrences Preceding Notification: 1

Alert Text: %value%%% of all processes are in zombie state, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.
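The default thresholds and the alert text for this metric can be illustrated with the following Python sketch. It assumes that %name% is a placeholder and that %% stands for a literal percent sign, which is inferred from the alert text rather than stated in this document. The function names are hypothetical.

```python
import re

ALERT_TEMPLATE = (
    "%value%%% of all processes are in zombie state, crossed warning "
    "(%warning_threshold%) or critical (%critical_threshold%) threshold."
)


def classify(value, warning=35, critical=50):
    """Apply the ">" operator against the default thresholds."""
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return None


def render(template, values):
    """Fill %name% placeholders; an empty name (%%) yields a literal %."""
    def substitute(match):
        name = match.group(1)
        return "%" if name == "" else str(values[name])
    return re.sub(r"%(\w*)%", substitute, template)


values = {"value": 40, "warning_threshold": 35, "critical_threshold": 50}
print(classify(values["value"]))
print(render(ALERT_TEMPLATE, values))
```

Because the operator is >, a value exactly equal to a threshold does not cross it.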


Data Source

The data sources for this metric include the following:

Host Data Source
Solaris ps command
HP ps command
Linux ps command
HP Tru64 not available
IBM AIX not available
Windows not available
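Where the ps command is the data source, the percentage can be derived from the process state column. The following Python sketch assumes ps output reduced to a single state column with a one-line header, where zombie processes are marked Z. The exact ps options used by the agent are not stated in this document, and the sample text is invented.

```python
# Invented sample: a header line followed by one state code per process.
SAMPLE_PS_STATES = "S\nS\nS\nZ\nR\nZ\nS\nS\nS\n"


def zombie_percentage(ps_output):
    """Return the percentage of listed processes that are in zombie state."""
    lines = [line.strip() for line in ps_output.splitlines() if line.strip()]
    states = lines[1:]  # skip the column header
    if not states:
        return 0.0
    zombies = sum(1 for state in states if state.startswith("Z"))
    return 100.0 * zombies / len(states)


print(zombie_percentage(SAMPLE_PS_STATES))
```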