12 Diagnosing Problems

This chapter describes how to use the Diagnostic Framework to collect and manage information about a problem so that you can resolve it or send it to Oracle Support.

This chapter contains the following topics:

12.1 Understanding the Diagnostic Framework

Oracle Fusion Middleware includes a Diagnostic Framework that aids in detecting, diagnosing, and resolving problems. It particularly targets critical errors, such as those caused by code bugs, metadata corruption, customer data corruption, deadlocked threads, and inconsistent state.

When a critical error occurs, it is assigned an incident number, and diagnostic data for the error (such as log files) is immediately captured and tagged with this number. The data is then stored in the Automatic Diagnostic Repository (ADR), where it can later be retrieved by incident number and analyzed.

The goals of the Diagnostic Framework are:

  • First-failure diagnosis

  • Limiting damage and interruptions after a problem is detected

  • Reducing problem diagnostic time

  • Reducing problem resolution time

  • Simplifying customer interaction with Oracle Support

The Diagnostic Framework includes the following technologies:

  • Automatic capture of diagnostic data upon first failure: For critical errors, the ability to capture error information at first failure greatly increases the chance of a quick problem resolution and reduced downtime. The Diagnostic Framework automatically collects diagnostics, such as thread dumps, DMS metric dumps, and WebLogic Diagnostics Framework (WLDF) server image dumps. Such diagnostic data is similar to the data collected by airplane "black box" flight recorders. When a problem is detected, alerts are generated and the fault diagnosability infrastructure is activated to capture and store diagnostic data. The data is stored in a file-based repository and is accessible with command-line utilities.

  • Standardized log formats: Standardizing log formats (using the ODL log file format) across all Oracle Fusion Middleware components allows administrators and Oracle Support personnel to use a single set of tools for problem analysis. Problems are more easily diagnosed, and downtime is reduced.

  • Diagnostic rules: Each component defines diagnostic rules that are used to evaluate whether a given log message should result in an incident being created and which dumps should be executed. The diagnostic rules also indicate whether an individual dump should be created synchronously or asynchronously.

  • Incident detection log filter: The incident detection log filter implements the java.util.logging filter. It inspects each log message to see if an incident should be created, basing its decision on the diagnostic rules for components and applications.

  • Incident packaging service (IPS) and incident packages: The IPS enables you to automatically and easily gather the diagnostic data (log files, dumps, reports, and more) pertaining to a critical error that has a corresponding incident, and package the data into a zip file for transmission to Oracle Support. All diagnostic data relating to a critical error that has been detected by the Diagnostic Framework is captured and stored as an incident in ADR. The incident packaging service identifies the required files automatically and adds them to the zip file.

    Before creating the zip file, the IPS first collects diagnostic data into an intermediate logical structure called an incident package. Packages are stored in the Automatic Diagnostic Repository. If you choose to, you can access this intermediate logical structure, view and modify its contents, add or remove additional diagnostic data at any time, and when you are ready, create the zip file from the package and upload it to Oracle Support.

  • Integration with WebLogic Diagnostics Framework (WLDF): The Oracle Fusion Middleware Diagnostics Framework integrates with some features of WebLogic Diagnostics Framework (WLDF), including the capturing of WebLogic Server images on detection of critical errors. WLDF is a monitoring and diagnostic framework that defines and implements a set of services that run within WebLogic Server processes and participate in the standard server life cycle. Using WLDF, you can create, collect, analyze, archive, and access diagnostic data generated by a running server and the applications deployed within its containers. This data provides insight into the run-time performance of servers and applications and enables you to isolate and diagnose faults when they occur.

    Oracle Fusion Middleware Diagnostics Framework integrates with the following components of WLDF:

    • WLDF Watch and Notification, which watches specific logs and metrics for specified conditions and sends a notification when a condition is met. There are several types of notifications, including JMX notification and a notification to create a Diagnostic Image. Oracle Fusion Middleware Diagnostics Framework integrates with the WLDF Watch and Notification component to create incidents.

    • Diagnostic Image Capture, which gathers the most common sources of the key server state used in diagnosing problems. It packages that state into a single artifact, the Diagnostic Image. With Oracle Fusion Middleware Diagnostics Framework, it writes the artifact to ADR.

    For more information about WLDF, see Oracle Fusion Middleware Configuring and Using the Diagnostics Framework for Oracle WebLogic Server.
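
The relationship between diagnostic rules and the incident detection log filter described in the list above can be pictured with a small sketch. The real filter is a Java component that implements the java.util.logging filter; the Python analogue below uses the standard logging module only to illustrate the idea. The rule format, the names DiagnosticRule and IncidentDetectionFilter, the message_id attribute, and the in-memory incident list are assumptions made for illustration and are not the product's actual classes.

```python
import logging
import re
from dataclasses import dataclass, field


@dataclass
class DiagnosticRule:
    """Hypothetical rule: which message IDs create an incident, and which dumps to run."""
    message_id_pattern: str                      # regex matched against the message ID
    dumps: list = field(default_factory=list)    # dump names to execute for a match


class IncidentDetectionFilter(logging.Filter):
    """Inspects every log record; it never suppresses a record, it only records incidents."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.incidents = []                      # stand-in for the ADR

    def filter(self, record):
        message_id = getattr(record, "message_id", "")
        for rule in self.rules:
            if re.fullmatch(rule.message_id_pattern, message_id):
                self.incidents.append({
                    "id": len(self.incidents) + 1,
                    "problem_key": message_id,
                    "dumps": list(rule.dumps),
                })
                break
        return True                              # the message is still written to the log


rules = [DiagnosticRule(r"MDS-\d+", dumps=["jvm.threads", "dms.metrics"])]
detector = IncidentDetectionFilter(rules)

logger = logging.getLogger("dfw.sketch")
logger.propagate = False
logger.addHandler(logging.NullHandler())
logger.addFilter(detector)

logger.error("metadata store failure", extra={"message_id": "MDS-50500"})
logger.error("ordinary application error", extra={"message_id": "APP-00001"})
print(detector.incidents)
```

Only the first message matches a rule, so only one incident is recorded, and both messages would still reach the log handler.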

12.1.1 About Incidents and Problems

To facilitate diagnosis and resolution of critical errors, the Diagnostic Framework introduces two concepts for Oracle Fusion Middleware: problems and incidents.

A problem is a critical error. Critical errors manifest as internal errors or other severe errors. Problems are tracked in the ADR. Each problem has a problem key, which is a text string that describes the problem. It includes an error code (in the format XXX-nnnnn) and in some cases, other error-specific values.

An incident is a single occurrence of a problem. When a problem (critical error) occurs multiple times, an incident is created for each occurrence. Incidents are timestamped and tracked in the ADR. Each incident is identified by a numeric incident ID, which is unique within the ADR home. When an incident occurs, the Diagnostic Framework:

  • Gathers first-failure diagnostic data about the incident in the form of dump files (incident dumps).

  • Stores the incident dumps in an ADR subdirectory created for that incident.

  • Registers the incident dumps with the incident in ADR.

12.1.1.1 Incident Flood Control

It is conceivable that a problem could generate dozens or perhaps hundreds of incidents in a short period of time. This would generate too much diagnostic data, which would consume too much space in the ADR and could possibly slow down your efforts to diagnose and resolve the problem. For these reasons, the Diagnostic Framework applies flood control to incident generation after certain thresholds are reached. A flood-controlled incident is an incident that is recorded in the ADR, but does not generate incident dumps. Flood-controlled incidents provide a way of informing you that a critical error is ongoing, without overloading the system with diagnostic data. You can choose to view or hide flood-controlled incidents when viewing incidents with ADRCI.

If more than 3 incidents with the same problem key occur within 15 minutes, subsequent incidents with the same problem key are flood controlled. That is, those subsequent incidents are created and written to the ADR, but no diagnostic data is captured.
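
The threshold rule can be sketched as a per-problem-key counter over a time window. This is a minimal illustration, not the framework's implementation: the class name FloodControl is hypothetical, and the sketch assumes a sliding window in which flood-controlled incidents also count toward the threshold, which the text above does not specify.

```python
from collections import defaultdict, deque


class FloodControl:
    """Sliding-window sketch of incident flood control (illustrative only)."""

    def __init__(self, incident_count=3, timeout_minutes=15):
        self.incident_count = incident_count
        self.window_seconds = timeout_minutes * 60
        self._recent = defaultdict(deque)    # problem key -> times of recent incidents

    def is_flood_controlled(self, problem_key, now_seconds):
        """Record an incident and report whether its dumps should be skipped."""
        times = self._recent[problem_key]
        # Forget incidents that have fallen out of the time window.
        while times and now_seconds - times[0] >= self.window_seconds:
            times.popleft()
        times.append(now_seconds)
        return len(times) > self.incident_count


fc = FloodControl()
# Four incidents in three minutes, then one more twenty minutes after the first.
results = [fc.is_flood_controlled("MDS-50500", t) for t in (0, 60, 120, 180, 1200)]
print(results)  # [False, False, False, True, False]
```

The fourth incident exceeds the count of 3 within 15 minutes and is flood controlled; by the time the fifth arrives, the earlier incidents have aged out of the window.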

12.1.2 Diagnostic Framework Components

The following topics describe the key components of the Diagnostic Framework:

12.1.2.1 Automatic Diagnostic Repository

The Automatic Diagnostic Repository (ADR) is a file-based hierarchical repository for Oracle Fusion Middleware diagnostic data, such as traces and dumps. The Oracle Fusion Middleware components store all incident data in the ADR. Each Oracle WebLogic Server stores diagnostic data in subdirectories of its own home directory within the ADR. For example, each Managed Server and Administration Server has an ADR home directory.

The ADR root directory is known as ADR base. By default, the ADR base is located in the following directory:

DOMAIN_HOME/servers/server_name/adr

Within ADR base, there can be multiple ADR homes, where each ADR home is the root directory for all incident data for a particular instance of an Oracle WebLogic Server. The following path shows the location of the ADR home:

ADR_BASE/diag/ofm/domain_name/server_name

Figure 12-1 illustrates the directory hierarchy of the ADR home for an Oracle WebLogic Server instance.

Figure 12-1 ADR Directory Structure for Oracle Fusion Middleware

The subdirectories in the ADR home contain the following information:

  • alert: The XML-formatted alert log.

  • incident: A directory that can contain multiple subdirectories, where each subdirectory is named for a particular incident. The subdirectories are named incdir_n, with n representing the number of the incident. Each subdirectory contains information and diagnostic dumps pertaining only to that incident.

  • (others): Other subdirectories of ADR home, which store incident packages and other information.
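
The directory layout described above can be expressed as a few path helpers. This is a sketch that assumes the default ADR base location and UNIX-style paths; the function names are hypothetical.

```python
from pathlib import PurePosixPath


def adr_base(domain_home, server_name):
    """Default ADR base: DOMAIN_HOME/servers/server_name/adr."""
    return PurePosixPath(domain_home) / "servers" / server_name / "adr"


def adr_home(domain_home, domain_name, server_name):
    """ADR home: ADR_BASE/diag/ofm/domain_name/server_name."""
    return adr_base(domain_home, server_name) / "diag" / "ofm" / domain_name / server_name


def incident_dir(domain_home, domain_name, server_name, incident_id):
    """Directory holding the dumps for one incident: ADR_HOME/incident/incdir_n."""
    home = adr_home(domain_home, domain_name, server_name)
    return home / "incident" / f"incdir_{incident_id}"


print(incident_dir("/middleware/user_projects/base_domain", "base_domain", "AdminServer", 6))
```

With these inputs, the result is the same incident directory that appears in the sample log message in Section 12.2.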

Note:

ADR uses the domain name as the Product ID and the server name as the Instance ID when it packages an incident. However, if either name is more than 30 characters, ADR truncates the name. In addition, dollar sign ($) and space characters are replaced with underscores.
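
As a rough illustration of the note above, the following sketch derives a Product ID or Instance ID from a name. The function name is hypothetical, and the sketch assumes that truncation keeps the first 30 characters, which the note does not state explicitly.

```python
def adr_identifier(name, max_length=30):
    """Sketch of how a domain or server name becomes a Product ID or Instance ID."""
    # Replace dollar signs and spaces with underscores.
    cleaned = name.replace("$", "_").replace(" ", "_")
    # Assumption: names longer than the limit keep only their leading characters.
    return cleaned[:max_length]


print(adr_identifier("my domain$1"))
print(len(adr_identifier("a_very_long_managed_server_name_for_testing")))
```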

12.1.2.2 Diagnostic Dumps

A diagnostic dump captures and dumps specific diagnostic information when an incident is created (automatic) or on the request of an administrator (manual). When executed as part of incident creation, the dump is included with the set of incident diagnostics data. Examples of diagnostic dumps include a JVM thread dump, JVM class histogram dump, and DMS metric dump.

12.1.2.3 Management MBeans

The Diagnostic Framework provides MBeans that you can use to configure the Diagnostic Framework. For example, you can configure when flood control is enabled and how many incidents with the same problem key can occur within a specified time period. For information about using the management MBeans to configure the Diagnostic Framework, see Section 12.3.

You can also use the MBeans to query and create incidents, discover the list of available diagnostic dump types, and execute individual diagnostic dumps.

12.1.2.4 WLST Commands for Diagnostic Framework

The Diagnostic Framework provides WLST commands that you can use to view information about problems and incidents, create incidents, execute specific dumps, and query the set of diagnostic dump types. For more information, see Section 12.4.

12.1.2.5 ADRCI Command-Line Utility

The ADR Command Interpreter (ADRCI) is a utility that enables you to investigate problems, and package and upload first-failure diagnostic data to Oracle Support, all within a command-line environment. ADRCI also enables you to view the names of the dump files in the ADR, and to view the alert log with XML tags stripped, with and without content filtering.

ADRCI is installed in the following directory:

(UNIX) MW_HOME/wlserver_10.3/server/adr
(Windows) MW_HOME\wlserver_10.3\server\adr

See Also:

  • The chapter "ADRCI: ADR Command Interpreter" in the Oracle Database Utilities

  • The chapter "Managing Diagnostic Data" in the Oracle Database Administrator's Guide

Both manuals are located at:

http://www.oracle.com/technology/documentation/database.html

12.2 How the Diagnostic Framework Works

The Diagnostic Framework is active in each server and provides automatic error detection through predefined configured rules. Oracle Fusion Middleware components and applications automatically benefit from this always-on checking.

Incidents are detected and created in the following ways:

  • By the incident detection log filter, which is automatically configured to detect critical errors.

  • By the WLDF Watch and Notification component. The Diagnostic Framework listens for a predefined notification type and creates incidents when it receives such notifications.

    For information about configuring WLDF Watch and Notification, see Section 12.3.2.

  • Programmatic incident creation. Some components create incidents directly.

Figure 12-2 shows the interaction among the incident log detector (the incident detection log filter), the WLDF Diagnostic Image MBean, ADR, and component or application dumps when the incident log detector detects an incident.

Figure 12-2 Incident Creation Generated by Incident Log Detector

The steps represented in Figure 12-2 are:

  1. The incident detection log filter is initialized with component and application diagnostic rules.

  2. An application or component (in this case Oracle WebCenter) logs a message using the java.util.logging API.

  3. The ODL log handler passes the message to the incident detection log filter.

  4. The incident detection log filter inspects the log message to see if an incident should be created, basing its decision on the diagnostic rules for the component. If the diagnostic rule indicates that an incident should be created, it creates an incident in the ADR.

  5. The ODL log handler writes the log message to the log file, and returns control back to WebCenter.

    When an incident is created, a message, similar to the following, is written to the log file:

    [2009-09-16T06:37:59.264-07:00] [dfw] [NOTIFICATION] [DFW-40104] [oracle.dfw]
    [tid: 10] [ecid: 0000IF34gtMC8xT6uBf9EH1AgEck000000,0] [errid: 6] 
    [detailLoc: /middleware/user_projects/base_domain/servers/AdminServer/adr/diag/ofm/base_domain/AdminServer] 
    [probKey: MDS-123456 [testComponent][testModule]] incident 6 created with
     problem key "MDS-123456 [testComponent][testModule]", in directory
     /middleware/user_projects/base_domain/servers/AdminServer/adr/diag/ofm/base_domain/AdminServer/incident/incdir_6
    
  6. The Diagnostic Framework executes the diagnostic dumps that are indicated by the diagnostic rules for the component.

  7. The Diagnostic Framework writes the dumps to ADR, in the directory created for the incident.

  8. The Diagnostic Framework invokes the WLDF Diagnostic Image MBean requesting that a Diagnostic Image be created in ADR.

  9. WLDF writes the Diagnostic Image to ADR.
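
If you scan log files for incident creation messages such as the one shown in step 5, a small parser can pull out the incident number, problem key, and incident directory. The sketch below is based only on the sample message in this section; the function name is hypothetical, and the exact message wording in a given release is an assumption.

```python
import re

# Matches the trailing text of the sample DFW-40104 message shown above.
INCIDENT_CREATED = re.compile(
    r'incident (?P<id>\d+) created with problem key "(?P<key>[^"]+)", '
    r'in directory (?P<dir>\S+)'
)


def parse_incident_message(text):
    """Return the incident ID, problem key, and directory, or None if there is no match."""
    flattened = " ".join(text.split())       # undo line wrapping in the log excerpt
    match = INCIDENT_CREATED.search(flattened)
    if match is None:
        return None
    return {
        "incident_id": int(match.group("id")),
        "problem_key": match.group("key"),
        "directory": match.group("dir"),
    }


sample = """[probKey: MDS-123456 [testComponent][testModule]] incident 6 created with
 problem key "MDS-123456 [testComponent][testModule]", in directory
 /middleware/user_projects/base_domain/servers/AdminServer/adr/diag/ofm/base_domain/AdminServer/incident/incdir_6"""

print(parse_incident_message(sample))
```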

Figure 12-3 shows the interaction when an incident is detected by the WLDF Watch and Notification system. It shows the interaction among the incident notification listener, the WLDF Watch and Notification system, and the WLDF Diagnostic Image MBean.

Figure 12-3 Incident Creation Generated by WLDF Watch Notification

The steps represented in Figure 12-3 are:

  1. The incident notification listener is initialized with component and application diagnostic rules.

  2. Oracle Fusion Middleware Diagnostic Framework registers a JMX notification listener with WLDF. The listener listens for events from the WLDF Watch and Notification system. It only processes notifications of type oracle.dfw.wldfnotification.

  3. Something in the system causes the configured WLDF watch to be triggered, causing a notification to be sent to the incident notification listener. The notification includes event information describing the data that caused the watch to trigger.

  4. The Diagnostic Framework creates an incident in ADR.

  5. The Diagnostic Framework executes the diagnostic dumps that are indicated by the diagnostic rules.

  6. The Diagnostic Framework writes the dumps to ADR, in the directory created for the incident.

  7. The Diagnostic Framework invokes the WLDF Diagnostic Image MBean requesting that a Diagnostic Image be created in ADR.

  8. WLDF writes the Diagnostic Image to ADR.

12.3 Configuring the Diagnostic Framework

You can configure some settings for the Diagnostic Framework. In addition, you can configure a WLDF Watch and Notification to create an incident. The following topics describe how to configure the Diagnostic Framework:

12.3.1 Configuring Diagnostic Framework Settings

You can configure the following settings:

  • Enabling or disabling the detection of incidents through the log files

  • Enabling or disabling flood control and setting parameters for flood control

  • Enabling or disabling incident purging

You configure these settings by using the Diagnostic Framework MBean DiagnosticConfig. The following shows the MBean's ObjectName:

oracle.dfw:type=oracle.dfw.jmx.DiagnosticsConfigMBean,name=DiagnosticsConfig 

Table 12-1 shows the parameters for the DiagnosticConfig MBean and a description of each parameter.

Table 12-1 DiagnosticConfig MBean Parameters for Diagnostic Framework

  • incidentPurgeEnabled: Enables or disables the purging of incidents. Specify true for enabled or false for disabled. The default is true.

  • logDetectionEnabled: Enables or disables the detection of incidents through the log files. Specify true for enabled or false for disabled. The default is true.

  • floodControlEnabled: Enables or disables flood control. Specify true for enabled or false for disabled. The default is true. Note that flood control does not apply to manually created incidents.

  • floodControlIncidentCount: Sets the number of incidents with the same problem key that can occur within the time period specified by floodControlIncidentTimeoutPeriod before they are controlled by flood control. The default is 3. When flood control is enabled, if the number of incidents with the same problem key exceeds this count, an incident is created, but no diagnostics are captured with the incident.

  • floodControlIncidentTimeoutPeriod: Sets the time period in which the number of incidents specified by floodControlIncidentCount, with the same problem key, can occur before they are controlled by flood control. The default is 15 minutes.

  • reservedMemoryKB: The amount of reserved memory that is released when OutOfMemoryError is detected. When the Diagnostic Framework starts, it allocates 512KB of memory for its own private use. When the Diagnostic Framework detects that an OutOfMemoryError has occurred in the server, it frees that block of memory and proceeds to create the incident. The default is 512KB.

  • useExternalCommands: Indicates whether external JVM commands should be used to perform thread dumps. Specify true for enabled or false for disabled. The default is true.

The following example shows how to configure these settings using the Fusion Middleware Control System MBean Browser:

  1. From the target navigation pane, expand the farm, then WebLogic Domain.

  2. Select the domain.

  3. From the WebLogic Domain menu, choose System MBean Browser.

    The System MBean Browser page is displayed.

  4. Expand Application Defined Beans, then oracle.dfw, then domain.domain_name, then dfw.jmx.DiagnosticsConfigMBean.

  5. Select one of the DiagnosticConfig entries. There is one DiagnosticConfig entry for each server.

    In the Application Defined MBean pane, expand Show MBean Information to see the server name.

    The following shows the System MBean Browser page:

    (Illustration dfw_config.gif: the System MBean Browser page)

  6. To change the values for the attributes listed in Table 12-1, select the attribute.

  7. Enter or select the value in the Value field.

  8. Click Apply.

12.3.2 Configuring WLDF Watch and Notification for the Diagnostic Framework

Fusion Middleware configures a WLDF Diagnostics Module that contains a set of Watch and Notification rules for detecting a specific set of critical errors and creating an incident for each occurrence of those errors. The module is called Module-FMWDFW and contains the following set of Watch conditions:

  • Deadlock: Two or more Java threads have circular lock chains among their Java Monitor object usage.

  • StuckThread: An Oracle WebLogic Server ExecuteThread has been blocked or busy for longer than the time specified by the Oracle WebLogic Server StuckThreadMaxTime parameter.

  • UncheckedException: Any unchecked exception (a RuntimeException or an Error) caught by an Oracle WebLogic Server ExecuteThread, such as NullPointerException, StackOverflowError, or OutOfMemoryError.

The Diagnostic Module also includes a configured WLDF JMX Notification FMWDFW-notification of type oracle.dfw.wldfnotification. You can reuse this WLDF JMX Notification for your own WLDF Watch conditions in order to create an incident:

  1. Display the Administration Console, as described in Section 3.4.1.

  2. In the Change Center, click Lock & Edit.

  3. In the left pane, expand Diagnostics and select Diagnostic Modules.

    The Summary of Diagnostic Modules page is displayed.

  4. Click Module-FMWDFW.

    The Settings for Module-FMWDFW page is displayed.

  5. Select the Watches and Notifications tab, which is shown in the following figure:

    (Illustration dfw_notif.gif: the Watches and Notifications tab)

  6. Select the Watches tab and click New.

    The Create Watch page is displayed.

  7. For Name, enter a name for the watch.

  8. For Watch Type, select a type.

  9. Click Next.

  10. For Current Watch Rule, construct an expression. For example, (SEVERITY = 'Error') AND (MSGID = 'BEA-000337').

  11. Click Next.

  12. Select an alarm type.

  13. For Notifications, select FMWDFW-notification.

  14. Click Finish.

For more information on creating watches, see "Construct watch rule expressions" in the Administration Console Online Help.

12.4 Investigating, Reporting, and Solving a Problem

This section describes how to use WLST and ADRCI commands to investigate and report a problem (critical error), and in some cases, resolve the problem. The section begins with a roadmap that summarizes the typical set of tasks that you must perform. It describes the following topics:

12.4.1 Roadmap—Investigating, Reporting, and Resolving a Problem

Typically, investigating, reporting, and resolving a problem begins with a critical error. This section provides an overview of that workflow.

Figure 12-4 illustrates the tasks that you complete to investigate, report, and in some cases, resolve a problem.

Figure 12-4 Flow for Investigating a Problem

The following describes the workflow illustrated in Figure 12-4:

  1. You notice that the system, component, or application is not functioning as expected. For example, you notice that there is a performance problem or users have reported that the application that they are trying to access is reporting errors.

  2. Check to see if a problem and an incident have been created that may be related to the symptoms you are observing:

    1. View the set of problems by using the WLST listProblems command, as described in Section 12.4.2.1.

    2. If a problem has been created, list the incidents related to the specific problem using the listIncidents command, as described in Section 12.4.2.2.

  3. If an incident has not been created, go to Step 4. If an incident has been created, go to Step 5.

  4. If you do not see any incidents listed that are related to your problem, you can create an incident manually using the createIncident command to capture diagnostics for the problem.

    Consider creating an incident when you encounter an issue, such as a software failure or performance problem, and you want to gather more diagnostic data. You can view the log files and the messages in the files. If there is a specific message that you believe is related to the issue you are seeing, you can use the message ID in the createIncident command.

    See Section 12.4.4.1 for more information about creating an incident.

  5. View the details of the specific incident using the showIncident command, as described in Section 12.4.2.2. This command lists information about the incident, including the related message ID, the time of the incident, the ECID, and the files generated by the incident.

  6. Use the getIncidentFile command to view the contents of files for the incident, as described in Section 12.4.2.2. The contents may provide information to guide you to the source of the problem and help in resolving it.

  7. If the contents of the files for the incident do not help you to resolve the problem, you can execute additional dumps to view detailed diagnostics. For example, if you are experiencing performance problems, execute the dms.metrics dump. See Section 12.4.3 for information about the dumps available and how to execute them.

  8. If you still cannot resolve the problem, package the incident and send it to Oracle Support. See Section 12.4.4.2 for more information.

12.4.2 Viewing Problems and Incidents

You can view the set of problems, the list of incidents, and the details of a particular incident by using the WLST command-line utility, as described in the following topics:

12.4.2.1 Viewing Problems

You can view the set of problems by using the WLST listProblems command, using the following format:

listProblems([adrHome] [,server])

The listProblems command lists the problems in the ADR home. Each problem has a unique ID:

listProblems()
Problem Id      Problem Key
        1       BEA-101020 [HTTP]

12.4.2.2 Viewing Incidents

You can view the list of incidents and view the details of a particular incident using the WLST command-line utility.

To view the list of all available incidents or the incidents related to a specific problem, use the WLST listIncidents command, using the following format:

listIncidents([id] [,adrHome])

For example, to see the list of all incidents, use the following command:

listIncidents()
Incident Id     Problem Key              Incident Time
        2       BEA-101020 [HTTP]        Fri Sep 18 13:42:01 PDT 2009
        1       BEA-101020 [HTTP]        Tue Sep 15 06:17:39 PDT 2009

To view the incidents related to a specific problem, use the following command:

listIncidents(id='1')
Incident Id     Problem Key              Incident Time
        2       BEA-101020 [HTTP]        Fri Sep 18 13:42:01 PDT 2009
        1       BEA-101020 [HTTP]        Tue Sep 15 06:17:39 PDT 2009

To view the details of a particular incident, use the WLST showIncident command, using the following format:

showIncident(id, [adrHome] [,server])

For example, to see the details of incident 1, use the following command:

showIncident(id='1')
Incident Id: 1
Problem Id: 1
Problem Key: BEA-101020 [HTTP]
Incident Time: Tue Sep 15 06:17:39 PDT 2009
Error Message Id:  BEA-101020
Execution Context: 0000IExqUvyAhKB5JZ4Eyf1Afdj600009i
Flood Controlled: false
Dump Files :
    dms_ecidctx1_i1.dmp
    jvm_threads2_i1.dmp
    dms_metrics3_i1.dmp
    odl_logs4_i1.dmp
    odl_logs5_i1.dmp
    diagnostic_image_AdminServer_2009_09_15_06_17_42.zip
    readme.txt
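
If you copy showIncident output into a script, for example to decide which dump files to retrieve with getIncidentFile, a small parser is enough. The sketch below assumes the output layout shown above (one "Name: value" pair per line, followed by a "Dump Files" list); the function name is hypothetical, and capturing the console text is left to you.

```python
def parse_show_incident(output):
    """Turn captured showIncident text into a dict; dump files become a list."""
    details = {"Dump Files": []}
    in_dump_files = False
    for line in output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if in_dump_files:
            # Every line after the "Dump Files" heading is a file name.
            details["Dump Files"].append(stripped)
        elif stripped.replace(" ", "").startswith("DumpFiles:"):
            in_dump_files = True
        elif ":" in stripped:
            # Split on the first colon only, so timestamps keep their colons.
            key, value = stripped.split(":", 1)
            details[key.strip()] = value.strip()
    return details


captured = """Incident Id: 1
Problem Key: BEA-101020 [HTTP]
Incident Time: Tue Sep 15 06:17:39 PDT 2009
Flood Controlled: false
Dump Files :
    jvm_threads2_i1.dmp
    odl_logs4_i1.dmp
"""

incident = parse_show_incident(captured)
print(incident["Problem Key"], incident["Dump Files"])
```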

To view the contents of a file in the incident, use the WLST getIncidentFile command, using the following format:

getIncidentFile(id, name [,outputFile] [,adrHome] [,server])

For example, to view the contents for the file odl_logs4_i1.dmp use the following command:

getIncidentFile(id='1', name='odl_logs4_i1.dmp', outputFile='/tmp/odl_logs4_i1_dmp.output')

The output is written to the file /tmp/odl_logs4_i1_dmp.output.

12.4.3 Working with Diagnostic Dumps

If you suspect a problem, you can make use of the built-in diagnostic dumps to report detailed diagnostics that can help diagnose the problem. Diagnostic dumps provide a means to output and record diagnostics data which serve as valuable information when diagnosing issues with Oracle Fusion Middleware components, applications, and infrastructure. The output from these dumps is intended to be used by customers and Oracle Support to diagnose issues with Oracle Fusion Middleware.

Diagnostic dumps are executed in the following ways:

  • Manually, using WLST commands, as described in the following sections

    For example, if your Java EE application is hanging and you suspect a deadlock, you could use the jvm.threads dump to obtain the set of threads.

  • Automatically, when the Diagnostic Framework detects a critical error and creates an incident or when the administrator creates an incident

12.4.3.1 Listing Diagnostic Dumps

You can list the diagnostic dumps that are available for a Managed Server by using the WLST listDumps command, with the following format:

listDumps([appName] [,server])

For example, to list the available dumps for soa_server1:

listDumps(server='soa_server1')
Location changed to domainRuntime tree. This is a read-only tree with DomainMBean as the root. 
For more help, use help(domainRuntime)
 
jvm.classhistogram
dms.ecidctx
wls.image
odl.logs
dms.metrics
jvm.threads
 
Use the command describeDump(name=<dumpName>) for help on a specific dump.

Table 12-2 lists the diagnostic dump actions that are defined by Oracle Fusion Middleware and their descriptions.

Table 12-2 Diagnostic Dump Actions

  • dms.ecidctx: The data associated with a specific Execution Context ID (ECID), if specified. Otherwise, the data associated with all available ECIDs.

  • dms.metrics: Dynamic Monitoring Service (DMS) metrics. For information about these metrics, see "DMS Internal Metrics" in the Oracle Fusion Middleware Performance Guide.

  • jvm.classhistogram: A JVM class histogram, the output of which varies depending on the JVM vendor.

  • jvm.threads: Summary statistics about the threads running in a JVM, together with a full thread dump.

  • odl.logs: Contents of diagnostic logs, correlated by ECID or time range.

  • wls.image: The WLDF server image dump.


12.4.3.2 Viewing a Description of a Diagnostic Dump

You can view a description of a particular dump, including the syntax for executing the dump, by using the WLST describeDump command. You specify the name of the dump in which you are interested. For example, to view a description of the dms.metrics dump, use the following command:

describeDump(name='dms.metrics')
Name: dms.metrics
Description: Dumps DMS (Dynamic Monitoring Service) metrics.
Mandatory Arguments: 
Optional Arguments:
    Name        Type     Description
    format      STRING   Format of the dump output; raw or xml
wls:/soa_domain/serverConfig> 

12.4.3.3 Executing Dumps

If you detect a problem and want to gather additional diagnostic data, you can invoke the executeDump command for a specified dump. Each dump may have mandatory or optional arguments, or both. To view the arguments for a particular dump and how to specify them, use the describeDump command, as described in Section 12.4.3.2.

The following example executes the dms.metrics dump, associates it with incident 1, and writes the output to the file /tmp/dumpout.txt:

executeDump(name='dms.metrics', outputFile='/tmp/dumpout.txt', id='1')
Dump file dms_metrics1_i1.dmp added to incident 1

The dump file is added to incident 1. If you execute the showIncident command for incident 1, the output includes dms_metrics1_i1.dmp.

12.4.4 Managing Incidents

The Diagnostic Framework stores incidents, whether they are created automatically or manually, and Oracle Fusion Middleware provides tools to help you process incident reports and to package those incidents to send to Oracle Support. The following sections describe how to manage incidents:

12.4.4.1 Creating an Incident Manually

System-generated problems—critical errors generated internally—are automatically added to the Automatic Diagnostic Repository (ADR). You can gather additional diagnostic data on these problems, upload diagnostic data to Oracle Support, and in some cases, resolve the problems, all with the workflow that is explained in Section 12.4.

Consider creating an incident manually when you encounter an issue, such as a software failure or a performance problem, and you want to gather more diagnostic data, but the Diagnostic Framework has not automatically created an incident.

You use the WLST command createIncident to create an incident manually. You can specify an incident based on time, a message ID, an impact area, or an ECID. Then, you can inspect the content of the incident or send it to Oracle Support for further analysis.

The following describes how to manually create an incident based on a message ID:

  1. Search the log files, as described in Section 11.3.1. If you find a message that you suspect is related to the issue you are seeing, you can use the message ID when you create the incident.

  2. Use the following commands to invoke WLST, connect to the Managed Server and navigate to the Managed Server instance:

    java weblogic.WLST
    connect('weblogic', 'password', 'localhost:7001')
    cd('servers/server_name')
    
  3. Create the incident, using the createIncident command, with the following format:

    createIncident([adrHome] [,incidentTime] [,messageId] [,ecid][,appName]
      [,description] [,server])
    

    For example, to create an incident based on the error with the message ID MDS-50500, use the following command. It specifies the message ID and provides a description of the incident to help you and Oracle Support track the incident:

    createIncident(messageId='MDS-50500', description='sample incident')
    Incident Id: 55
    Problem Id: 4
    Problem Key: MDS-50500 [MANUAL]
    Incident Time: 25th June 2009 11:55:45 GMT
    Error Message Id: MDS-50500
    Flood Controlled: false
    

    If you do not specify a server, the incident collects information from the server to which you are connected. To specify a server, use the server option, as shown in the following example:

    createIncident(messageId='MDS-50500', description='sample incident', server='soa_server1')
    

    If you do not specify the adrHome option, the incident is created in the ADR home of the server to which you are connected. For example, if you are connected to the Administration Server, the incident is created in the adrHome for the Administration Server.

    The Diagnostic Framework evaluates the command and invokes the appropriate diagnostic dumps. The incident and the diagnostic dumps are written to the ADR. Each diagnostic dump writes its output to the incident.

    You can view the information about the incident, as described in Section 12.4.2.2.

    You can view the information in the dumps, as described in Section 12.4.3.
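
If you generate WLST scripts from another tool, you can assemble the createIncident call from only the options that you want to set. The following is a minimal sketch in plain Python, assuming that every option is passed as a quoted string keyword argument, as in the preceding examples. The function name build_create_incident is hypothetical. The option names and their order are taken from the command format shown in step 3.

```python
def build_create_incident(adrHome=None, incidentTime=None, messageId=None,
                          ecid=None, appName=None, description=None,
                          server=None):
    """Return a createIncident(...) call as text (hypothetical helper).

    Options left as None are omitted, so the defaults described above
    apply: the connected server and that server's ADR home.
    """
    options = [
        ("adrHome", adrHome),
        ("incidentTime", incidentTime),
        ("messageId", messageId),
        ("ecid", ecid),
        ("appName", appName),
        ("description", description),
        ("server", server),
    ]
    # %r produces a quoted Python string literal, which WLST accepts.
    args = ", ".join("%s=%r" % (name, value)
                     for name, value in options if value is not None)
    return "createIncident(%s)" % args


print(build_create_incident(messageId="MDS-50500",
                            description="sample incident",
                            server="soa_server1"))
```

The generated line can be written to a script file and run with WLST, or pasted at the WLST prompt after you connect to the server.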

12.4.4.2 Packaging an Incident

You can package the incident to facilitate sending the information to Oracle Support by using the ADR Command Interpreter (ADRCI). The ADRCI utility enables you to investigate and report problems in a command-line environment. With ADRCI, you can package incident and problem information into a zip file for transmission to Oracle Support.

The ADRCI command-line utility is located in the following directory:

(UNIX) MW_HOME/wlserver_10.3/server/adr
(Windows) MW_HOME\wlserver_10.3\server\adr

Packaging an incident involves a three-step process:

  1. Create a logical package.

    The package is denoted as logical because it exists only as metadata in the ADR. It has no content until you generate a physical package from the logical package. The logical package is assigned a package number, and you refer to it by that number in subsequent commands.

    You can create the logical package as an empty package, or as a package based on an incident number, a problem number, a problem key, or a time interval. If you create the package as an empty package, you can add diagnostic information to it in step 2.

    Creating a package based on an incident means including diagnostic data, such as dumps, for that incident. Creating a package based on a problem number or problem key means including in the package diagnostic data for incidents that reference that problem number or problem key. Creating a package based on a time interval means including diagnostic data on incidents that occurred in the time interval.

  2. Add diagnostic information to the package.

    If you created a logical package based on an incident number, a problem number, a problem key, or a time interval, this step is optional. You can add additional incidents to the package or you can add any file within the ADR to the package. If you created an empty package, you must use ADRCI commands to add incidents or files to the package.

  3. Generate the physical package.

    When you submit the command to generate the physical package, ADRCI gathers all required diagnostic files and adds them to a zip file in a designated directory. You can generate a complete zip file or an incremental zip file. An incremental file contains all the diagnostic files that were added or changed since the last zip file was created for the same logical package. You can create incremental files only after you create a complete file, and you can create as many incremental files as you want. Each zip file is assigned a sequence number so that the files can be analyzed in the correct order.

    Zip files are named according to the following format:

    packageName_mode_sequence.zip
    

    In the format:

    • packageName consists of a portion of the problem key followed by a timestamp.

    • mode is either COM or INC, for complete or incremental.

    • sequence is an integer.

    For example, to package an incident, take the following steps:

    1. Set the ORACLE_HOME and LD_LIBRARY_PATH environment variables to point to the following directory:

      MW_HOME/wlserver_10.3/server/adr
      
    2. Invoke ADRCI. For example:

      MW_HOME/wlserver_10.3/server/adr/adrci
      
    3. Use the SET BASE command to specify the ADR Base and the SET HOMEPATH command to specify the ADR home that contains the incident. The path for the HOMEPATH is relative to the ADR Base.

      SET BASE /scratch/oracle1/Oracle/Middleware/user_projects/domains/soa_domain/servers/soa_server1/adr
      SET HOMEPATH diag/ofm/soa_domain/soa_server1
      
    4. Generate the logical package:

      IPS CREATE PACKAGE INCIDENT incident_number
      

      For example, the following command creates a package based on incident 1:

      IPS CREATE PACKAGE INCIDENT 1
      Created package 1 based on incident id 1, correlation level typical
      

      ADRCI assigns the logical package a number.

    5. Optionally, you can add diagnostic information to the logical package. You can add the following types of information:

      • All diagnostic information for a particular incident. For example, you can add another incident that you think might be related to the incident you are packaging, using the following command:

        IPS ADD INCIDENT incident_number PACKAGE package_number
        
      • A named file within the ADR. For example, if an incident is related to an application, you can add the .ear file for the application. You can also add a readme file with notes you provide to Oracle Support. For example, to add a file to the package, use the following command:

        IPS ADD FILE filespec PACKAGE package_number
        
    6. Generate the physical package using the following command:

      IPS GENERATE PACKAGE package_number IN path
      

      For example, to generate a package with the number 1, use the following command:

      IPS GENERATE PACKAGE 1 IN /tmp
      Generated package 1 in file /tmp/BEA337Web_20090925132315_COM_1.zip, mode complete
      

      This generates a complete physical package (zip file) in the designated path.
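
When a logical package produces several zip files, the naming format described in step 3 lets you recover the mode and sequence number of each file and put the files in the order in which they should be analyzed. The following is a minimal sketch in plain Python, assuming that file names follow the packageName_mode_sequence.zip format exactly. The names PackageZip, parse_package_zip, and in_analysis_order are hypothetical, and the INC file names in the usage example are invented for illustration.

```python
import os
import re
from typing import NamedTuple


class PackageZip(NamedTuple):
    package_name: str  # portion of the problem key plus a timestamp
    mode: str          # "COM" (complete) or "INC" (incremental)
    sequence: int


# packageName can itself contain underscores, so mode and sequence
# are matched at the end of the name.
_ZIP_RE = re.compile(r"^(?P<name>.+)_(?P<mode>COM|INC)_(?P<seq>\d+)\.zip$")


def parse_package_zip(path):
    """Split a package zip file name into its three parts."""
    match = _ZIP_RE.match(os.path.basename(path))
    if match is None:
        raise ValueError("not a package zip file name: %r" % path)
    return PackageZip(match.group("name"), match.group("mode"),
                      int(match.group("seq")))


def in_analysis_order(paths):
    """Sort zip files of one logical package by sequence number."""
    return sorted(paths, key=lambda p: parse_package_zip(p).sequence)


parsed = parse_package_zip("/tmp/BEA337Web_20090925132315_COM_1.zip")
print(parsed)

# Hypothetical incremental files for the same logical package.
ordered = in_analysis_order([
    "/tmp/BEA337Web_20090925132315_INC_3.zip",
    "/tmp/BEA337Web_20090925132315_COM_1.zip",
    "/tmp/BEA337Web_20090925132315_INC_2.zip",
])
print(ordered)
```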

See Also:

The "ADRCI: ADR Command Interpreter" chapter of the Oracle Database Utilities, which is located at:
http://www.oracle.com/technology/documentation/database.html

12.4.4.3 Purging Incidents

By default, incidents are automatically purged (deleted) if they are older than 30 days.

You can manually purge incidents using the ADRCI purge command. You can purge based on an ID or range of IDs, the age of the incident, or the type of incident. For example, to purge incidents that are older than 60 minutes, use the following command:

purge -age 60 

See the "ADRCI: ADR Command Interpreter" chapter of the Oracle Database Utilities, which is located at:

http://www.oracle.com/technology/documentation/database.html
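
Because the -age value in the preceding purge example is expressed in minutes, longer retention periods must be converted before you run the command. The following is a minimal sketch in plain Python of that arithmetic. The function name purge_command is hypothetical, and it only builds the command text; it does not invoke ADRCI.

```python
def purge_command(days=0, hours=0, minutes=0):
    """Build a purge -age command with the age converted to minutes."""
    total_minutes = (days * 24 + hours) * 60 + minutes
    if total_minutes <= 0:
        raise ValueError("age must be greater than zero")
    return "purge -age %d" % total_minutes


print(purge_command(minutes=60))  # same command as the example above
print(purge_command(days=30))     # 30 days * 24 hours * 60 minutes
```

For example, an age of 30 days corresponds to 30 × 24 × 60 = 43200 minutes.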

You can set the purge frequency, as described in Section 12.3.