7 Resolving Transient Performance Problems

Transient performance problems are short-lived and do not appear in the Automatic Database Diagnostics Monitor (ADDM) analysis. ADDM tries to report the most significant performance problems during an analysis period in terms of their impact on DB time. If a particular problem lasts for a very short duration, its severity might be averaged out or minimized by other performance problems in the entire analysis period; therefore, the problem may not appear in the ADDM findings. Whether or not a performance problem is captured by ADDM depends on its duration compared to the interval between the Automatic Workload Repository (AWR) snapshots.

If a performance problem lasts for a significant portion of the time between snapshots, it will be captured by ADDM. For example, if the snapshot interval is set to one hour, a performance problem that lasts for 30 minutes should not be considered as a transient performance problem because its duration represents a significant portion of the snapshot interval and will likely be captured by ADDM.

On the other hand, a performance problem that lasts for only 2 minutes could be a transient performance problem because its duration represents a small portion of the snapshot interval and will likely not show up in the ADDM findings. For example, if the user notifies you that the system was slow between 10:00 p.m. and 10:10 p.m., but the ADDM analysis for the time period between 10:00 p.m. and 11:00 p.m. does not show a performance problem, it is likely that a transient performance problem occurred that lasted for only a few minutes of the 10-minute interval reported by the user. This chapter describes how to resolve these types of transient performance problems with Oracle Database.

This chapter contains the following sections:

Overview of Active Session History

In order to capture a detailed history of database activity, Oracle Database samples active sessions each second using the Active Session History (ASH) sampler. The sampled data is collected into memory and written to persistent storage by the Automatic Workload Repository (AWR) snapshot processing. ASH is an integral part of the Oracle Database self-management framework and is extremely useful for diagnosing performance problems.

Unlike the instance-level statistics gathered by the AWR, sampled data is gathered at the session level by ASH. By capturing statistics for only active sessions, a manageable set of data is represented, with the size being directly related to the work being performed rather than the entire instance.

Sampled data captured by ASH can be rolled up based on the various dimensions that it captures, including:

  • SQL identifier of a SQL statement

  • Object number, file number, and block number

  • Wait event identifier and parameters

  • Session identifier and session serial number

  • Module and action name

  • Client identifier of the session

  • Service hash identifier

You can run Active Session History (ASH) reports to analyze transient performance problems with the database that only occur during specific times. This is especially useful when trying to:

  • Resolve transient performance problems that may last for only a short period of time, such as why a particular job or session is not responding when the rest of the instance is performing normally

  • Perform scoped or targeted performance analysis by various dimensions or their combinations, such as time, session, module, action, or SQL identifier

Running Active Session History Reports

This section describes how to run ASH reports.

To run ASH reports:

  1. On the Database Performance page, under Average Active Sessions, click Run ASH Report.

    The Run ASH Report page appears.

  2. Enter the start date and time, and the end date and time, of the time period when the transient performance problem occurred.

    In this example, database activity spiked between 10:00 p.m. and 10:10 p.m., so an ASH report needs to be created for that time period.

    Description of run_ash.gif follows
    Description of the illustration run_ash.gif

  3. Click Generate Report.

    The Processing: View Report page appears while the report is being generated.

  4. After the report is generated, the ASH report appears under Report Results on the Run ASH Report page, as shown in Figure 7-1. To save the report in HTML for future analysis, click Save to File.

    Figure 7-1 ASH Report Results

    Description of Figure 7-1 follows
    Description of "Figure 7-1 ASH Report Results"

Using Active Session History Reports

To use an ASH report to analyze transient performance problems, you need to review the contents of the ASH report to identify the problem.

The contents of the ASH report are divided into the following sections:

Top Events

The Top Events section describes the top wait events of the sampled session activity categorized by user, background, and priority. Use the information in this section to identify the wait events that may be the cause of the transient performance problem.

The Top Events section contains the following subsections:

Top User Events

The Top User Events subsection lists the top wait events from user processes that accounted for the highest percentages of sampled session activity.

The example in Figure 7-2 shows that a high percentage of database activity (73 percent) is consumed by the CPU + Wait for CPU event. In this example, the Load Profiles section should be examined next to determine the type of activity that is causing this wait event.

Figure 7-2 Top User Events

Description of Figure 7-2 follows
Description of "Figure 7-2 Top User Events"

Top Background Events

The Top Background Events subsection lists the top wait events from backgrounds that accounted for the highest percentages of sampled session activity.

Top Event P1/P2/P3 Values

The Top Event P1/P2/P3 subsection lists the wait event parameter values of the top wait events that accounted for the highest percentages of sampled session activity, ordered by the percentage of total wait time (% Event). For each wait event, values in the P1 Value, P2 Value, P3 Value column correspond to wait event parameters displayed in the Parameter 1, Parameter 2, and Parameter 3 columns.

For example, in Figure 7-3, db file sequential read is the top wait event, consuming 13.26 percent of total wait time while the session waits for a sequential read from the database to be performed.

Figure 7-3 Top Event P1/P2/P3 Values

Description of Figure 7-3 follows
Description of "Figure 7-3 Top Event P1/P2/P3 Values"

The wait event has the following parameters: file# (P1), block# (P2), and blocks (P3). The corresponding values for these wait event parameters are displayed in the P1 Value, P2 Value, P3 Value column:

  • file# = "1"

  • block# = "1801"

  • block = "1"

Using the parameter values in this example, it can be determined that 1801 is the block number of the single block for which the session is waiting.

See Also:

Load Profiles

The Load Profile section describes the load analyzed in the sampled session activity. Use the information in this section to identify the service, client, or SQL command type that may be the cause of the transient performance problem.

The Load Profile section contains the following subsections:

Top Service/Module

The Top Service/Module subsection lists the services and modules that accounted for the highest percentages of sampled session activity.

The example in Figure 7-4 shows that the majority of database activity (65 percent) is consumed by the SYS$USERS service running the SQL*Plus module. In this example, it appears that the user is running a high-load SQL statement that is causing the performance problem. The Top SQL Command Types subsection should be analyzed next to determine if a particular type of SQL statement makes up the load.

Figure 7-4 Top Service/Module

Description of Figure 7-4 follows
Description of "Figure 7-4 Top Service/Module"

See Also:

Top Client IDs

The Top Client IDs subsection lists the clients that accounted for the highest percentages of sampled session activity based on their client ID, which is the application-specific identifier of the database session.

Top SQL Command Types

The Top SQL Command Types subsection lists the SQL command types (such as SELECT, UPDATE, INSERT, and DELETE) that accounted for the highest percentages of sampled session activity.

The example in Figure 7-5 shows that the majority of database activity (68 percent) is used by the SQL command type SELECT. From this information, it appears that SQL statements causing the performance problem are SELECT statements. The Top SQL section should be analyzed next to identify the high-load SQL statement.

Figure 7-5 Top SQL Command Types

Description of Figure 7-5 follows
Description of "Figure 7-5 Top SQL Command Types"

Top SQL

The Top SQL section describes the top SQL statements of the sampled session activity. Use this information to identify high-load SQL statements that may be the cause of the transient performance problem.

The Top SQL section contains the following subsections:

Top SQL Statements

The Top SQL Statements subsection lists the SQL statements that accounted for the highest percentages of sampled session activity. To view the text of the Top SQL statements displayed, click the link in the SQL ID column of the SQL statement that you want to view.

The example in Figure 7-6 shows that the majority of database activity (65 percent) is used by a particular SQL SELECT statement. Based on this information, it appears that this is the high-load SQL statement causing the performance problem. The Top Sessions section should be analyzed next to identify the session running this SQL statement.

Figure 7-6 Top SQL Statements

Description of Figure 7-6 follows
Description of "Figure 7-6 Top SQL Statements"

Top SQL Using Literals

The Top SQL Statements subsection lists the SQL statements using literals that accounted for the highest percentages of sampled session activity.

Complete List of SQL Text

The Complete List of SQL Text subsection displays the entire text of the Top SQL statements shown in this section.

Top Sessions

The Top Sessions section describes the sessions that were waiting for a particular wait event. Use this information to identify the sessions that accounted for the highest percentages of sampled session activity, which may be the cause of the transient performance problem.

The Top Sessions section contains the following subsections:

Top Sessions

The Top Session subsection lists the sessions that were waiting for a particular wait event that accounted for the highest percentages of sampled session activity.

The example in Figure 7-7 shows that the majority of database activity (65 percent) is used by the user ICHAN with the session ID of 135. From this information, it appears that this is the user running the high-load SQL statement identified earlier. You should investigate this session to determine whether it is performing a legitimate operation and tune the SQL statement if possible. If tuning the SQL statement is not possible and the session is causing an unacceptable performance impact on the system, you may want to consider terminating the session.

See Also:

Top Blocking Sessions

The Top Blocking Sessions subsection lists the blocking sessions that accounted for the highest percentages of sampled session activity.

Top Sessions Running PQs

The Top Sessions Running PQs subsection lists the sessions running parallel queries (PQs) that were waiting for a particular wait event which accounted for the highest percentages of sampled session activity.

Top Objects/Files/Latches

The Top Objects/Files/Latches section provides additional information about the most commonly-used database resources and contains the following subsections:

Top DB Objects

The Top DB Objects subsection lists the database objects (such as tables and indexes) that accounted for the highest percentages of sampled session activity.

Top DB Files

The Top DB Files subsection lists the database files that accounted for the highest percentages of sampled session activity.

Top Latches

The Top Latches subsection lists the latches that accounted for the highest percentages of sampled session activity.

Latches are simple, low-level serialization mechanisms to protect shared data structures in the system global area (SGA). For example, latches protect the list of users currently accessing the database and the data structures describing the blocks in the buffer cache. A server or background process acquires a latch for a very short time while manipulating or looking at one of these structures. The implementation of latches is operating system dependent, particularly in regard to whether and how long a process will wait for a latch.

See Also:

Activity Over Time

The Activity Over Time section is one of the most informative sections of the ASH report. This section is particularly useful for longer time periods because it provides in-depth details about activities and workload profiles during the analysis period. The Activity Over Time section is divided into multiple time slots, as shown in Figure 7-8.

Figure 7-8 Activity Over Time

Description of Figure 7-8 follows
Description of "Figure 7-8 Activity Over Time"

Each of the time slots contains information regarding that particular time slot, as described in Table 7-1.

Table 7-1 Activity Over Time

Column Description

Slot Time (Duration)

Duration of the slot

Slot Count

Number of sampled sessions in the slot

Event

Top three wait events in the slot

Event Count

Number of ASH samples waiting for the wait event

% Event

Percentage of ASH samples waiting for wait events in the entire analysis period


The first and last slots are usually odd-sized. All inner slots and are 1 minute each, and can be compared to each other.

When comparing the inner slots, perform a skew analysis by identifying spikes in the Event Count and Slot Count columns. A spike in the Event Count column indicates an increase in the number of sampled sessions waiting for a particular event. A spike in the Slot Count column indicates an increase in active sessions, because ASH data is sampled from active sessions only and a relative increase in database workload. Typically, when the number of active session samples and the number of sessions associated with a wait event increases, the slot may be the cause of the transient performance problem.