Oracle Solaris Modular Debugger Guide

Chapter 9 Debugging With the Kernel Memory Allocator

The Solaris kernel memory (kmem) allocator provides a powerful set of debugging features that can facilitate analysis of a kernel crash dump. This chapter discusses these debugging features, and the MDB dcmds and walkers designed specifically for the allocator. Bonwick (see Related Books and Papers) provides an overview of the principles of the allocator itself. Refer to the header file <sys/kmem_impl.h> for the definitions of allocator data structures. The kmem debugging features can be enabled on a production system to enhance problem analysis, or on development systems to aid in debugging kernel software and device drivers.


Note –

MDB exposes kernel implementation details that are subject to change at any time. This guide reflects the Solaris kernel implementation as of the date of publication of this guide. Information provided in this guide about the kernel memory allocator might not be correct or applicable to past or future Solaris releases.


Getting Started: Creating a Sample Crash Dump

This section shows you how to obtain a sample crash dump, and how to invoke MDB in order to examine it.

Setting kmem_flags

The kernel memory allocator contains many advanced debugging features, but these are not enabled by default because they can cause performance degradation. In order to follow the examples in this guide, you should turn on these features. You should enable these features only on a test system, as they can cause performance degradation or expose latent problems.

The allocator's debugging functionality is controlled by the kmem_flags tunable. To get started, make sure kmem_flags is set properly:

# mdb -k
> kmem_flags/X
kmem_flags:
kmem_flags:     f

If kmem_flags is not set to f, you should add the following line to the /etc/system file:

set kmem_flags=0xf

The reboot the system. When the system reboots, confirm that kmem_flags is set to f. Remember to remove your /etc/system modifications before returning this system to production use.

Forcing a Crash Dump

The next step is to make sure crash dumps are properly configured. First, confirm that dumpadm is configured to save kernel crash dumps and that savecore is enabled. See dumpadm(1M) for more information on crash dump parameters.

# dumpadm
              Dump content: kernel pages
               Dump device: /dev/dsk/c0t0d0s1 (swap)
        Savecore directory: /var/crash/testsystem
          Savecore enabled: yes
           Save compressed: on

Next, reboot the system using the -d flag to reboot(1M), which forces the kernel to panic and save a crash dump.

# reboot -d
Sep 28 17:51:18 testsystem reboot: rebooted by root

panic[cpu0]/thread=70aacde0: forced crash dump initiated at user request

401fbb10 genunix:uadmin+55c (1, 1, 0, 6d700000, 5, 0)
  %l0-7: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
         00000000
...

When the system reboots, savecore runs automatically to preserve the crash dump in a file. When finished, a message is printed on the system console:

Sep 17 10:47:23 testsystem savecore: Decompress the crash dump with
Sep 17 10:47:23 testsystem 'savecore -vf /var/crash/testsystem/vmdump.0'

If the message does not appear right away, check to whether savecore(1M) is still running:


$ pgrep savecore
864
$ cd /var/crash/testsystem
$ ls
bounds    vmdump.0

The vmdump.n file is a compressed version of vmcore.n plus unix.n.

If your dump directory contains no dump files , then that partition might be out of space. You can free up space and run savecore(1M) manually as root to subsequently save the dump.

If your dump directory contains multiple crash dumps, the one you just created is the unix.n and vmcore.n pair or vmdump.n file with the most recent modification time.

Saving a Crash Dump

When the system panics, or when you enter reboot -d, the following kinds of messages appear on the console:

Sep 17 10:47:23 testsystem savecore: Decompress the crash dump with 
Sep 17 10:47:23 testsystem 'savecore -vf /var/crash/testsystem/vmdump.0'

Enter the following command:


root@testsystem # savecore -vf /var/crash/testsystem/vmdump.0
savecore: System dump time: Thu Sep 17 10:43:20 2009

savecore: saving system crash dump in /var/crash/testsystem/{unix,vmcore}.0
Constructing namelist /var/crash/testsystem/unix.0
Constructing corefile /var/crash/testsystem/vmcore.0
 1:29 100% done: 825215 of 825215 pages saved
1:30 dump decompress is done

Now you can use mdb:


root@testsystem# mdb /var/crash/testsystem/{unix,vmcore}.0
Loading modules: [ unix genunix specfs dtrace zfs scsi_vhci sd mpt px mac ldc sockfs
ip hook neti sctp arp usba stmf qlc fctl nca lofs idm logindmux ptm ufs md cpc sppp
random smbsrv nfs crypto mdesc nsctl sdbc sv rdc fcp fcip ii nsmb ]
>

You can copy the vmdump.n file to another system for analysis. You can use savecore(1M) either locally or remotely to uncompress the dump file.

Use the dumpadm(1M) command to control the particular paths of the dump device and the savecore directory.

You can use the file(1) command to quickly examine files in the directory:


$ cd /var/crash/testsystem
$ file *
bounds:         ascii text
unix.0:         ELF 64-bit MSB executable SPARCV9 Version 1, UltraSPARC3 Extensions 
Required, statically linked, not stripped, no debugging information available
vmcore.0:       SunOS 5.11 Generic 64-bit SPARC crash dump from 'testsystem'
vmdump.0:       SunOS 5.11 Generic 64-bit SPARC compressed crash dump from 'testsystem'

Starting MDB

Now, run mdb on the crash dump you created, and check its status:

$ mdb unix.1 vmcore.1
Loading modules: [ unix krtld genunix ip nfs ipc ]
> ::status
debugging crash dump vmcore.1 (32-bit) from testsystem
operating system: 5.10 Generic (sun4u)
panic message: forced crash dump initiated at user request

In the examples presented in this guide, a crash dump from a 32-bit kernel is used. All of the techniques presented here are applicable to a 64-bit kernel, and care has been taken to distinguish pointers (sized differently on 32- and 64-bit systems) from fixed-sized quantities, which are invariant with respect to the kernel data model.

An UltraSPARC workstation was used to generate the example presented. Your results can vary depending on the architecture and model of system you use.

Allocator Basics

The kernel memory allocator's job is to parcel out regions of virtual memory to other kernel subsystems (these are commonly called clients). This section explains the basics of the allocator's operation and introduces some terms used later in this guide.

Buffer States

The functional domain of the kernel memory allocator is the set of buffers of virtual memory that make up the kernel heap. These buffers are grouped together into sets of uniform size and purpose, known as caches. Each cache contains a set of buffers. Some of these buffers are currently free, which means that they have not yet been allocated to any client of the allocator. The remaining buffers are allocated, which means that a pointer to that buffer has been provided to a client of the allocator. If no client of the allocator holds a pointer to an allocated buffer, this buffer is said to be leaked, because it cannot be freed. Leaked buffers indicate incorrect code that is wasting kernel resources.

Transactions

A kmem transaction is a transition on a buffer between the allocated and free states. The allocator can verify that the state of a buffer is valid as part of each transaction. Additionally, the allocator has facilities for logging transactions for post-mortem examination.

Sleeping and Non-Sleeping Allocations

Unlike the Standard C Library's malloc(3C) function, the kernel memory allocator can block (or sleep), waiting until enough virtual memory is available to satisfy the client's request. This is controlled by the flag parameter to kmem_alloc(9F). A call to kmem_alloc(9F) which has the KM_SLEEP flag set can never fail; it will block forever waiting for resources to become available.

Kernel Memory Caches

The kernel memory allocator divides the memory it manages into a set of caches. All allocations are supplied from these caches, which are represented by the kmem_cache_t data structure. Each cache has a fixed buffer size, which represents the maximum allocation size satisfied by that cache. Each cache has a string name indicating the type of data it manages.

Some kernel memory caches are special purpose and are initialized to allocate only a particular kind of data structure. An example of this is the “thread_cache,” which allocates only structures of type kthread_t. Memory from these caches is allocated to clients by the kmem_cache_alloc() function and freed by the kmem_cache_free() function.


Note –

kmem_cache_alloc() and kmem_cache_free() are not public DDI interfaces. Do NOT write code that relies on them, because they are subject to change or removal in future releases of Solaris.


Caches whose name begins with “kmem_alloc_” implement the kernel's general memory allocation scheme. These caches provide memory to clients of kmem_alloc(9F) and kmem_zalloc(9F). Each of these caches satisfies requests whose size is between the buffer size of that cache and the buffer size of the next smallest cache. For example, the kernel has kmem_alloc_8 and kmem_alloc_16 caches. In this case, the kmem_alloc_16 cache handles all client requests for 9-16 bytes of memory. Remember that the size of each buffer in the kmem_alloc_16 cache is 16 bytes, regardless of the size of the client request. In a 14 byte request, two bytes of the resulting buffer are unused, since the request is satisfied from the kmem_alloc_16 cache.

The last set of caches are those used internally by the kernel memory allocator for its own bookkeeping. These include those caches whose names start with “kmem_magazine_” or “kmem_va_”, the kmem_slab_cache, the kmem_bufctl_cache and others.

Kernel Memory Caches

This section explains how to find and examine kernel memory caches. You can learn about the various kmem caches on the system by issuing the ::kmastat command.

> ::kmastat
cache                        buf    buf    buf    memory     alloc alloc
name                        size in use  total    in use   succeed  fail
------------------------- ------ ------ ------ --------- --------- -----
kmem_magazine_1                8     24   1020      8192        24     0
kmem_magazine_3               16    141    510      8192       141     0
kmem_magazine_7               32     96    255      8192        96     0
...
kmem_alloc_8                   8   3614   3751     90112   9834113     0
kmem_alloc_16                 16   2781   3072     98304   8278603     0
kmem_alloc_24                 24    517    612     24576    680537     0
kmem_alloc_32                 32    398    510     24576    903214     0
kmem_alloc_40                 40    482    584     32768    672089     0
...
thread_cache                 368    107    126     49152    669881     0
lwp_cache                    576    107    117     73728       182     0
turnstile_cache               36    149    292     16384    670506     0
cred_cache                    96      6     73      8192   2677787     0
...

If you run ::kmastat you get a feel for what a “normal” system looks like. This will help you to spot excessively large caches on systems that are leaking memory. The results of ::kmastat will vary depending on the system you are running on, how many processes are running, and so forth.

Another way to list the various kmem caches is with the ::kmem_cache command:

> ::kmem_cache
ADDR     NAME                      FLAG  CFLAG  BUFSIZE  BUFTOTL
70036028 kmem_magazine_1           0020 0e0000        8     1020
700362a8 kmem_magazine_3           0020 0e0000       16      510
70036528 kmem_magazine_7           0020 0e0000       32      255
...
70039428 kmem_alloc_8              020f 000000        8     3751
700396a8 kmem_alloc_16             020f 000000       16     3072
70039928 kmem_alloc_24             020f 000000       24      612
70039ba8 kmem_alloc_32             020f 000000       32      510
7003a028 kmem_alloc_40             020f 000000       40      584
...

This command is useful because it maps cache names to addresses, and provides the debugging flags for each cache in the FLAG column. It is important to understand that the allocator's selection of debugging features is derived on a per-cache basis from this set of flags. These are set in conjunction with the global kmem_flags variable at cache creation time. Setting kmem_flags while the system is running has no effect on the debugging behavior, except for subsequently created caches (which is rare after boot-up).

Next, walk the list of kmem caches directly using MDB's kmem_cache walker:

> ::walk kmem_cache
70036028
700362a8
70036528
700367a8
...

This produces a list of pointers that correspond to each kmem cache in the kernel. To find out about a specific cache, apply the kmem_cache macro:

> 0x70039928$<kmem_cache
0x70039928:     lock
0x70039928:     owner/waiters
                0
0x70039930:     flags           freelist        offset
                20f             707c86a0        24
0x7003993c:     global_alloc    global_free     alloc_fail
                523             0               0
0x70039948:     hash_shift      hash_mask       hash_table
                5               1ff             70444858
0x70039954:     nullslab
0x70039954:     cache           base            next
                70039928        0               702d5de0
0x70039960:     prev            head            tail
                707c86a0        0               0
0x7003996c:     refcnt          chunks
                -1              0
0x70039974:     constructor     destructor      reclaim
                0               0               0
0x70039980:     private         arena           cflags
                0               104444f8        0
0x70039994:     bufsize         align           chunksize
                24              8               40
0x700399a0:     slabsize        color           maxcolor
                8192            24              32
0x700399ac:     slab_create     slab_destroy    buftotal
                3               0               612
0x700399b8:     bufmax          rescale         lookup_depth
                612             1               0
0x700399c4:     kstat           next            prev
                702c8608        70039ba8        700396a8
0x700399d0:     name    kmem_alloc_24
0x700399f0:     bufctl_cache    magazine_cache  magazine_size
                70037ba8        700367a8        15
...

Important fields for debugging include 'bufsize', 'flags' and 'name'. The name of the kmem_cache (in this case “kmem_alloc_24”) indicates its purpose in the system. Bufsize indicates the size of each buffer in this cache; in this case, the cache is used for allocations of size 24 and smaller. 'flags' indicates what debugging features are turned on for this cache. You can find the debugging flags listed in <sys/kmem_impl.h>. In this case 'flags' is 0x20f, which is KMF_AUDIT | KMF_DEADBEEF | KMF_REDZONE | KMF_CONTENTS | KMF_HASH. This document explains each of the debugging features in subsequent sections.

When you are interested in looking at buffers in a particular cache, you can walk the allocated and freed buffers in that cache directly:

> 0x70039928::walk kmem
704ba010
702ba008
704ba038
702ba030
...

> 0x70039928::walk freemem
70a9ae50
70a9ae28
704bb730
704bb2f8
...

MDB provides a shortcut to supplying the cache address to the kmem walker: a specific walker is provided for each kmem cache, and its name is the same as the name of the cache. For example:

> ::walk kmem_alloc_24
704ba010
702ba008
704ba038
702ba030
...

> ::walk thread_cache
70b38080
70aac060
705c4020
70aac1e0
...

Now you know how to iterate over the kernel memory allocator's internal data structures and examine the most important members of the kmem_cache data structure.

Detecting Memory Corruption

One of the primary debugging facilities of the allocator is that it includes algorithms to recognize data corruption quickly. When corruption is detected, the allocator immediately panics the system. This section describes how the allocator recognizes data corruption. You must understand this to be able to debug these problems.

Memory abuse typically falls into one of the following categories:

Keep these problems in mind as you read the next three sections. They will help you to understand the allocator's design, and enable you to diagnose problems more efficiently.

Freed Buffer Checking: 0xdeadbeef

When the KMF_DEADBEEF (0x2) bit is set in the flags field of a kmem_cache, the allocator tries to make memory corruption easy to detect by writing a special pattern into all freed buffers. This pattern is 0xdeadbeef. Since a typical region of memory contains both allocated and freed memory, sections of each kind of block will be interspersed. The following example is from the kmem_alloc_24 cache:

0x70a9add8:     deadbeef        deadbeef
0x70a9ade0:     deadbeef        deadbeef
0x70a9ade8:     deadbeef        deadbeef
0x70a9adf0:     feedface        feedface
0x70a9adf8:     70ae3260        8440c68e
0x70a9ae00:     5               4ef83
0x70a9ae08:     0               0
0x70a9ae10:     1               bbddcafe
0x70a9ae18:     feedface        139d
0x70a9ae20:     70ae3200        d1befaed
0x70a9ae28:     deadbeef        deadbeef
0x70a9ae30:     deadbeef        deadbeef
0x70a9ae38:     deadbeef        deadbeef
0x70a9ae40:     feedface        feedface
0x70a9ae48:     70ae31a0        8440c54e

The buffers at 0x70a9add8 and 0x70a9ae28 are filled with 0xdeadbeefdeadbeef, which shows that these buffers are free. The buffer redzones are filled with 0xfeedfacefeedface, which indicates they are untouched (no buffer overrun has occurred). See the following section for an explanation of redzones. At 0x70a9ae00 an allocated buffer is located between the two free buffers.

Redzone: 0xfeedface

Note the pattern 0xfeedface in the buffer shown in the previous section. This pattern is known as the redzone indicator. This pattern enables the allocator (and a programmer debugging a problem) to determine whether the boundaries of a buffer have been violated. Following the redzone is some additional information. The content of that data depends on other factors (see Memory Allocation Logging). The redzone and its suffix are collectively called the buftag region. Figure 9–1 summarizes this information.

Figure 9–1 The Redzone

Graphic described by context.

The buftag is appended to each buffer in a cache when any of the KMF_AUDIT, KMF_DEADBEEF, or KMF_REDZONE flags is set in that buffer's cache. The content of the buftag depends on whether KMF_AUDIT is set.

Decomposing the memory region presented above into distinct buffers is now simple:

0x70a9add8:     deadbeef        deadbeef  \
0x70a9ade0:     deadbeef        deadbeef   +- User Data (free)
0x70a9ade8:     deadbeef        deadbeef  /
0x70a9adf0:     feedface        feedface  -- REDZONE
0x70a9adf8:     70ae3260        8440c68e  -- Debugging Data

0x70a9ae00:     5               4ef83     \
0x70a9ae08:     0               0          +- User Data (allocated)
0x70a9ae10:     1               bbddcafe  /
0x70a9ae18:     feedface        139d    -- REDZONE
0x70a9ae20:     70ae3200        d1befaed  -- Debugging Data

0x70a9ae28:     deadbeef        deadbeef  \
0x70a9ae30:     deadbeef        deadbeef   +- User Data (free)
0x70a9ae38:     deadbeef        deadbeef  /
0x70a9ae40:     feedface        feedface  -- REDZONE
0x70a9ae48:     70ae31a0        8440c54e  -- Debugging Data

The buffers at 0x70a9add8 and 0x70a9ae28 are filled with 0xdeadbeefdeadbeef, which shows that these buffers are free. The buffer redzones are filled with 0xfeedfacefeedface, which indicates they are untouched (no buffer overrun has occurred).

0xbaddcafe

Buffer is allocated but uninitialized (see Uninitialized Data: 0xbaddcafe).

0xdeadbeef

Buffer is free.

0xfeedface

Buffer limits were respected (no overflow).

In the allocated buffer beginning at 0x70a9ae00, the situation is different. Recall from Allocator Basics that there are two allocation types:

  1. The client requested memory using kmem_cache_alloc(9F), in which case the size of the requested buffer is equal to the bufsize of the cache.

  2. The client requested memory using kmem_alloc(9F), in which case the size of the requested buffer is less than or equal to the bufsize of the cache. For example, a request for 20 bytes will be fulfilled from the kmem_alloc_24 cache. The allocator enforces the buffer boundary by placing a marker, the redzone byte, immediately following the client data:

    0x70a9ae00:     5               4ef83     \
    0x70a9ae08:     0               0          +- User Data (allocated)
    0x70a9ae10:     1               bbddcafe  /
    0x70a9ae18:     feedface        139d    -- REDZONE
    0x70a9ae20:     70ae3200        d1befaed  -- Debugging Data

The 0xfeedface value at 0x70a9ae18 is followed by a 32-bit word containing what seems to be a random value. This number is actually an encoded representation of the size of the buffer. To decode this number and find the size of the allocated buffer, use the formula:

size = redzone_value / 251

So, in this example,

size = 0x139d / 251 = 20 bytes.

This indicates that the buffer requested was of size 20 bytes. The allocator performs this decoding operation and finds that the redzone byte should be at offset 20. The redzone byte is the hex pattern 0xbb, which is present at 0x729084e4 (0x729084d0 + 0t20) as expected.

Figure 9–2 Sample kmem_alloc(9F) Buffer

This graphic depicts a sample kmem_alloc buffer. The
redzone byte, uninitialized data, and debugging data are marked.

Figure 9–3 shows the general form of this memory layout.

Figure 9–3 Redzone Byte

This graphic shows the redzone byte being written after
the end of the user data region. The redzone byte is determined by decoding
the index.

If the allocation size is the same as the bufsize of the cache, the redzone byte overwrites the first byte of the redzone itself, as shown in Figure 9–4.

Figure 9–4 Redzone Byte at the Beginning of the Redzone

Graphic described by context.

This overwriting results in the first 32-bit word of the redzone being 0xbbedface, or 0xfeedfabb depending on the endianness of the hardware on which the system is running.


Note –

Why is the allocation size encoded this way? To encode the size, the allocator uses the formula (251 * size + 1). When the size decode occurs, the integer division discards the remainder of '+1'. However, the addition of 1 is valuable because the allocator can check whether the size is valid by testing whether (size % 251 == 1). In this way, the allocator defends against corruption of the redzone byte index.


Uninitialized Data: 0xbaddcafe

You might be wondering what the suspicious 0xbbddcafe at address 0x729084d4 was before the redzone byte got placed over the first byte in the word. It was 0xbaddcafe. When the KMF_DEADBEEF flag is set in the cache, allocated but uninitialized memory is filled with the 0xbaddcafe pattern. When the allocator performs an allocation, it loops across the words of the buffer and verifies that each word contains 0xdeadbeef, then fills that word with 0xbaddcafe.

A system can panic with a message such as:

panic[cpu1]/thread=e1979420: BAD TRAP: type=e (Page Fault)
rp=ef641e88 addr=baddcafe occurred in module "unix" due to an
illegal access to a user address

In this case, the address that caused the fault was 0xbaddcafe: the panicking thread has accessed some data that was never initialized.

Associating Panic Messages With Failures

The kernel memory allocator emits panic messages corresponding to the failure modes described earlier. For example, a system can panic with a message such as:

kernel memory allocator: buffer modified after being freed
modification occurred at offset 0x30

The allocator was able to detect this case because it tried to validate that the buffer in question was filled with 0xdeadbeef. At offset 0x30, this condition was not met. Since this condition indicates memory corruption, the allocator panicked the system.

Another example failure message is:

kernel memory allocator: redzone violation: write past end of buffer

The allocator was able to detect this case because it tried to validate that the redzone byte (0xbb) was in the location it determined from the redzone size encoding. It failed to find the signature byte in the correct location. Since this indicates memory corruption, the allocator panicked the system. Other allocator panic messages are discussed later.

Memory Allocation Logging

This section explains the logging features of the kernel memory allocator and how you can employ them to debug system crashes.

Buftag Data Integrity

As explained earlier, the second half of each buftag contains extra information about the corresponding buffer. Some of this data is debugging information, and some is data private to the allocator. While this auxiliary data can take several different forms, it is collectively known as “Buffer Control” or bufctl data.

However, the allocator needs to know whether a buffer's bufctl pointer is valid, since this pointer might also have been corrupted by malfunctioning code. The allocator confirms the integrity of its auxiliary pointer by storing the pointer and an encoded version of that pointer, and then cross-checking the two versions.

As shown in Figure 9–5, these pointers are the bcp (buffer control pointer) and bxstat (buffer control XOR status). The allocator arranges bcp and bxstat so that the expression bcp XOR bxstat equals a well-known value.

Figure 9–5 Extra Debugging Data in the Buftag

Graphic is described by context.

In the event that one or both of these pointers becomes corrupted, the allocator can easily detect such corruption and panic the system. When a buffer is allocated, bcp XOR bxstat = 0xa110c8ed (“allocated”). When a buffer is free, bcp XOR bxstat = 0xf4eef4ee (“freefree”).


Note –

You might find it helpful to re-examine the example provided in Freed Buffer Checking: 0xdeadbeef, in order to confirm that the buftag pointers shown there are consistent.


In the event that the allocator finds a corrupt buftag, it panics the system and produces a message similar to the following:

kernel memory allocator: boundary tag corrupted
    bcp ^ bxstat = 0xffeef4ee, should be f4eef4ee

Remember, if bcp is corrupt, it is still possible to retrieve its value by taking the value of bxstat XOR 0xf4eef4ee or bxstat XOR 0xa110c8ed, depending on whether the buffer is allocated or free.

The bufctl Pointer

The buffer control (bufctl) pointer contained in the buftag region can have different meanings, depending on the cache's kmem_flags. The behavior toggled by the KMF_AUDIT flag is of particular interest: when the KMF_AUDIT flag is not set, the kernel memory allocator allocates a kmem_bufctl_t structure for each buffer. This structure contains some minimal accounting information about each buffer. When the KMF_AUDIT flag is set, the allocator instead allocates a kmem_bufctl_audit_t, an extended version of the kmem_bufctl_t.

This section presumes the KMF_AUDIT flag is set. For caches that do not have this bit set, the amount of available debugging information is reduced.

The kmem_bufctl_audit_t (bufctl_audit for short) contains additional information about the last transaction that occurred on this buffer. The following example shows how to apply the bufctl_audit macro to examine an audit record. The buffer shown is the example buffer used in Detecting Memory Corruption:

> 0x70a9ae00,5/KKn
0x70a9ae00:     5               4ef83
                0               0
                1               bbddcafe
                feedface        139d
                70ae3200        d1befaed

Using the techniques presented above, it is easy to see that 0x70ae3200 points to the bufctl_audit record: it is the first pointer following the redzone. To examine the bufctl_audit record it points to, apply the bufctl_audit macro:

> 0x70ae3200$<bufctl_audit
0x70ae3200:     next            addr            slab
                70378000        70a9ae00        707c86a0
0x70ae320c:     cache           timestamp       thread
                70039928        e1bd0e26afe     70aac4e0
0x70ae321c:     lastlog         contents        stackdepth
                7011c7c0        7018a0b0        4
0x70ae3228:
                kmem_zalloc+0x30
                pid_assign+8
                getproc+0x68
                cfork+0x60

The 'addr' field is the address of the buffer corresponding to this bufctl_audit record. This is the original address: 0x70a9ae00. The 'cache' field points at the kmem_cache that allocated this buffer. You can use the ::kmem_cache dcmd to examine it as follows:

> 0x70039928::kmem_cache
ADDR     NAME                      FLAG  CFLAG  BUFSIZE  BUFTOTL
70039928 kmem_alloc_24             020f 000000       24      612

The 'timestamp' field represents the time this transaction occurred. This time is expressed in the same manner as gethrtime(3C).

'thread' is a pointer to the thread that performed the last transaction on this buffer. The 'lastlog' and 'contents' pointers point to locations in the allocator's transaction logs. These logs are discussed in detail in Allocator Logging Facility.

Typically, the most useful piece of information provided by bufctl_audit is the stack trace recorded at the point at which the transaction took place. In this case, the transaction was an allocation called as part of executing fork(2).

Advanced Memory Analysis

This section describes facilities for performing advanced memory analysis, including locating memory leaks and sources of data corruption.

Finding Memory Leaks

The ::findleaks dcmd provides powerful and efficient detection of memory leaks in kernel crash dumps where the full set of kmem debug features has been enabled. The first execution of ::findleaks processes the dump for memory leaks (this can take a few minutes), and then coalesces the leaks by the allocation stack trace. The findleaks report shows a bufctl address and the topmost stack frame for each memory leak that was identified:

> ::findleaks
CACHE     LEAKED   BUFCTL CALLER
70039ba8       1 703746c0 pm_autoconfig+0x708
70039ba8       1 703748a0 pm_autoconfig+0x708
7003a028       1 70d3b1a0 sigaddq+0x108
7003c7a8       1 70515200 pm_ioctl+0x187c
------------------------------------------------------
   Total       4 buffers, 376 bytes

Using the bufctl pointers, you can obtain the complete stack backtrace of the allocation by applying the bufctl_audit macro:

> 70d3b1a0$<bufctl_audit
0x70d3b1a0:     next            addr            slab
                70a049c0        70d03b28        70bb7480
0x70d3b1ac:     cache           timestamp       thread
                7003a028        13f7cf63b3      70b38380
0x70d3b1bc:     lastlog         contents        stackdepth
                700d6e60        0               5
0x70d3b1c8:
                kmem_alloc+0x30
                sigaddq+0x108
                sigsendproc+0x210
                sigqkill+0x90
                kill+0x28

The programmer can usually use the bufctl_audit information and the allocation stack trace to quickly track down the code path that leaks the given buffer.

Finding References to Data

When trying to diagnose a memory corruption problem, you should know what other kernel entities hold a copy of a particular pointer. This is important because it can reveal which thread accessed a data structure after it was freed. It can also make it easier to understand what kernel entities are sharing knowledge of a particular (valid) data item. The ::whatis and ::kgrep dcmds can be used to answer these questions. You can apply ::whatis to a value of interest:

> 0x705d8640::whatis
705d8640 is 705d8640+0, allocated from streams_mblk

In this case, 0x705d8640 is revealed to be a pointer to a STREAMS mblk structure. To see the entire allocation tree, use ::whatis -a instead:

> 0x705d8640::whatis -a
705d8640 is 705d8640+0, allocated from streams_mblk
705d8640 is 705d8000+640, allocated from kmem_va_8192
705d8640 is 705d8000+640 from kmem_default vmem arena
705d8640 is 705d2000+2640 from kmem_va vmem arena
705d8640 is 705d2000+2640 from heap vmem arena

This reveals that the allocation also appears in the kmem_va_8192 cache. The kmem_va_8192 cache is a kmem cache that is fronting the kmem_va vmem arena. It also shows the full stack of vmem allocations.

The complete list of kmem caches and vmem arenas is displayed by the ::kmastat dcmd. You can use ::kgrep to locate other kernel addresses that contain a pointer to this mblk. This illustrates the hierarchical nature of memory allocations in the system; in general, you can determine the type of object referred to by the given address from the name of the most specific kmem cache.

> 0x705d8640::kgrep
400a3720
70580d24
7069d7f0
706a37ec
706add34

and investigate them by applying ::whatis again:

> 400a3720::whatis
400a3720 is in thread 7095b240's stack

> 706add34::whatis
706add34 is 706add20+14, allocated from streams_dblk_120

Here one pointer is located on the stack of a known kernel thread, and another is the mblk pointer inside of the corresponding STREAMS dblk structure.

Finding Corrupt Buffers With ::kmem_verify

MDB's ::kmem_verify dcmd implements most of the same checks that the kmem allocator does at runtime. ::kmem_verify can be invoked in order to scan every kmem cache with appropriate kmem_flags, or to examine a particular cache.

Here is an example of using ::kmem_verify to isolate a problem:

> ::kmem_verify
Cache Name                      Addr     Cache Integrity
kmem_alloc_8                    70039428 clean
kmem_alloc_16                   700396a8 clean
kmem_alloc_24                   70039928 1 corrupt buffer
kmem_alloc_32                   70039ba8 clean
kmem_alloc_40                   7003a028 clean
kmem_alloc_48                   7003a2a8 clean
...

It is easy to see here that the kmem_alloc_24 cache contains what ::kmem_verify believes to be a problem. With an explicit cache argument, the ::kmem_verify dcmd provides more detailed information about the problem:

> 70039928::kmem_verify
Summary for cache 'kmem_alloc_24'
  buffer 702babc0 (free) seems corrupted, at 702babc0

The next step is to examine the buffer which ::kmem_verify believes to be corrupt:

> 0x702babc0,5/KKn
0x702babc0:     0               deadbeef
                deadbeef        deadbeef
                deadbeef        deadbeef
                feedface        feedface
                703785a0        84d9714e

The reason that ::kmem_verify flagged this buffer is now clear: The first word in the buffer (at 0x702babc0) should probably be filled with the 0xdeadbeef pattern, not with a 0. At this point, examining the bufctl_audit for this buffer might yield clues about what code recently wrote to the buffer, indicating where and when it was freed.

Another useful technique in this situation is to use ::kgrep to search the address space for references to address 0x702babc0, in order to discover what threads or data structures are still holding references to this freed data.

Allocator Logging Facility

When KMF_AUDIT is set for a cache, the kernel memory allocator maintains a log that records the recent history of its activity. This transaction log records bufctl_audit records. If the KMF_AUDIT and the KMF_CONTENTS flags are both set, the allocator generates a contents log that records portions of the actual contents of allocated and freed buffers. The structure and use of the contents log is outside the scope of this document. The transaction log is discussed in this section.

MDB provides several facilities for displaying the transaction log. The simplest is ::walk kmem_log, which prints out the transaction in the log as a series of bufctl_audit_t pointers:

> ::walk kmem_log
70128340
701282e0
70128280
70128220
701281c0
...
> 70128340$<bufctl_audit
0x70128340:     next            addr            slab
                70ac1d40        70bc4ea8        70bb7c00
0x7012834c:     cache           timestamp       thread
                70039428        e1bd7abe721     70aacde0
0x7012835c:     lastlog         contents        stackdepth
                701282e0        7018f340        4
0x70128368:
                kmem_cache_free+0x24
                nfs3_sync+0x3c
                vfs_sync+0x84
                syssync+4

A more elegant way to view the entire transaction log is by using the ::kmem_log command:

> ::kmem_log
CPU ADDR     BUFADDR         TIMESTAMP THREAD
  0 70128340 70bc4ea8      e1bd7abe721 70aacde0
  0 701282e0 70bc4ea8      e1bd7aa86fa 70aacde0
  0 70128280 70bc4ea8      e1bd7aa27dd 70aacde0
  0 70128220 70bc4ea8      e1bd7a98a6e 70aacde0
  0 701281c0 70d03738      e1bd7a8e3e0 70aacde0
  ...
  0 70127140 70cf78a0      e1bd78035ad 70aacde0
  0 701270e0 709cf6c0      e1bd6d2573a 40033e60
  0 70127080 70cedf20      e1bd6d1e984 40033e60
  0 70127020 70b09578      e1bd5fc1791 40033e60
  0 70126fc0 70cf78a0      e1bd5fb6b5a 40033e60
  0 70126f60 705ed388      e1bd5fb080d 40033e60
  0 70126f00 705ed388      e1bd551ff73 70aacde0
  ...

The output of ::kmem_log is sorted in descending order by timestamp. The ADDR column is the bufctl_audit structure corresponding to that transaction; BUFADDR points to the actual buffer.

These figures represent transactions on buffers (both allocations and frees). When a particular buffer is corrupted, it can be helpful to locate that buffer in the transaction log, then determine in which other transactions the transacting thread was involved. This can help to assemble a picture of the sequence of events that occurred prior to and after the allocation (or free) of a buffer.

You can employ the ::bufctl command to filter the output of walking the transaction log. The ::bufctl -a command filters the buffers in the transaction log by buffer address. This example filters on buffer 0x70b09578:

> ::walk kmem_log | ::bufctl -a 0x70b09578
ADDR     BUFADDR   TIMESTAMP   THREAD   CALLER
70127020 70b09578  e1bd5fc1791 40033e60 biodone+0x108
70126e40 70b09578  e1bd55062da 70aacde0 pageio_setup+0x268
70126de0 70b09578  e1bd52b2317 40033e60 biodone+0x108
70126c00 70b09578  e1bd497ee8e 70aacde0 pageio_setup+0x268
70120480 70b09578  e1bd21c5e2a 70aacde0 elfexec+0x9f0
70120060 70b09578  e1bd20f5ab5 70aacde0 getelfhead+0x100
7011ef20 70b09578  e1bd1e9a1dd 70aacde0 ufs_getpage_miss+0x354
7011d720 70b09578  e1bd1170dc4 70aacde0 pageio_setup+0x268
70117d80 70b09578  e1bcff6ff27 70bc2480 elfexec+0x9f0
70117960 70b09578  e1bcfea4a9f 70bc2480 getelfhead+0x100
...

This example illustrates that a particular buffer can be used in numerous transactions.


Note –

Remember that the kmem transaction log is an incomplete record of the transactions made by the kernel memory allocator. Older entries in the log are evicted as needed in order to keep the size of the log constant.


The ::allocdby and ::freedby dcmds provide a convenient way to summarize transactions associated with a particular thread. Here is an example of listing the recent allocations performed by thread 0x70aacde0:

> 0x70aacde0::allocdby
BUFCTL      TIMESTAMP CALLER
70d4d8c0  e1edb14511a allocb+0x88
70d4e8a0  e1edb142472 dblk_constructor+0xc
70d4a240  e1edb13dd4f allocb+0x88
70d4e840  e1edb13aeec dblk_constructor+0xc
70d4d860  e1ed8344071 allocb+0x88
70d4e7e0  e1ed8342536 dblk_constructor+0xc
70d4a1e0  e1ed82b3a3c allocb+0x88
70a53f80  e1ed82b0b91 dblk_constructor+0xc
70d4d800  e1e9b663b92 allocb+0x88

By examining bufctl_audit records, you can understand the recent activities of a particular thread.