NAME | INTRODUCTION | GETTING STARTED | CONCEPTS | RESOURCE CONTROL | REPORTING TOOLS | SUB-ADMINISTRATORS AND GROUP ADMINISTRATORS | LNODE MANAGEMENT | ATTRIBUTES | SEE ALSO | WARNINGS | NOTES
Solaris Resource Manager is an optional enhancement of the Solaris operating environment that provides:
Explicit allocation of system resources (CPU, virtual memory, terminal connect-time and number of connections, and process count) to users
Accumulated accounting information on the resource usage of each user
Fully hierarchical grouping of users, allowing the same degree of control and accounting at higher organizational levels
Facilities for decentralized administration of users by sub-administrators
Type the command:
liminfo
If Solaris Resource Manager is operating, the output is a display of your assorted resource and accounting information, providing an overview of most of the functionality of Solaris Resource Manager.
The default directory for Solaris Resource Manager man pages is /usr/srm/man. The default directory for SunOS man pages is /usr/man.
Solaris Resource Manager maintains additional non-volatile information for each user. The kernel is modified to represent each user internally with a new structure called a limit node (lnode). Each user's lnode is indexed by the user's UID number. The lnode contains all extra per-user information required by Solaris Resource Manager. Most of the information displayed by liminfo is taken directly from the lnode.
There are some special lnodes:
The root lnode always exists; it is the root of the scheduling tree and is not subject to limits.
Other special lnodes are described in srmadm(1MSRM) and limadm(1MSRM).
The data fields in an lnode are called attributes, which are referenced by name. Each attribute has one of the following types:
integer
long
time: integer; a time interval in seconds
date: integer; a date/time in seconds relative to the system epoch, 1-Jan-1970
uid: integer; a UID
flag: enum { inherit, set, clear, group }
All attributes with integer or long types are currently treated as unsigned.
A flag is similar to a boolean: it evaluates to either set or clear. A third value, group, is used exclusively by device flags such as terminal.flag.devicename. The special value inherit is explained below in the section on Hierarchical Control.
Lnodes are arranged in a strict, system-wide hierarchy called the scheduling tree, with the root lnode at its root. The first layer of lnodes below root refer to root as their parent, and are described as children of root. This relationship repeats with each level of the hierarchy. Each subtree of the scheduling tree is called a scheduling group. The lnode at the root of each subtree is called the group's header. All lnodes in a scheduling group except the header are known as members of the header's scheduling group.
The administrators of a system may use scheduling groups to represent the organizations, departments, and projects that use the system. Note that scheduling groups have nothing to do with the file groups defined in the group database.
All internal as well as leaf lnodes represent users, so all group headers require unique UIDs. Group header users can optionally be granted limited administrative power over the members of their scheduling group. There is no special property that distinguishes a user from a group header; a group header is simply a user who is the scheduling tree parent of one or more other users.
The structure of the scheduling tree is defined by the following attribute:
uid; assigned by the administrator. The UID of the lnode's parent in the scheduling tree. For the root lnode this attribute is meaningless and evaluates to zero.
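As a hypothetical illustration (the UIDs and helper functions below are invented for this sketch, not part of Solaris Resource Manager), the scheduling tree can be modeled as a mapping from each lnode's UID to the UID of its parent:

```python
# Model the scheduling tree as a map from each lnode's UID to its parent's
# UID, as recorded in the parent attribute. All UIDs here are made up.
ROOT_UID = 0

def children(parent_of, uid):
    """UIDs whose parent attribute points at uid (the root is its own parent)."""
    return sorted(u for u, p in parent_of.items() if p == uid and u != ROOT_UID)

def group_members(parent_of, header):
    """All lnodes in the scheduling group headed by header, excluding header."""
    members, stack = [], [header]
    while stack:
        kids = children(parent_of, stack.pop())
        members.extend(kids)
        stack.extend(kids)
    return members

# Root heads one group (UID 100), which in turn heads UIDs 101 and 102.
parent_of = {ROOT_UID: ROOT_UID, 100: ROOT_UID, 101: 100, 102: 100}
```

In this model UID 100 is simultaneously a user and a group header, matching the text: a group header is simply a user that is the scheduling-tree parent of other users.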
Solaris Resource Manager can be customized by the system administrator to control all kinds of resources. Control for the following system resources is built into Solaris Resource Manager:
CPU (rate of process execution)
Virtual memory size
Process count
Terminal connect-time/number of connections
Every user's resource allocations are controlled by usage and limit attributes in the lnode. A usage has a value that is increased as a resource is consumed, and decreased as the resource is released. A limit has a value which the usage is not permitted to exceed. A limit of zero is commonly used to represent no limit. Accounting information is kept in accrue attributes, which have non-decreasing values that measure the consumption of a resource over time. A variety of privileges are controlled by flag attributes.
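The usage/limit/accrue relationship can be sketched as follows. This is a simplified model, not SRM code; note the stated convention that a limit of zero means no limit:

```python
def within_limit(usage, limit):
    """Check a usage against its limit; a limit of zero means no limit."""
    return limit == 0 or usage <= limit

def charge(usage, accrue, amount):
    """Consume a resource: usage rises, and accrue (accounting) only ever rises."""
    return usage + amount, accrue + amount

def release(usage, amount):
    """Release a resource: usage drops, but accrue is untouched."""
    return usage - amount
```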
Resources are hierarchically controlled, that is, an entire group can be allocated resources as though it were a single user. This is achieved by making an lnode's resource limit apply to the total usage of the scheduling group of which the lnode is header.
The usage attribute of a hierarchically controlled resource is the sum of the user's own resource usage plus the usage attributes of the child lnodes. There is often an accompanying myusage attribute, which is equal to the user's own usage. The limit attribute applies to usage, not to myusage.
Privileges are hierarchically controlled using the special flag value inherit. Whenever a flag's value is tested, if it is found to be inherit, then the value is taken from the lnode's scheduling tree parent. If that is also inherit, then the search continues up through the scheduling tree until a real value is found, or the root lnode is reached. If the root lnode's flag is inherit, then a configurable system-wide default value is used.
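The lookup described above can be sketched in a few lines. This is a model only; the flag storage and the particular default value are assumptions:

```python
INHERIT, SET, CLEAR = "inherit", "set", "clear"
SYSTEM_DEFAULT = CLEAR  # stands in for the configurable system-wide default
ROOT_UID = 0

def flag_value(flags, parent_of, uid):
    """Resolve a flag by walking up the scheduling tree past inherit values."""
    while True:
        value = flags.get(uid, INHERIT)
        if value != INHERIT:
            return value           # a real value was found
        if uid == ROOT_UID:
            return SYSTEM_DEFAULT  # even the root inherits: use the default
        uid = parent_of[uid]
```

For example, with the root's flag set and every descendant marked inherit, the whole tree evaluates to set.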
Every process in the system is attached to an lnode and is subject to the kernel limits and privileges of that lnode. When a process forks, the child is attached to the same lnode as its parent. The init process and all system processes are usually left attached to the root lnode. A process gets attached to a different lnode only in certain cases when it calls the setuid(2) system call, for which it must have superuser permission.
An active lnode is one that has one or more processes attached to it, or has one or more active member lnodes. That is, when a process attaches to an lnode, that lnode and all of its parents in the scheduling tree become active.
If Solaris Resource Manager is installed in the kernel, but no lnode database exists, all processes are attached to a surrogate root lnode. This is replaced by the real root lnode when the lnode file is opened.
The Solaris Resource Manager CPU scheduler, SHR, differs from the Solaris time-sharing scheduler (TS) in that it schedules users against each other, rather than LWPs, making it impossible for any user to acquire more CPU service just by running more processes concurrently.
When Solaris Resource Manager is enabled, a scheduling class module, SHR, which is a functional replacement for the TS class, is loaded. The init process is then usually started in the SHR class, hence all LWPs of processes started by init begin in the SHR class. LWPs can still be moved into the RT class for real-time scheduling; system kernel processes remain in the SYS class. Only LWPs in the SHR class are subjected to Solaris Resource Manager scheduling.
Regardless of their scheduling class, all LWPs of a process are always attached to the same lnode. Like ownership of an address space, or credentials, lnode attachment is a process property that affects all the LWPs of the process.
The relevant attributes are:
cpu.shares: integer; assigned by the administrator. The number of shares given to the whole group. This defines what fraction of the parent group's entitlement is allocated to this group, as a ratio with the summed cpu.shares of all active peer lnodes and the cpu.myshares of the parent lnode.
cpu.myshares: integer; assigned by the administrator. The number of shares given to the group header user. This defines what fraction of the group's entitlement is allocated to the group header user, as a ratio with the summed cpu.shares of all active child lnodes. It is meaningless for leaf lnodes.
cpu.usage: double; accumulated and decayed by the kernel. The weighted sum of charges for recent CPU service.
cpu.accrue: double; accumulated by the kernel. The weighted sum of charges for CPU service.
date; set by the kernel. The most recent time at which the cpu.usage attribute was updated.
CPU entitlements are chosen by the administrators, who assign each lnode a number of shares and myshares. These are analogous to shares in a company: the absolute quantity is not important; they are meaningful only in comparison with other lnodes' shares. Administrators are free to choose whatever numbers they want for these two attributes, as long as they are in proportion to the desired CPU entitlements. Furthermore, the choice of numbers at any level of the scheduling tree is completely independent of the choice of numbers at any other level or branch.
For example, consider a single level of one branch of a scheduling tree, with user A1 as the header, and users B1, B2, and B3 as children:
    A1 (group header): cpu.shares=250, cpu.myshares=15
        B1: cpu.shares=10, cpu.myshares=100
        B2: cpu.shares=20, cpu.myshares=3000
        B3: cpu.shares=5,  cpu.myshares=2
The total number of shares at this level = header's myshares + children's shares = 15 + (10 + 20 + 5) = 50. Suppose that the group A1 has a CPU entitlement of 60 percent. The user A1 therefore has a CPU entitlement of (15 / 50) x 60% = 18%, the group B1 has a CPU entitlement of (10 / 50) x 60% = 12%, the group B2 (20 / 50) x 60% = 24%, and the group B3 (5 / 50) x 60% = 6%. These entitlements might be further subdivided below B1, B2, and B3, in a similar fashion.
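The arithmetic above can be reproduced directly. The share values come from the example; the function itself is illustrative, not an SRM interface:

```python
def entitlements(group_entitlement, myshares, child_shares):
    """Split a group's CPU entitlement between its header and its children,
    in proportion to the header's myshares and each child's shares."""
    total = myshares + sum(child_shares.values())
    header = group_entitlement * myshares / total
    children = {name: group_entitlement * s / total
                for name, s in child_shares.items()}
    return header, children

# Group A1 holds a 60% entitlement; 15 myshares against 10 + 20 + 5 child shares.
a1, groups = entitlements(0.60, 15, {"B1": 10, "B2": 20, "B3": 5})
```

The resulting entitlements (18%, 12%, 24%, and 6%) necessarily sum back to the group's 60%.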
As shown in the example, the shares attributes of an lnode, in relation to other lnodes, define the CPU entitlement of the lnode. Over the longer term, provided they use it, Solaris Resource Manager will ensure that lnodes receive CPU in proportion to their entitlements.
By ignoring inactive lnodes, a related value, called the allocated share, is calculated directly by the scheduler as a fraction between 0 and 1. All allocated shares are recomputed whenever any lnode becomes active or inactive, or whenever a myshares or shares attribute is changed.
As processes execute, charges are accumulated in the cpu.usage attribute of the lnodes to which they are attached. The kernel periodically decays the CPU usage in every lnode by multiplying it with a decay factor which is less than 1, so that more recent CPU usage has greater weight when taken into account for scheduling. The scheduler continually adjusts the priority of all processes to make each lnode's relative CPU usage converge on its allocated share. This negative feedback mechanism allows direct control of the proportion of CPU rate of service granted to each user and group.
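The decay-and-charge cycle can be sketched as follows. The decay factor and charge values are invented for illustration; SRM's actual constants are configurable:

```python
def decayed_usage(usage, charge, decay_factor):
    """One update: decay the old usage, then add the newly accrued charge.
    Because decay_factor < 1, older charges carry ever less weight."""
    return usage * decay_factor + charge

# With a steady charge c per period, usage converges to c / (1 - k),
# a geometric sum: recent activity dominates, old activity fades out.
usage = 0.0
for _ in range(100):
    usage = decayed_usage(usage, charge=1.0, decay_factor=0.5)
# usage is now very close to 1.0 / (1 - 0.5) == 2.0
```

An lnode that stops running processes thus sees its cpu.usage shrink toward zero, which in turn raises the priority of its processes when it becomes busy again.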
At any instant, it is unlikely that users will be receiving exactly their allocated shares worth of CPU rate, due to uneven demand. The effective share of an lnode is the rate of CPU that it must receive in order to restore the balance of CPU usages with allocated shares. A user's effective share is a rough measure of what fraction of CPU will be given to the user at the time, should it be required.
The nice command has an appropriate effect on processes: a higher nice value means that a process will run more slowly. Under Solaris Resource Manager, however, such a process will accumulate charges at a discounted rate, so the user also benefits by using nice. The maximum nice value is treated specially: the process becomes a background process and is scheduled only when there is spare CPU capacity not demanded by non-background processes. The priocntl command can also be used to set or display a process's nice value. See nice(1) and priocntl(1MSRM).
The relevant attributes are:
memory.myusage: long; read-only; computed by the kernel. The total virtual memory space occupied by all processes attached to the lnode, measured in bytes to 1-Kbyte resolution.
memory.usage: long; read-only; computed by the kernel. The total virtual memory space occupied by all processes attached to the lnode and all its member lnodes, measured in bytes to 1-Kbyte resolution. This is computed as the memory.myusage of the lnode plus the sum of the memory.usage attributes of all child lnodes.
memory.limit: long; assigned by the administrator. The maximum allowed value, in bytes, of the memory.usage attribute. If zero, there is no limit, unless limited by inheritance.
memory.plimit: long; assigned by the administrator. The maximum virtual memory space, in bytes, that may be occupied by any individual process attached to the lnode. If zero, there is no limit, unless limited by inheritance.
memory.accrue: long; accumulated by the kernel. The accurate, continuous sum over time of the value of the memory.usage attribute, with dimension byte-seconds.
Memory is allocated by stack page faults, by the mmap(2) system call, and by a few other system calls. If a memory limit is reached, these will fail. A failed stack fault will cause the process to terminate unless the terminating signal is caught on an alternate stack. A failed system call will appear to programs as though there is no more virtual memory (swap space). Some programs will accept this fact and continue normal operation, possibly outputting a warning. Some programs will fail outright, possibly with a helpful diagnostic message. Solaris Resource Manager will write a warning message to the terminals of all affected local users (see limdaemon(1MSRM)).
Memory accounting and limits apply to processes irrespective of their scheduling class.
The relevant attributes are:
terminal.usage: long (time interval); increased and decreased by limdaemon(1MSRM). The number of seconds of connect-time currently charged to the group. This may not equate to the real-time duration of connections if device costs other than 1.0 have been configured.
terminal.decay: long (time interval); assigned by the administrator. The amount subtracted from the terminal.usage attribute at every decay point.
terminal.interval: time; assigned by the administrator. The time interval between decay points.
terminal.lastdecay: date; assigned by the administrator and updated by limdaemon(1MSRM). The time at which the most recent decay point occurred.
terminal.limit: long (time interval); assigned by the administrator. The maximum allowed value of the terminal.usage attribute. If zero, there is no limit, unless limited by inheritance.
terminal.accrue: long (time interval); increased by limdaemon(1MSRM). The total number of seconds of connect-time used by the group.
terminal.flag.devicename: flag; assigned by the administrator. These flags need not exist for all devices. A user may log in on a given device only if its corresponding flag exists and evaluates to set, or if no such flag exists.
Logins are recognized by Solaris Resource Manager through a special PAM module, pam_srm(5SRM).
At login time, the connect-time limits of the user and all scheduling groups to which the user belongs are checked. If the user's terminal usage exceeds its limit, then the user is informed and login is denied. Otherwise, the name and cost of the device are output and the user is allowed to log in. The superuser is exempt from these checks.
While logged in, if any users or scheduling groups come within 5 real minutes of reaching a connect-time limit, a warning message is written to the terminals of all such users or members of the scheduling groups. When the limit is reached, a message is written requesting immediate logout. A short time later (default grace period: 30 seconds), if any of the requested users are still logged in, then their associated processes are sent a SIGTERM signal, and soon after (15 seconds), a SIGKILL signal.
The update of terminal usage and enforcement of terminal limits is performed on a periodic basis by the Solaris Resource Manager daemon process, limdaemon(1MSRM). If the daemon is not running, then usages are not increased and limits are not enforced, except for those limits checked at login time. Connect-time accounting and limits apply to processes irrespective of their scheduling class.
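The decay-point bookkeeping for connect-time (terminal.usage reduced by terminal.decay once per terminal.interval, tracked via terminal.lastdecay) might look like the following. This is a sketch of the rule as described, not limdaemon's implementation:

```python
def apply_decay_points(usage, decay, interval, lastdecay, now):
    """Apply every decay point elapsed since lastdecay: subtract `decay`
    seconds of connect-time per `interval`, never going below zero.
    Returns the new usage and the time of the latest decay point."""
    if interval <= 0 or now <= lastdecay:
        return usage, lastdecay
    points = (now - lastdecay) // interval
    new_usage = max(0, usage - points * decay)
    return new_usage, lastdecay + points * interval
```

Two full intervals elapsed means two subtractions; a partially elapsed interval waits for the daemon's next pass.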
Whenever users reach a limit or approach a connect-time limit, a warning message is written to their terminals by the limdaemon(1MSRM) process. Messages are written directly to a user's terminal by looking up the utmp entries for the device name, and using the write(2) system call. The warnings are in terse, human-readable text, and typically have the form:
User username "resource limit reached" by username.
The first username is the lnode in which the limit was reached. The message is delivered to all users in the scheduling group headed by this lnode because all of these users are affected. The second username is the user whose action caused the limit to be reached.
Privileges are usually controlled using flag attributes. The following attributes are predefined:
flag.uselimadm: flag; assigned by the administrator. If set, the user is a sub-administrator who can freely add, modify, and remove lnodes, alter any attribute in any lnode, and attach a process to any lnode. This flag has no effect on the superuser, who always has these powers. See flag.admin below, and SUB-ADMINISTRATORS AND GROUP ADMINISTRATORS.
flag.admin: flag; assigned by the administrator. If set, the user is a group administrator. This flag grants privileges over users in the scheduling subtree of which the lnode is group header. This flag has no effect on the superuser, who always has these powers. See SUB-ADMINISTRATORS AND GROUP ADMINISTRATORS.
flag.nologin: flag; assigned by the administrator. If set, the user cannot log in and is denied connection by remote shell or execution daemons. If both the nologin and onelogin flags are set, then nologin takes precedence.
flag.onelogin: flag; assigned by the administrator. If set, the user can have at most one login connection. Connections by remote shell or execution daemons are counted as logins for this purpose. If both the nologin and onelogin flags are set, then nologin takes precedence.
logins: integer; read-only; computed by the Solaris Resource Manager kernel. The current number of logins, that is, login or remote connections recognized by Solaris Resource Manager. This attribute does not strictly control a privilege, but is described here because it is related to the onelogin and asynckill flags.
flag.asynckill: flag; assigned by the administrator. If set, all processes attached to the lnode are killed whenever the value in the logins attribute drops to zero.
flag; assigned by the administrator. If set, all processes attached to the lnode are set to the lowest nice value when the logins attribute drops to zero.
limreport(1SRM) is a simple but powerful report generator available to all users. It scans sequentially through the password map, selecting users according to a specified selection expression. For each selected user, a report is output. Its format is specified in the style of printf(3C). Expressions for selection and for insertion into the report can refer to any lnode attribute and to password map fields.
liminfo(1SRM) outputs a report on the contents of the invoker's lnode, or gives a sequence of reports on the lnodes of a list of users. There are five report formats available. Three of the reports are designed for easy reading by users, the fourth is designed specifically for use by filters, and the fifth is for debugging.
A user with a set uselimadm flag (see Privileges) is a sub-administrator. Sub-administrators have the same powers as a central administrator. Group administrators have only a set admin flag; their powers are restricted to the members of their own scheduling group.
The tools listed in the next section, LNODE MANAGEMENT, are available with full function to the superuser and with restrictions to group administrators.
These commands provide for management of lnodes.
limadm(1MSRM) is the main tool for altering the attributes of an lnode. Alterations are expressed as a list of numeric or symbolic assignments, additions, or subtractions to named attributes. Superusers and uselimadm users can use limadm to alter any writable attribute in any lnode. Sub-administrators are prevented from altering the assigned limits and privileges of lnodes other than those of their own scheduling group members. Any valid assignment to an attribute of a non-existing lnode creates the lnode. This command should be used to create an lnode just after creating the first reference to the corresponding UID in the password map.
limadm(1MSRM) can also be used to delete an inactive leaf lnode. Sub-administrators are prevented from deleting lnodes outside their scheduling group members. This command should be used to delete an lnode just prior to deleting the last reference to the corresponding UID from the password map.
limdaemon(1MSRM) is started automatically in the Solaris Resource Manager startup script at boot time. It decays the usage attributes of terminals in all lnodes, or only in the lnodes of a given list of users. The decay of usage values is regulated using the decay, interval, and lastdecay attributes in each lnode.
srmuser(1SRM) attaches a shell to a named lnode, and optionally executes a given command. This can be useful when performing a costly operation on behalf of another user, because the user rather than the administrator is charged for the CPU and memory used. Note that, unlike su(1M), this does not alter the real and effective UIDs.
See attributes(5) for descriptions of the following attributes:
| ATTRIBUTE TYPE | ATTRIBUTE VALUE |
| Architecture | SPARC |
| Availability | SUNWsrmb, SUNWsrmr |
nice(1), su(1M), mmap(2), setuid(2), write(2), printf(3C), liminfo(1SRM), limreport(1SRM), srmstat(1SRM), srmuser(1SRM), dispadmin(1MSRM), limadm(1MSRM), limdaemon(1MSRM), priocntl(1MSRM), srmadm(1MSRM), srmkill(1MSRM), brk(2SRM), nice(2SRM), setuid(2SRM), pam_srm(5SRM)
Solaris Resource Manager 1.3 System Administration Guide
The default state of a newly created lnode has most attributes zeroed; only cpu.shares and cpu.myshares are set to a minimum value of 1, the uselimadm and admin flags are set to clear, and all other flags are set to inherit. Administrative privilege over other users is therefore denied by default, and users are given limits to encounter only by deliberate action. It is thus the responsibility of sub-administrators to consider fully the implications of their decisions before acting.
The Solaris Resource Manager system is powerful and its effects can be widely felt, so misuse may have large and unpleasant consequences. Users of Solaris Resource Manager must formulate clear and strong policies on system administration.
The current Solaris Resource Manager message notification mechanism only sends the message to local users.
Any locale-specific translation of messages delivered to local users is in accordance with the locale of the limdaemon process, which may differ from that of the users.
This man page is applicable to SunOS 5.6, SunOS 5.7, and SunOS 5.8.