Monday, April 24, 2017

Resource control - Intro

Resource controls were originally introduced to limit the system resources that a process and its children could consume. In Solaris the concept has since been extended to other collections of processes as well: tasks, projects and zones.

The best practice is to carefully assess the resource consumption of the workloads on the system (for instance, by using extended accounting) before applying any fine-grained resource control to prevent overconsumption. And, of course, above all, the system must meet or exceed the combined resource requirements of all the workloads it's supposed to host.

This topic is vast. There are many resources (resource-controls(5)), ranging from the most "elementary" to the most complex ones; there are 3 control levels (basic, privileged and system); 4 containment levels (process, task, project and zone); 2 types of actions and flags (local and global); and more than one interface, managing either part of this (ulimit(1) and getrlimit(2)) or all of it (rctladm(1M), prctl(1), setrctl(2), the projects database and zone configuration).

All the manpages provide extensive information that I won't discuss, at least for now. In addition, there are some other lengthy references, such as chapter 5 of Resource Management and Oracle® Solaris Zones Developer's Guide, which is a kind of revamp of the original chapter 5 of the (partially) archived Solaris Containers: Resource Management and Solaris Zones Developer's Guide.
 
I'll reproduce (with no intent of infringing any copyrights, please!) an image extracted from the above references that is very helpful in illustrating the various containment possibilities and their combinations (but that's not the whole story, just the containments and their general relationships):

(Open)Solaris Resource Control containments

To warm up, I'd like to offer some insights from a quick, basic comparison of sample output from ulimit(1) and prctl(1), which, in a sense, correspond respectively to the legacy (standard) and the enhanced (Solaris) interfaces. The samples inspect the (dynamic) value of the maximum number of open files per process, which is traditionally obtained either as _SC_OPEN_MAX (declared in <unistd.h>) through the sysconf(3C) API call or by reference to OPEN_MAX declared in <limits.h>:

$ ulimit -aS |grep open
open files (-n) 1024


$ ulimit -aH |grep open
open files (-n) 65536


The above legacy commands respectively display the soft (current) limit (basic level) and the hard limit (privileged level) on the maximum number of open files per process. These limits are enforced for all processes "derived" from the shell process, that is, launched, executed or spawned from it.
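
The same two values can also be read programmatically through the legacy interface. What follows is only a minimal sketch in Python (chosen here for brevity, not something the commands above rely on), assuming a POSIX system where the standard resource module wraps getrlimit(2) and os.sysconf wraps sysconf(3C). The helper names nofile_limits and show are hypothetical:

```python
import os
import resource


def nofile_limits():
    """Return the (soft, hard) open-files limits of the current process."""
    # RLIMIT_NOFILE is the getrlimit(2) resource behind `ulimit -n`.
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def show(value):
    """Render a limit, spelling out the 'no limit' marker."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft (ulimit -S -n):", show(soft))
    print("hard (ulimit -H -n):", show(hard))
    # The sysconf(3C) view of the same limit, queried by name.
    print("SC_OPEN_MAX:", os.sysconf("SC_OPEN_MAX"))
```

On the shell used for the samples above, one would expect the soft and hard values to match the two ulimit outputs.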

Now compare the same query under the more modern Solaris equivalent:

$ prctl -n process.max-file-descriptor $$
process: 2710: -bash
NAME                    PRIVILEGE  VALUE FLAG ACTION RECIPIENT
....max-file-descriptor basic      1.02K    -   deny      2710
                        privileged 65.5K    -   deny         -
                        system     2.15G  max   deny         -

At first, the values 1.02K and 65.5K may be somewhat puzzling for people unused to scaled figures. The suffix K means "times 1,000" (x 1,000) and the suffix G means "times 1 billion" (x 1,000,000,000). In addition to this scaling, the values are apparently also rounded to three significant digits. Here's how to arrive at the displayed figures (note that the system value is 2Gi - 1, where Gi means binary giga, that is, 1024^3, so that 2Gi = 2,147,483,648):

         1024 / 1,000 =  1.024 ≃ 1.02 x 1,000 = 1.02K
        65536 / 1,000 = 65.536 ≃ 65.5 x 1,000 = 65.5K
2,147,483,647 / 1,000,000,000  ≃ 2.15G
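
The scaling can be mimicked with a few lines of Python. This is only a sketch that reproduces the figures shown by prctl(1) in these samples; it is not prctl's actual implementation, and both the function name scale and the "three significant digits" rule are assumptions inferred from the output:

```python
def scale(value):
    """Format an integer with a decimal K/M/G suffix and ~3 significant digits."""
    for suffix, factor in (("G", 10**9), ("M", 10**6), ("K", 10**3)):
        if value >= factor:
            scaled = value / factor
            # Keep about three significant digits: 1.02, 65.5, 655.
            if scaled < 10:
                return f"{scaled:.2f}{suffix}"
            if scaled < 100:
                return f"{scaled:.1f}{suffix}"
            return f"{scaled:.0f}{suffix}"
    # Values below 1,000 are shown unscaled.
    return str(value)


if __name__ == "__main__":
    for v in (256, 1024, 65536, 2147483647):
        print(v, "->", scale(v))
```

Note that the suffixes are decimal (powers of 1,000), which is why 1024 shows up as 1.02K rather than 1K.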

Just for illustration, here are the results for the init system process:

# prctl -n process.max-file-descriptor 1
process: 1: /usr/sbin/init
NAME                    PRIVILEGE  VALUE FLAG ACTION RECIPIENT
....max-file-descriptor basic        256    -   deny         1
                        privileged 65.5K    -   deny         -
                        system     2.15G  max   deny         -


From these tiny examples, one can now perhaps start wondering about the power, flexibility and versatility of the much newer and improved Solaris interfaces. Next I play with a few more examples, this time using API calls.

NOTE
By the way, the PRIVILEGE column of the prctl(1) output would be better named CONTROL. The VALUE column refers to the current value, which may be less than the enforced value when the associated entity (process, task, project) possesses fewer capabilities than the enforced value allows. See rctlblk_set_value(3C).
By exploring with a little program (rc-01), I came across an apparent variation or adaptation of concepts between the standards and Solaris, as revealed in the second run:
 
If run under a "standard" gnome-terminal(1) the output is:

$ ./rc-01
Process' inherited FD limits:
Current: 1024.
Maximum: 65536.


$ prctl -n process.max-file-descriptor $$
process: 2563: bash
NAME                    PRIVILEGE  VALUE FLAG ACTION RECIPIENT
....max-file-descriptor basic      1.02K    -   deny      2563
                        privileged 65.5K    -   deny         -
                        system     2.15G  max   deny         -


If run under a "standard" TERMINATOR(1) terminal the output is:

$ ./rc-01
Process' inherited FD limits:
Current: 65536.
Maximum: 65536.


$ prctl -n process.max-file-descriptor $$
process: 2545: /usr/bin/bash
NAME                    PRIVILEGE  VALUE FLAG ACTION RECIPIENT
....max-file-descriptor privileged 65.5K    -   deny         -
                        system     2.15G  max   deny         -


The initial conclusion is that the traditional (legacy) standard API (matching ulimit(1)) "never" reaches the system control and, furthermore, that if the basic control isn't defined, then the next lowest value is assumed (under the newer and expanded Solaris concepts), in this case the value of the privileged control. That's somewhat curious... but reasonable after all. Double-check by reading the last paragraph of setrctl(2).
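
That observation can be written down as a tiny model. The Python sketch below encodes only what the two runs above suggest; it is not taken from the Solaris sources or from setrctl(2), and the function name legacy_limits and the dictionary layout are hypothetical:

```python
def legacy_limits(controls):
    """Map resource-control values to the (current, maximum) pair seen by the legacy API.

    `controls` maps a control level ("basic", "privileged", "system") to its
    value; "basic" may be absent, as in the second run above.
    """
    # Observation: the legacy maximum follows the privileged control and
    # never reaches the system control.
    maximum = controls["privileged"]
    # Observation: without a basic control, the current value falls back
    # to the next lowest one, here the privileged control.
    current = controls.get("basic", maximum)
    return current, maximum


if __name__ == "__main__":
    first_run = {"basic": 1024, "privileged": 65536, "system": 2147483647}
    second_run = {"privileged": 65536, "system": 2147483647}
    print(legacy_limits(first_run))
    print(legacy_limits(second_run))
```

Feeding it the values from the two prctl(1) listings yields the same Current/Maximum pairs that rc-01 printed in each terminal.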

Now, to contrast with the newer Solaris interface, I crafted another program called rc-02. Its typical output on a bare-bones installation may be similar to:

$ ./rc-02

Resource limit: process.max-file-descriptor

    Type: Basic (PID 4538; PPID 2563)
    Global action: deny; Syslog: no
    Lowerable: yes; Infinite: no; 

    Locally maximal: no; Local project: no
    Limit type: count
    Current:  1024
    Enforced: 1024

    Type: Privileged (PID -1)
    Global action: deny; Syslog: no
    Lowerable: yes; Infinite: no; 

    Locally maximal: no; Local project: no
    Limit type: count
    Current:  65536
    Enforced: 65536

    Type: System (PID -1)
    Global action: deny; Syslog: no
    Lowerable: yes; Infinite: no; 

    Locally maximal: yes; Local project: no
    Limit type: count
    Current:  2147483647
    Enforced: 2147483647


The added complexity of the Solaris interface (and the second program) is the price of getting more knowledge and power on the subject. Compare the preceding output with the following (repeated) output from prctl(1) on the same shell:

$ prctl -n process.max-file-descriptor $$
process: 2563: bash
NAME                    PRIVILEGE  VALUE FLAG ACTION RECIPIENT
....max-file-descriptor basic      1.02K    -   deny      2563
                        privileged 65.5K    -   deny         -
                        system     2.15G  max   deny         -


Naturally, by default, the program inherited the limits from the shell. But it could have made some adjustments within the allowed ranges and the permission constraints of the effective user ID under which it's running.
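
As an illustration of such an adjustment, here is a minimal Python sketch that changes only the soft limit through the legacy setrlimit(2) interface (via the standard resource module), which any unprivileged process may do as long as it stays at or below the hard limit. It does not use the richer Solaris setrctl(2) interface, and the function name set_soft_nofile is hypothetical:

```python
import resource


def set_soft_nofile(new_soft):
    """Set the soft open-files limit, leaving the hard limit untouched."""
    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process may move the soft limit anywhere up to the hard one.
    if hard != resource.RLIM_INFINITY and new_soft > hard:
        raise ValueError(f"{new_soft} exceeds the hard limit {hard}")
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("before:", (soft, hard))
    # Pick a target that is certainly allowed: 256, or the hard limit if lower.
    target = 256 if hard == resource.RLIM_INFINITY or hard >= 256 else hard
    print("after: ", set_soft_nofile(target))
```

The new limit applies to the calling process and is inherited by its children; the parent shell is unaffected.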

OK, now I'm a little more satisfied :-)