Monday, April 10, 2017

Kernel zones & ZFS ARC

Assuming your system meets the requirements for kernel zones support, one important tuning is adjusting the ZFS ARC maximum size in bytes (the well-known zfs_arc_max in /etc/system). I did a somewhat similar tuning a couple of years ago, as a best practice right after installing VirtualBox. For kernel zones it may not be just a matter of best practice; it's more a case of be advised, or neglect it at your own risk!

By the way, according to more recent Solaris public documentation, the host system sees a kernel zone as just another application. The tuning required on the host should therefore take into account all the kernel zones and processes that are expected to run on the system.

In the past, to find the current zfs_arc_max I just relied on the c_max value (in bytes) from kstat -n arcstats. More recently, though, the Solaris 11.2 documentation refers to ::memstat from mdb -k. So let's put the two side by side (keeping in mind that other figures from arcstats, not considered below, may also play a role):

# kstat -n arcstats | grep c_max
    c_max                           7498616832


# echo ::memstat | mdb -k
Page Summary                 Pages             Bytes  %Tot
----------------- ----------------  ----------------  ----
Kernel                      293573              1.1G   14%
ZFS Metadata                 28199            110.1M    1%
ZFS File Data               517332              1.9G   25%
Anon                        269994              1.0G   13%
Exec and libs                 6008             23.4M    0%
Page cache                  328957              1.2G   16%
Free (cachelist)              3779             14.7M    0%
Free (freelist)             628887              2.3G   30%
Total                      2096958              7.9G


# pagesize
4096
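
To put the two outputs on the same scale, the page counts from ::memstat can be converted to bytes and compared against c_max. The following is only a rough sketch: the numbers are copied by hand from the output above, pages_to_bytes is a helper name I made up, and treating "ZFS Metadata" plus "ZFS File Data" as the ARC footprint is an approximation of mine, not something the documentation states.

```shell
#!/bin/sh
# Figures copied by hand from the outputs above. On a live system they
# would come from `kstat -n arcstats`, `echo ::memstat | mdb -k` and
# `pagesize`.
c_max=7498616832
zfs_metadata_pages=28199
zfs_file_data_pages=517332
page_size=4096

# pages_to_bytes PAGES PAGE_SIZE: convert a page count to bytes
# (hypothetical helper, not a Solaris command).
pages_to_bytes() {
    echo $(( $1 * $2 ))
}

# Assumption: ZFS Metadata + ZFS File Data approximates the ARC footprint.
zfs_pages=$(( zfs_metadata_pages + zfs_file_data_pages ))
zfs_bytes=$(pages_to_bytes "$zfs_pages" "$page_size")

# Integer percentage of the configured ceiling currently in use.
pct_of_cap=$(( zfs_bytes * 100 / c_max ))

echo "ZFS pages in memory: $zfs_bytes bytes"
echo "Share of c_max:      ${pct_of_cap}%"
```

With the figures above, ZFS holds roughly 2.2 GB, a bit under a third of the c_max ceiling.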


To quote the Solaris 11.2 documentation topic:
The suggested value is one-half of what you would like the host ZFS resources to use. For example, if you want ZFS to use less than 2 GB of memory, set the ARC cache to 1 GB, or 0x40000000.
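
The halving rule quoted above is easy to get wrong in hex, so here is a small sketch that does the arithmetic. arc_max_hex is a hypothetical helper of mine; the set zfs:zfs_arc_max=... form is the usual /etc/system syntax for this tunable, but check it against the documentation for your release before using it.

```shell
#!/bin/sh
# arc_max_hex TARGET_GB: given the amount of memory (in GB) that ZFS
# should stay under, print half of it in bytes, as hex, following the
# quoted rule. Hypothetical helper, not a Solaris command.
arc_max_hex() {
    target_gb=$1
    printf '0x%x\n' $(( target_gb * 1024 * 1024 * 1024 / 2 ))
}

# The documentation's own example: keep ZFS under 2 GB.
echo "set zfs:zfs_arc_max=$(arc_max_hex 2)"
```

For the 2 GB example this prints the same 0x40000000 that the documentation gives.
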
Furthermore, the Solaris 11.2 documentation on zfs_arc_max lists these defaults:
75% of memory on systems with less than 4 GB of memory.
physmem minus 1 GB on systems with greater than 4 GB of memory.

If a future memory requirement is significantly large and well defined, you might consider reducing the value of this parameter to cap the ARC so that it does not compete with the memory requirement. For example, if you know that a future workload requires 20% of memory, it makes sense to cap the ARC such that it does not consume more than the remaining 80% of memory.
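
As a sketch of the percentage reasoning in that last paragraph, the cap can be derived from the ::memstat total shown earlier. arc_cap_bytes is a name I invented, and using the "Total" page count as the machine's physical memory is my assumption.

```shell
#!/bin/sh
# arc_cap_bytes TOTAL_PAGES PAGE_SIZE RESERVED_PCT
# Print the ARC cap in bytes that leaves RESERVED_PCT percent of memory
# for a known future workload. Hypothetical helper, integer arithmetic.
arc_cap_bytes() {
    echo $(( $1 * $2 * (100 - $3) / 100 ))
}

# Total pages and page size taken from the ::memstat and pagesize output
# above; 20 is the documentation's example of a workload needing 20%.
cap=$(arc_cap_bytes 2096958 4096 20)

echo "ARC cap: $cap bytes"
printf 'set zfs:zfs_arc_max=0x%x\n' "$cap"
```

On the 7.9G host above, this caps the ARC at roughly 6.4 GB.
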
But in Solaris 11.3 things start to change a bit. There's a new tunable called user_reserve_hint_pct (ranging from 0 to 99 and defaulting to 0, also set in /etc/system, as set user_reserve_hint_pct=...) intended to supersede zfs_arc_max. About the new tunable, the Solaris 11.3 documentation says:
Informs the system about how much memory is reserved for application use, and therefore limits how much memory can be used by the ZFS ARC cache as the cache increases over time.

By means of this parameter, administrators can maintain a large reserve of available free memory for future application demands. The user_reserve_hint_pct parameter is intended to be used in place of the zfs_arc_max parameter to restrict the growth of the ZFS ARC cache.

If a dedicated system is used to run a set of applications with a known memory footprint, set the parameter to the value of that footprint.

For upward adjustments, increase the value if the initial value is determined to be insufficient over time for application requirements, or if application demand increases on the system. Perform this adjustment only within a scheduled system maintenance window. After you have changed the value, reboot the system.

For downward adjustments, decrease the value if allowed by application requirements. Make sure to decrease the value only by small amounts, no greater than 5% at a time.
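
To make those constraints concrete, here is a minimal sketch that validates a value against the 0 to 99 range and plans a downward adjustment in small steps. Both reserve_line and step_down are hypothetical helpers of mine, and reading "5%" as five percentage points per step is my interpretation, not something the documentation spells out.

```shell
#!/bin/sh
# reserve_line PCT: print the /etc/system line for user_reserve_hint_pct,
# rejecting anything outside the documented 0-99 range.
reserve_line() {
    case $1 in
        ''|*[!0-9]*) echo "not a number: $1" >&2; return 1 ;;
    esac
    if [ "$1" -gt 99 ]; then
        echo "out of range (0-99): $1" >&2
        return 1
    fi
    echo "set user_reserve_hint_pct=$1"
}

# step_down CURRENT TARGET: list the intermediate values for a downward
# adjustment, lowering by at most 5 points per step (my reading of "5%").
step_down() {
    current=$1
    target=$2
    while [ "$current" -gt "$target" ]; do
        current=$(( current - 5 ))
        if [ "$current" -lt "$target" ]; then
            current=$target
        fi
        echo "$current"
    done
}

# Example: reserve 60% for applications, then plan a reduction to 48%.
reserve_line 60
step_down 60 48
```

The example prints the /etc/system line for 60 and then the planned values 55, 50 and 48, one per line.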