Thursday, October 5, 2017

ZFS basic mirroring

Mirroring is a traditional fault-tolerance strategy that became popular for secondary storage systems, typically hard-disks. ZFS improves on it by introducing checksums, preventing data that has become corrupted on one side of the mirror (due to bit-rot or some other component malfunction) from being replicated to the other, healthy side. This ZFS enhancement is rather unique since, in general, it isn't viable to implement it solely at the physical layer (controller), as it depends on information only available at the logical layer (file-systems). ZFS achieves this by abstracting the physical layer into storage pools over which logical datasets (file-systems and raw volumes) are managed.

Establishing mirrors within storage pools is a relatively simple task, especially on more recent versions of Solaris such as Solaris 11.x. But on late Solaris 10 U1x as well as on Solaris 11 Express some initial disk preparation was required. In addition, for root pools on these older systems, it was necessary to manually install (via installboot(1M) or installgrub(1M)) the boot-loader on new disks just integrated into a mirror. On more recent versions of Solaris the boot-loader management for mirrored root pools is automated, yet it can still be performed manually via the install-bootloader sub-command of bootadm(1M).

Another usual difference contrasting older systems (Solaris 10 U1x and Solaris 11 Express) with newer Solaris 11.x is how the underlying disks comprising storage pools are seen with respect to disk-labeling: SMI (VTOC) for older systems and disks, and EFI (GPT) for newer ones. The most important implications of these two label types are that SMI labels impose a limit of 2 TB of usable storage even on larger disks and are the only supported label for root pools on older systems. Typically, when referring to whole disks (which ZFS prefers over legacy slices/partitions), SMI-labeled disks take names of the form c?[t?]d?s0, while EFI-labeled ones lack the trailing s0.
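
If in doubt about which label a disk currently carries, prtvtoc(1M) and format(1M) in expert mode can help (disk names below are just examples):

# prtvtoc /dev/rdsk/c8t0d0s0
# format -e

(prtvtoc prints the current slice table; under format -e, the label command offers the choice between an SMI and an EFI label)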

Here are some examples of how to establish, hassle-free, a basic mirror on storage pools initially consisting of a single disk:


1) When SMI-labeled disks are required for a pool:

This is typical of older systems in general, of some SPARC systems, or of not-so-old systems that don't yet support EFI devices on the root pool.

I assume that the disks were already appropriately prepared.
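
If that preparation were still pending, a common approach for the SMI case (assuming the new disk should end up with the very same slice layout as the existing one) is to replicate the VTOC of the original disk onto the newcomer:

# prtvtoc /dev/rdsk/c8t0d0s2 | fmthard -s - /dev/rdsk/c8t1d0s2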

# zpool status
  pool: rpool
 state: ONLINE
 scan: ...
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          c8t0d0s0    ONLINE       0     0     0

errors: No known data errors


# zpool attach -f rpool c8t0d0s0 c8t1d0s0
Make sure to wait until resilver is done before rebooting.


# zpool status
  pool: rpool
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since ...
    1.50G scanned out of 3.78G at ...M/s, 0h2m to go
    1.50G resilvered, 39.70% done
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c8t0d0s0  ONLINE       0     0     0
            c8t1d0s0  ONLINE       0     0     0  (resilvering)

errors: No known data errors


As this is a root pool, once the resilver is complete one can optionally make sure the boot-loader is properly installed on the newly attached disk as well. But according to the official documentation, this extra step is only mandatory when a zpool replace command is issued on the root pool. For an i86pc system, if one decides to do so, the command would be similar to:

# installgrub \
  /boot/grub/stage1 /boot/grub/stage2 \
  /dev/rdsk/c8t1d0s0

or the newer and far superior:

# bootadm install-bootloader
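
On a SPARC system, the traditional counterpart relies on installboot(1M) and the ZFS bootblk, roughly along these lines:

# installboot -F zfs \
  /usr/platform/`uname -i`/lib/fs/zfs/bootblk \
  /dev/rdsk/c8t1d0s0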


2) Systems supporting EFI-labeled disks for any kind of pool:

This is good news as no tedious disk preparation is required beforehand; in fact, such preparation would be pointless, since during the attachment the disk is automatically formatted and labeled as necessary.

Therefore, the attachment procedure is as simple as:

# zpool attach rpool c1t0d0 c1t1d0

NOTE
It's possible to have N disks in a mirror, which means the mirror will withstand as many as N-1 members failing at a given time. This may seem highly exaggerated at first, but it may make sense in some scenarios.

But let me exclude, as an insane case, a 3-way mirror for a root pool of over-2-TB disks: a root pool should really never require that much space, nor a 3rd member just to prevent a double-fault while resilvering from a single-fault.

For instance, an N-way mirror (N>3) for a non-root critical pool may make sense and be a straightforward solution if one intends to keep critical data replicated at N-2 remote locations. The mirrored devices (not disks) forming this pool could be iSCSI LUNs from separate remote storage facilities (preferably also backed by ZFS), as long as each LUN isn't comprised of many individual disks and as long as the pool also keeps local log and cache devices, indispensable for better equalizing disparate remote storage performances and link latencies.
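
Just as an illustrative sketch (the pool and device names are hypothetical; the mirror members stand for the remote iSCSI LUNs and the log/cache devices for local disks), such a pool could be laid out like this:

# zpool create vault \
  mirror c5t0d0 c5t1d0 c5t2d0 c5t3d0 \
  log mirror c2t0d0 c2t1d0 \
  cache c2t2d0

(a 4-way mirror of remote LUNs plus a local mirrored log and a local cache device)
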
NOTE
Mirror vdevs can be created right from the start, or added later on, each with a single command, such as:

# zpool create hq \
  mirror c0t0d0 c1t0d0 \
  mirror c0t1d0 c2t0d0

(each mirror above withstands a single disk failure and even a controller failure, as its members sit on different controllers)
(it's similar to RAID-10, but RAID-Z(1) could rival it if the I/O block size is over 128 KB)

# zpool add hq \
  mirror c1t1d0 c2t1d0

(the hq pool above is now striping over 3 2-way mirrors)
(a better solution could be a RAID-Z2 scheme, depending on the block size; see the sketch after this note)
 
Each mirror in the example above is known as a vdev.
Not surprisingly, ZFS stripes I/O across the top-level vdevs.
By the way, root pools support just a single mirror vdev.
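
As for the RAID-Z2 alternative hinted at above, a rough sketch over the same six disks, had the pool been built that way from the start, would be:

# zpool create hq \
  raidz2 c0t0d0 c1t0d0 c0t1d0 c2t0d0 c1t1d0 c2t1d0

(a single vdev tolerating any two disk failures, trading some random-I/O performance for capacity)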

To remove a device from a mirror:
# zpool detach rpool c1t1d0
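
To grow an existing mirror by one more member (a hypothetical new disk c1t3d0 below), attach it to any current member:
# zpool attach rpool c1t0d0 c1t3d0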

To replace a device in a mirror:
# zpool replace rpool c1t1d0 c1t2d0

And that's pretty much the basics.