ZFS performance tuning tips

Copy-paste snippet:

zfs set recordsize=1M rpool
zfs set recordsize=16M hddpool
zfs set recordsize=1M nvmpool
zfs set compression=zstd rpool
zfs set compression=zstd hddpool
zfs set compression=zstd nvmpool

Note: zstd means zstd-3. It is still CPU-hungry compression, and the load is visible in top. For heavy workloads such as build nodes, use lz4. See more in ZFS compression.

zil limit

The ZFS module parameter zil_slog_bulk throttles the load on the LOG (SLOG) device. In older ZFS versions the value was 768kB; currently it is 64MB. Sync writes above this size per commit are issued with lower (asynchronous) priority, so a single heavy ZIL writer cannot saturate the log device.

/etc/modprobe.d/zfs.conf
options zfs zil_slog_bulk=67108864
options zfs l2arc_write_max=67108864

See the similar parameter for L2ARC: l2arc_write_max
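Options in /etc/modprobe.d/zfs.conf are only read when the module loads. To try a value without reloading, the parameter file under /sys/module/zfs/parameters can usually be written directly. A minimal sketch, assuming the parameter is writable at runtime on your ZFS version; set_zfs_param is a hypothetical helper, and the example runs against a scratch directory so nothing on the host is changed:

```shell
# Write a value to a ZFS module parameter file and print the value read back.
# $1 = parameter directory, $2 = parameter name, $3 = new value.
# On a real system this needs root and $1 = /sys/module/zfs/parameters.
set_zfs_param() {
    echo "$3" > "$1/$2" && cat "$1/$2"
}

# Dry run against a scratch directory instead of /sys:
scratch=$(mktemp -d)
set_zfs_param "$scratch" zil_slog_bulk 67108864
rm -r "$scratch"
```

On a live host the call would be set_zfs_param /sys/module/zfs/parameters zil_slog_bulk 67108864.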

recordsize / volblocksize

The size must be a power of 2.
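A quick way to sanity-check a candidate size in bytes before passing it to zfs set. This is only a sketch of the power-of-2 rule; is_power_of_two is a hypothetical helper, and ZFS enforces its own minimum and maximum sizes, which are not checked here:

```shell
# Return success (0) if the argument is a positive power of two.
# A power of two has exactly one bit set, so n & (n - 1) is zero.
is_power_of_two() {
    [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

for size in 8192 131072 100000 1048576; do
    if is_power_of_two "$size"; then
        echo "$size: ok"
    else
        echo "$size: not a power of 2"
    fi
done
```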

This is the basic unit of operation for ZFS. ZFS is a copy-on-write (COW) filesystem, so to modify even one byte of data stored inside a 128kB record it must read the whole 128kB record, modify it, and write 128kB to a new location. This creates huge read and write amplification.
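The amplification can be estimated with simple arithmetic: bytes rewritten divided by bytes actually changed. A rough sketch, assuming the change fits inside a single record and ignoring compression and metadata; cow_amplification is a hypothetical helper:

```shell
# Amplification factor for a small in-place modification.
# $1 = record size in bytes, $2 = bytes actually modified (<= record size).
cow_amplification() {
    echo $(( $1 / $2 ))
}

cow_amplification 131072 1      # 1 byte changed in a 128kB record
cow_amplification 131072 8192   # one 8kB page changed in a 128kB record
cow_amplification 8192 8192     # one 8kB page changed in an 8kB record
```

The three calls print 131072, 16 and 1, which is why a database writing 8kB pages tends to do better with a recordsize matching its page size.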

Small sizes:

Big size:

Note: recordsize only defines an upper limit. A file smaller than recordsize is still stored in a single, smaller block (is this also true for volblocksize?).

Examples:

Check real usage by histogram:

zpool iostat -r

zvol for guest

Tune L2ARC for backups

When a large amount of data is written (new backups) or read (backup verification), the L2ARC is constantly overwritten with the current data. To change this behaviour and cache only Most Frequently Used (MFU) data:

/etc/modprobe.d/zfs.conf
options zfs l2arc_mfuonly=1 l2arc_noprefetch=0
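After a reboot or module reload, it is worth confirming that the values were picked up. A minimal sketch that reads parameters from a directory; show_zfs_params is a hypothetical helper, and the directory is passed in so the function can also be pointed at a test directory:

```shell
# Print "name=value" for each requested module parameter.
# $1 = parameter directory, remaining arguments = parameter names.
show_zfs_params() {
    dir=$1
    shift
    for name in "$@"; do
        if [ -r "$dir/$name" ]; then
            echo "$name=$(cat "$dir/$name")"
        else
            # Module not loaded, or parameter missing in this ZFS version.
            echo "$name=unavailable"
        fi
    done
}

show_zfs_params /sys/module/zfs/parameters l2arc_mfuonly l2arc_noprefetch
```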

Explanation:

PostgreSQL

See Arch Linux wiki: Databases

zfs set recordsize=8K <pool>/postgres
 
# ONLY for SSD/NVM devices:
zfs set logbias=throughput <pool>/postgres

reduce ZFS ARC RAM usage

By default ZFS can use 50% of RAM for ARC cache:

# apt install zfsutils-linux
 
# zarcstat
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  size     c  avail
16:47:26     3     0      0     0    0     0    0     0    0   15G   15G   1.8G
# zarcsummary -s arc
 
ARC size (current):                                    98.9 %   15.5 GiB
        Target size (adaptive):                       100.0 %   15.6 GiB
        Min size (hard limit):                          6.2 %  999.6 MiB
        Max size (high water):                           16:1   15.6 GiB
        Most Frequently Used (MFU) cache size:         75.5 %   11.2 GiB
        Most Recently Used (MRU) cache size:           24.5 %    3.6 GiB
        Metadata cache size (hard limit):              75.0 %   11.7 GiB
        Metadata cache size (current):                  8.9 %    1.0 GiB
        Dnode cache size (hard limit):                 10.0 %    1.2 GiB
        Dnode cache size (current):                     5.3 %   63.7 MiB

ARC size can be tuned by setting zfs kernel module parameters (Module Parameters):

Proxmox recommends the following rule:

As a general rule of thumb, allocate at least 2 GiB Base + 1 GiB/TiB-Storage
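The rule of thumb translates directly into a byte value for zfs_arc_max. A small sketch, assuming the pool size is given in whole TiB; arc_rule_of_thumb is a hypothetical helper and the result is a starting point, not a tuned value:

```shell
# Suggested ARC size in bytes: 2 GiB base + 1 GiB per TiB of storage.
# $1 = pool storage in whole TiB.
arc_rule_of_thumb() {
    echo $(( (2 + $1) * 1024 * 1024 * 1024 ))
}

arc_rule_of_thumb 8    # 8 TiB pool: 10 GiB = 10737418240 bytes
```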

Examples

Set zfs_arc_max to 4GB and zfs_arc_min to 128MB:

echo "$((4 * 1024*1024*1024))" >/sys/module/zfs/parameters/zfs_arc_max
echo "$((128     *1024*1024))" >/sys/module/zfs/parameters/zfs_arc_min

Make options persistent in /etc/modprobe.d/zfs.conf:

options zfs zfs_prefetch_disable=1
options zfs zfs_arc_max=4294967296
options zfs zfs_arc_min=134217728
options zfs zfs_arc_meta_limit_percent=75

Then run update-initramfs -u to rebuild the initramfs.
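The byte values in the options lines are easy to mistype, so they can be generated instead. A minimal sketch; arc_modprobe_options is a hypothetical helper that only prints the lines, leaving it to you to place them in the modprobe configuration:

```shell
# Print modprobe option lines for the given ARC limits.
# $1 = zfs_arc_max in GiB, $2 = zfs_arc_min in MiB.
arc_modprobe_options() {
    echo "options zfs zfs_arc_max=$(( $1 * 1024 * 1024 * 1024 ))"
    echo "options zfs zfs_arc_min=$(( $2 * 1024 * 1024 ))"
}

arc_modprobe_options 4 128
```

For 4 GiB and 128 MiB this prints the same zfs_arc_max and zfs_arc_min values as the example above.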