Copy-paste snippet:
zfs set recordsize=1M rpool
zfs set recordsize=16M hddpool
zfs set recordsize=1M nvmpool
zfs set compression=zstd rpool
zfs set compression=zstd hddpool
zfs set compression=zstd nvmpool
Note: zstd means zstd-3. It is still a CPU-hungry compression, and its load is visible in top. For high-throughput workloads like build nodes, use lz4 instead.
See more in ZFS compression
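After enabling compression, the achieved ratio per dataset can be checked (a sketch; pool names as in the snippet above):

```shell
# Inspect the active compression algorithm and the achieved ratio
zfs get compression,compressratio rpool hddpool nvmpool
```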
The ZFS parameter zil_slog_bulk throttles the load on the LOG device. In older ZFS versions the value was 768kB; currently it is 64MB. All sync write requests above this size are treated as async requests and written directly to the slower main device.
options zfs zil_slog_bulk=67108864
options zfs l2arc_write_max=67108864
See the similar throttle for L2ARC: l2arc_write_max
The size must be a power of 2.
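The 67108864 above is simply 64 MiB expressed in bytes; a quick sketch to compute such values and confirm the power-of-two constraint (the is_pow2 helper is hypothetical, not a ZFS tool):

```shell
#!/bin/sh
# Hypothetical helper: true when n is a positive power of 2
is_pow2() {
    n=$1
    [ "$n" -gt 0 ] && [ $(( n & (n - 1) )) -eq 0 ]
}

bulk=$(( 64 * 1024 * 1024 ))   # 64 MiB in bytes
echo "zil_slog_bulk=$bulk"     # prints zil_slog_bulk=67108864
is_pow2 "$bulk" && echo "OK: power of 2"
```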
recordsize (default 128kB) / volblocksize (the Solaris-based default was 8kB; with OpenZFS 2.2 it was changed to 16k to "reduce wasted space"). This is the basic operation unit for ZFS. ZFS is a COW filesystem, so to modify even one byte of data stored inside a 128kB record it must read the whole 128kB record, modify it, and store 128kB in a new place. This creates huge read and write amplification.
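The amplification effect can be put into numbers; a rough sketch for a 1 KiB in-place update at different record sizes (pure arithmetic, no ZFS involved):

```shell
#!/bin/sh
# COW rewrite cost of a 1 KiB update: the whole record is read and rewritten
update=1024                          # bytes actually changed
for rs in 8192 16384 131072 1048576; do
    amp=$(( rs / update ))           # write amplification factor
    echo "recordsize=${rs}B -> ${amp}x amplification"
done
```

With the default 128kB recordsize a 1 KiB update costs roughly 128x the written data; at 1M it is 1024x.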
Small sizes:
Big sizes:
Note: recordsize only defines an upper limit: a file smaller than the recordsize is stored in a single, smaller record. volblocksize, in contrast, is fixed: every block of a zvol has exactly that size.
Examples:
Check real usage by histogram:
zpool iostat -r
When a huge portion of data is written (new backups) or read (backup verification), the L2ARC is constantly rewritten with the current data. To change this behaviour to cache only the Most Frequently Used data:
options zfs l2arc_mfuonly=1
options zfs l2arc_noprefetch=0
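The same parameters can also be flipped at runtime via sysfs, without rebuilding the initramfs (a sketch; takes effect immediately but does not survive a reboot):

```shell
# Runtime equivalent of the modprobe options above
echo 1 > /sys/module/zfs/parameters/l2arc_mfuonly
echo 0 > /sys/module/zfs/parameters/l2arc_noprefetch
cat /sys/module/zfs/parameters/l2arc_mfuonly   # verify the change
```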
Explanation:
See Archlinux wiki: Databases
zfs set recordsize=8K <pool>/postgres
# ONLY for SSD/NVM devices:
zfs set logbias=throughput <pool>/postgres
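The same properties can be applied when the dataset is first created, so the PostgreSQL data directory never sees the 128kB default (a sketch; tank/postgres is a placeholder name):

```shell
# Hypothetical dataset for a PostgreSQL data directory
zfs create -o recordsize=8K tank/postgres
# ONLY for SSD/NVMe devices:
zfs set logbias=throughput tank/postgres
zfs get recordsize,logbias tank/postgres   # verify both properties
```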
By default ZFS can use 50% of RAM for the ARC cache:
# apt install zfsutils-linux
# zarcstat
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  size     c  avail
16:47:26     3     0      0     0    0     0    0     0    0   15G   15G   1.8G
# zarcsummary -s arc
ARC size (current):                              98.9 %   15.5 GiB
Target size (adaptive):                         100.0 %   15.6 GiB
Min size (hard limit):                            6.2 %  999.6 MiB
Max size (high water):                             16:1   15.6 GiB
Most Frequently Used (MFU) cache size:           75.5 %   11.2 GiB
Most Recently Used (MRU) cache size:             24.5 %    3.6 GiB
Metadata cache size (hard limit):                75.0 %   11.7 GiB
Metadata cache size (current):                    8.9 %    1.0 GiB
Dnode cache size (hard limit):                   10.0 %    1.2 GiB
Dnode cache size (current):                       5.3 %   63.7 MiB
The ARC size can be tuned by setting zfs kernel module parameters (Module Parameters):
- zfs_arc_max: Maximum size of the ARC in bytes. If set to 0, the maximum size of the ARC is determined by the amount of system memory installed (50% on Linux).
- zfs_arc_min: Minimum ARC size limit. When the ARC is asked to shrink, it will stop shrinking at c_min as tuned by zfs_arc_min.
- zfs_arc_meta_balance: Balance between metadata and data on ghost hits. Values above 100 increase metadata caching by proportionally reducing the effect of ghost data hits on the target data/metadata rate. https://openzfs.github.io/openzfs-docs/man/master/4/zfs.4.html#zfs_arc_meta_balance

Proxmox recommends the following rule:
As a general rule of thumb, allocate at least 2 GiB Base + 1 GiB/TiB-Storage
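The rule of thumb translates into a simple calculation; a sketch for an assumed 8 TiB pool (the storage size is an example, not taken from the text):

```shell
#!/bin/sh
storage_tib=8                                  # assumed pool size in TiB
arc_gib=$(( 2 + storage_tib ))                 # 2 GiB base + 1 GiB per TiB
arc_bytes=$(( arc_gib * 1024 * 1024 * 1024 ))
echo "zfs_arc_max=$arc_bytes"                  # prints zfs_arc_max=10737418240
```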
Set zfs_arc_max to 4GB and zfs_arc_min to 128MB:
echo "$((4 * 1024 * 1024 * 1024))" >/sys/module/zfs/parameters/zfs_arc_max
echo "$((128 * 1024 * 1024))" >/sys/module/zfs/parameters/zfs_arc_min
Make options persistent:
options zfs zfs_prefetch_disable=1
options zfs zfs_arc_max=4294967296
options zfs zfs_arc_min=134217728
options zfs zfs_arc_meta_limit_percent=75
and run update-initramfs -u afterwards.