====== ZFS performance tuning tips ======
Copy-paste snippet:

<code bash>
zfs set recordsize=1M rpool
zfs set recordsize=16M hddpool
zfs set recordsize=1M nvmpool
zfs set compression=zstd rpool
zfs set compression=zstd hddpool
zfs set compression=zstd nvmpool
</code>
**Note:** ''…''
See more in [[linux:…]].
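To verify that the snippet took effect, something like the following can be used. The ''zfs get'' line is a verification sketch reusing the pool names from the snippet above; the arithmetic just shows the byte values behind the ''1M'' and ''16M'' shorthands:

```shell
# Verification sketch - pool names (rpool, hddpool, nvmpool) are the ones
# used in the snippet above:
# zfs get -H -o name,property,value recordsize,compression rpool hddpool nvmpool

# Byte values behind the short forms:
echo $((1 * 1024 * 1024))    # 1M
echo $((16 * 1024 * 1024))   # 16M
```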
| - | < | + | |
| - | options zfs l2arc_mfuonly=1 l2arc_noprefetch=0 | + | ===== zil limit ===== |
| + | |||
| + | ZFS parameter [[https:// | ||
| + | |||
| + | < | ||
| + | options zfs zil_slog_bulk=67108864 | ||
| + | options zfs l2arc_write_max=67108864 | ||
| </ | </ | ||
| - | Explanation: | + | See similar for L2ARC: [[https:// |
| - | * [[https:// | + | |
| - | * [[https:// | + | |
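The same values can also be applied at runtime without a reboot, assuming the ''zfs'' module is already loaded (a sketch, root required; both values are 64 MiB):

```shell
# Runtime (non-persistent) equivalent of the modprobe options above (root required):
# echo 67108864 > /sys/module/zfs/parameters/zil_slog_bulk
# echo 67108864 > /sys/module/zfs/parameters/l2arc_write_max

# 67108864 bytes is 64 MiB:
echo $((64 * 1024 * 1024))
```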
===== recordsize =====

Size must be a power of 2.

  * ZFS file system: ''recordsize'' (default 128kB)
  * ZVOL block device: ''volblocksize''

This is the basic operation unit for ZFS. ZFS is a copy-on-write (COW) filesystem, so to modify even one byte of data stored inside a 128kB record, it must read the whole 128kB record, modify it and write the 128kB to a new place. This creates huge read and write amplification.

Small sizes:

  * are good for dedicated workloads, like databases
  * 4kB makes no sense with compression: even if data compresses below 4kB, it still occupies the smallest possible unit, 4kB
  * slower sequential reads: lots of IOPS and checksum checks
  * fragmentation over time
  * metadata overhead
Big sizes:

  * benefit from compression, e.g. 128kB of data compressed to 16kB will create a 16kB record
  * very good for storage (write-once data)
  * read/write amplification for small reads/writes
  * good for sequential access
  * good for HDDs (less fragmentation)
  * less metadata
  * zvol: huge overhead if the guest uses small block sizes - try to match the guest FS block size to ''volblocksize'' - do not set a 4kB volblocksize!

Note: ''…''
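The amplification described above can be quantified with a quick back-of-the-envelope calculation (the values below are illustrative, not from this page):

```shell
# Worst case: a small random write into a large record forces a full
# record read-modify-write, because ZFS is copy-on-write.
recordsize=$((1024 * 1024))        # recordsize=1M
iosize=$((4 * 1024))               # guest/application writes 4kB
echo $(( recordsize / iosize ))    # write amplification factor: 256
```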
| - | '' | + | Examples: |
| - | * kernel '' | + | |
| - | * kvm processes have prio '' | + | |
| - | * kvm process during vzdump have '' | + | |
| - | ===== HDD ===== | + | * 16kB for MySQL/ |
| + | * 128kB for rotational HDDs | ||
| - | [[https:// | + | Check real usage by histogram: |
| <code bash> | <code bash> | ||
| - | cat / | + | zpool iostat -r |
| - | echo 2 > / | + | |
| - | cat / | + | |
| - | </ | + | |
| - | Use huge record size - it can help on SMR drives. Note: it only make sense for ZFS file system. Cannot be applied on ZVOL. | ||
| - | <code bash> | ||
| - | zfs set recordsize=1M hddpool/ | ||
| - | zfs set recordsize=1M hddpool/vz | ||
| </ | </ | ||
===== zvol for guest =====

  * match ''volblocksize'' to the guest block size
  * do not use a guest CoW filesystem on top of a CoW filesystem (ZFS)
  * do not use qcow2 files on ZFS
  * use 2 zvols per guest FS: one for data and a second one for the journal
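A sketch of matching ''volblocksize'' to the guest; the pool name, zvol name and sizes below are assumptions, not from this page:

```shell
# Hypothetical example - names and sizes are assumptions:
# zfs create -V 40G -o volblocksize=16k hddpool/vm-disk0
# mkfs.ext4 /dev/zvol/hddpool/vm-disk0

# volblocksize must be a power of two; quick check for 16k:
vbs=$((16 * 1024))
echo $(( (vbs & (vbs - 1)) == 0 ))   # prints 1 when vbs is a power of two
```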
===== Tune L2ARC for backups =====

When a huge portion of data is written (new backups) or read (backup verify), L2ARC is constantly overwritten with the current data. To change this behaviour to cache only MFU (most frequently used) data:

<file conf /etc/modprobe.d/zfs.conf>
options zfs l2arc_mfuonly=1 l2arc_noprefetch=0
</file>

Explanation:

  * [[https://…]]
  * [[https://…]]
<code bash>
# ONLY for SSD/NVM devices:
zfs set logbias=throughput <pool>
</code>
By default ZFS can use 50% of RAM for the ARC cache:

<code bash>
# apt install zfsutils-linux
# zarcstat
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  size
16:…
</code>
<code bash>
# zarcsummary -s arc
ARC size (current): …
</code>
ARC size can be tuned by setting the ''zfs_arc_min'' and ''zfs_arc_max'' module parameters:

  * ''zfs_arc_max'' …
  * ''zfs_arc_min'' …
  * ''zfs_arc_meta_limit_percent'' …
Proxmox recommends the following [[https://…|rule of thumb]]:

<code>
As a general rule of thumb, allocate at least 2 GiB Base + 1 GiB/TiB-Storage.
</code>
==== Examples ====
| - | | + | |
| - | Set '' | + | Set '' |
| <code bash> | <code bash> | ||
| echo "$[4 * 1024*1024*1024]" | echo "$[4 * 1024*1024*1024]" | ||
| Line 134: | Line 158: | ||
Make options persistent:
<file /etc/modprobe.d/zfs.conf>
options zfs zfs_prefetch_disable=1
options zfs zfs_arc_max=4294967296
options zfs zfs_arc_min=134217728
options zfs zfs_arc_meta_limit_percent=75
</file>

and ''…''