  * 2021-05-09: ''PANIC: rpool: blkptr at 00000000a44c5bb3 DVA 0 has invalid OFFSET 18388167655883276288''
  
  
DVA (Data Virtual Address) is made up of:
  * a 32-bit integer representing the VDEV
  * followed by a 63-bit integer representing the offset (see the quick check below).
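
To get a feel for why this offset is rejected: assuming the value printed in the panic is a byte offset, it corresponds to roughly 16 EiB, far beyond the size of any real vdev (a quick check with ''bc''):
<code bash>
# Offset from the panic message above, converted to EiB (1 EiB = 2^60 bytes):
echo 'scale=2; 18388167655883276288 / 2^60' | bc    # prints ~15.9
</code>
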
===== Important steps =====

**Stop the scrub to prevent an error loop** (if the scrub reads the corrupted data, the kernel will panic again):
<code bash>
zpool scrub -s rpool
</code>

**(Optional) Turn the ZFS PANIC into a WARNING:**
<code bash>
echo 1 > /sys/module/zfs/parameters/zfs_recover
</code>
  
Dump the whole history of the pool:
<code bash>
zpool history -il rpool
</code>
  
  * ''-bb'': Display statistics regarding the number, size (logical, physical and allocated) and deduplication of blocks (verbosity level 2).
  * ''-c'': Verify the checksum of all metadata blocks while printing block statistics (see ''-b''). If specified multiple times, verify the checksums of all blocks.
  * ''-s'': Report statistics on zdb I/O. Display operation counts, bandwidth, and error counts of I/O.
  * ''-L'': Disable leak detection and the loading of space maps. By default, zdb verifies that all non-free blocks are referenced, which can be very expensive.
  * ''-AAA'': Do not abort if asserts fail and also enable panic recovery.
  
<code bash>
zdb -AAA -bbbcsvL <zfs_pool>
</code>

Sometimes it reports errors while reading:

<code>
zdb_blkptr_cb: Got error 52 reading <259, 75932, 0, 17> DVA[0]=<0:158039e9000:6000> [L0 ZFS plain file] fletcher4 lz4 unencrypted LE contiguous unique single size=20000L/6000P birth=62707L/62707P fill=1 cksum=516dd1ace1c:414cbfc202333b:af36411a2766c4f:7bc4d6777673687b -- skipping
</code>

====== Find the problematic file / volume ======

Try to read all ZFS volumes. First, list them:
<code bash>
zfs list
</code>

Then, for each volume, try to read it:
<code bash>
zfs send rpool/data/vm-703-disk-1 | pv > /dev/null
</code>
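
To walk through every dataset automatically, a small loop can help (a sketch; it reuses the snapshot-less ''zfs send'' form from the example above and assumes ''pv'' is installed):
<code bash>
# Try to read every dataset in the pool, one by one, to find the one that panics.
for ds in $(zfs list -H -o name -r rpool); do
    echo "=== trying: $ds ==="
    zfs send "$ds" | pv > /dev/null
done
</code>
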
Catch the PANIC and reboot the system via sysrq to prevent an I/O lock-up.
The problematic volume here is a replicated (received) volume from another ZFS node.
<code bash>
echo s > /proc/sysrq-trigger   # emergency sync
echo b > /proc/sysrq-trigger   # immediate reboot
</code>

After power-up, delete the problematic ZFS volume. During the deletion the PANIC happens again.
The deletion is stored in the ZFS journal, so during mounting ZFS tries to replay the pending deletion, which causes the PANIC again.
The system is stuck in a boot loop.

  * Boot from a Live USB with ZFS support (Ubuntu ships ZoL 0.8.3).
  * Stop the ''zfs-zed'' service (see the command sequence below).
  * ''rmmod zfs''
  * ''modprobe zfs zfs_recover=1''
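
Collected into one sequence (a sketch, assuming a systemd-based live environment where the ZED service is named ''zfs-zed''):
<code bash>
systemctl stop zfs-zed        # stop the ZFS Event Daemon so the module can be unloaded
rmmod zfs                     # unload the ZFS kernel module
modprobe zfs zfs_recover=1    # reload it with panic recovery enabled
</code>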

It doesn't help: instead of panicking, ZFS now prints a warning, reports an unrecoverable error, and the pool is suspended.

The last resort is to boot a live system from USB, copy all data to another zpool, and recreate ''rpool'':
<code bash>
mkdir /rpool
zpool import -f -o readonly=on -R /rpool rpool
</code>
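
A sketch of the copy step (the target pool name ''backup'' and the dataset placeholder are illustrative): with ''rpool'' imported read-only, each dataset that is still readable can be sent to the second pool:
<code bash>
# 'backup' is a second, healthy pool attached to the live system;
# for nested datasets, create/send the parents first.
zfs create -p backup/rpool-copy
zfs send rpool/<dataset> | zfs receive backup/rpool-copy/<dataset>
</code>
Datasets that hit the corrupted block will fail to send and have to be restored from a replica or backup instead.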

To assess the damage, zdb can also be run directly against the pool:
  * ''-e'': operate on an exported pool
  * ''-L'': Disable leak detection and the loading of space maps. By default, zdb verifies that all non-free blocks are referenced, which can be very expensive.
<code bash>
zdb -e -bcsvL rpool
</code>
  
====== Resources ======

<code>
If bad data has a correct checksum, then at present ZFS cannot fix it. Sometimes it can recognize that the data is bad and report an error, sometimes it has no option but to panic, but sometimes it cannot even tell if it's bad data.
</code>