  * 2021-04-19: ''PANIC: rpool: blkptr at 000000009897c6f0 DVA 0 has invalid OFFSET 18388167655883276288''
  * 2021-05-09: ''PANIC: rpool: blkptr at 00000000a44c5bb3 DVA 0 has invalid OFFSET 18388167655883276288''
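
These panics can be spotted in the kernel log after the reboot; a quick check, as a sketch (''-b -1'' reads the previous boot and needs a persistent journal):
<code bash>
# Search the previous boot's kernel messages for the panic
journalctl -k -b -1 | grep -i 'DVA 0 has invalid OFFSET'
</code>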

DVA (Data Virtual Address) is made up of:
  * a 32-bit integer representing the VDEV,
  * followed by a 63-bit integer representing the offset.

===== Important steps =====
  
**Stop scrub to prevent error loop** (if the scrub reads corrupted data, the kernel will panic again):
<code bash>
zpool scrub -s rpool
</code>
  
**(Option) Turn ZFS PANIC into WARNING**
<code bash>
echo 1 > /sys/module/zfs/parameters/zfs_recover
</code>
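
To have the setting already active on the next boot (before the pool is imported), the module parameter can also be set via a ''modprobe.d'' drop-in; the file name here is an assumption:
<code bash>
# Hypothetical file name; any *.conf under /etc/modprobe.d/ is read at module load
echo "options zfs zfs_recover=1" > /etc/modprobe.d/zfs-recover.conf
</code>
On systems where the ZFS module is loaded from the initramfs (e.g. root on ZFS), regenerate it afterwards with ''update-initramfs -u''.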
  
Dump the whole history of the pool.
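A sketch using the standard ''zpool history'' command (''-i'' includes internally logged events, ''-l'' uses the long format):
<code bash>
zpool history -il rpool
</code>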

Traverse the pool and verify checksums with ''zdb'':
<code bash>
zdb -AAA -bbbcsvL <zfs_pool>
</code>

Sometimes it shows errors during reading:

<code>
zdb_blkptr_cb: Got error 52 reading <259, 75932, 0, 17> DVA[0]=<0:158039e9000:6000> [L0 ZFS plain file] fletcher4 lz4 unencrypted LE contiguous unique single size=20000L/6000P birth=62707L/62707P fill=1 cksum=516dd1ace1c:414cbfc202333b:af36411a2766c4f:7bc4d6777673687b -- skipping
</code>

The tuple ''<259, 75932, 0, 17>'' is ''<objset, object, level, blkid>'', so it identifies the objset (dataset) and the object that the bad block belongs to.
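A sketch for mapping those numbers back to a name; the dataset name in the second command is an assumption:
<code bash>
# List datasets with their objset IDs and look for ID 259
zdb -d rpool | grep 'ID 259'
# Dump object 75932 of the matching dataset; for filesystems the output includes the file path
zdb -dddd rpool/data/vm-703-disk-1 75932
</code>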

====== Find problematic file / volume ======

Try to read all ZFS volumes. List them first:
<code bash>
zfs list
</code>

Then, for each volume, try to read it:
<code bash>
zfs send rpool/data/vm-703-disk-1 | pv > /dev/null
</code>
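
To sweep every volume in one pass, a minimal sketch that forces reads through the zvol device nodes udev creates under ''/dev/zvol/'':
<code bash>
# Read every zvol end-to-end; the volume that triggers a PANIC or I/O error is the bad one
for vol in $(zfs list -H -o name -t volume); do
    echo "reading $vol"
    dd if="/dev/zvol/$vol" of=/dev/null bs=1M status=progress
done
</code>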
Catch the PANIC and reboot the system via sysrq to prevent an I/O lock (sysrq must be enabled, e.g. ''sysctl kernel.sysrq=1'').
The problematic volume here was a replicated (received) volume from another ZFS node.
<code bash>
echo s > /proc/sysrq-trigger   # emergency sync of all filesystems
echo b > /proc/sysrq-trigger   # immediate reboot, without syncing or unmounting
</code>

After power-up, delete the problematic ZFS volume. During deletion the PANIC happens again:
the deletion is recorded in the ZFS journal, so while mounting, ZFS tries to replay the pending deletion, which causes the PANIC again.
The system is stuck in a boot loop.

  * Boot from a Live USB with ZFS support (Ubuntu ships ZOL 0.8.3).
  * stop the ''zfs-zed'' service
  * ''rmmod zfs''
  * ''modprobe zfs zfs_recover=1''
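
The same steps as a runnable sequence, assuming a systemd-based live environment where ZED runs as ''zfs-zed.service'':
<code bash>
systemctl stop zfs-zed        # stop the ZFS event daemon so the module can be unloaded
rmmod zfs                     # unload the ZFS module...
modprobe zfs zfs_recover=1    # ...and reload it with recovery mode enabled
</code>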

It doesn't help: ZFS now hits a warning instead of a panic, reports an unrecoverable error, and the pool is suspended.

The last possibility is to boot a Live system from USB, copy all data to another zpool, and recreate ''rpool'':
<code bash>
mkdir /rpool
zpool import -f -R /rpool rpool -o readonly=on
</code>
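
A sketch for getting the data off: the target pool name ''backup'' and the destination path are assumptions. A file-level copy with ''rsync'' is used because a read-only pool cannot take new snapshots for ''zfs send'', and ''zfs send'' is what triggered the panic in the first place:
<code bash>
# Copy everything from the read-only rpool into a hypothetical healthy pool 'backup'
rsync -aHAX --progress /rpool/ /backup/rescue/
</code>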

The exported pool can also be inspected with ''zdb'' directly:
  * ''-e'' operate on an exported pool
  * ''-L'' disable leak detection and the loading of space maps; by default, zdb verifies that all non-free blocks are referenced, which can be very expensive
<code bash>
zdb -e -bcsvL rpool
</code>
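
If the pool devices live in a non-default location, ''-p'' tells zdb where to search for them; the path below is an example:
<code bash>
zdb -e -p /dev/disk/by-id -bcsvL rpool
</code>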
  
====== Resources ======
<code>
If bad data has a correct checksum, then at present ZFS cannot fix it. Sometimes it can recognize that the data is bad and report an error, sometimes it has no option but to panic, but sometimes it cannot even tell if it's bad data.
</code>