====== Disaster recovery ======

===== Replace NVMe device =====

Only one NVMe slot is available, so the idea is to move everything from the NVMe device onto the HDDs and then restore it on the new NVMe device.
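
Before stopping anything, it can help to record the current layout so device names and sizes can be checked later. A minimal sketch, assuming the pool names used below (''nvmpool'', ''hddpool'', ''rpool''):
<code bash>
zpool status nvmpool hddpool rpool                # current vdev layout of all three pools
lsblk -o NAME,SIZE,TYPE,PARTLABEL /dev/nvme0n1    # partitions on the NVMe device
</code>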

Stop CEPH:
<code bash>
systemctl stop ceph.target
systemctl stop ceph-osd.target
systemctl stop ceph-mgr.target
systemctl stop ceph-mon.target
systemctl stop ceph-mds.target
systemctl stop ceph-crash.service
</code>
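
A quick check that no CEPH daemons are still running:
<code bash>
systemctl list-units 'ceph*' --state=running   # should list no services
pgrep -a ceph                                  # should print nothing
</code>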

Backup the partition layout:
<code bash>
sgdisk -b nvm.sgdisk /dev/nvme0n1   # binary backup of the GPT to file nvm.sgdisk
sgdisk -p /dev/nvme0n1              # print the layout for reference
</code>
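
Since ''nvm.sgdisk'' is only needed after the NVMe device has been swapped, keep a copy somewhere that does not live on that device. A sketch, with ''backup-host'' as a placeholder:
<code bash>
scp nvm.sgdisk root@backup-host:/root/   # or copy it to an HDD-backed filesystem
</code>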

Move the ZFS ''nvmpool'' to the HDDs:
<code bash>
zfs destroy hddpool/nvmtemp                      # remove a leftover temporary zvol, if any
zfs create -s -b 8192 -V 387.8G hddpool/nvmtemp  # note: block size forced to match the existing device

ls -l /dev/zvol/hddpool/nvmtemp
lrwxrwxrwx 1 root root 11 01-15 11:00 /dev/zvol/hddpool/nvmtemp -> ../../zd192

zpool attach nvmpool 7b375b69-3ef9-c94b-bab5-ef68f13df47c /dev/zd192
</code>
The ''nvmpool'' resilvering will begin. Observe it with ''zpool status nvmpool 1''.
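
Wait for the resilvering to finish before detaching the NVMe device in the next step. A minimal sketch (assumes English ''zpool status'' output):
<code bash>
while zpool status nvmpool | grep -q 'resilver in progress'; do
    sleep 60
done
zpool status nvmpool   # all devices should be ONLINE with no errors
</code>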

Remove the old NVMe device from ''nvmpool'':
<code bash>zpool detach nvmpool 7b375b69-3ef9-c94b-bab5-ef68f13df47c</code>

Remove all ZILs, L2ARCs and swap:
<code bash>
swapoff -a
vi /etc/fstab    # remove or comment out the swap entry

zpool remove hddpool <ZIL DEVICE>
zpool remove hddpool <L2ARC DEVICE>
zpool remove rpool <L2ARC DEVICE>
</code>
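
A quick verification that swap is off and that no log or cache devices remain:
<code bash>
swapon --show                 # should print nothing
zpool status hddpool rpool    # no 'logs' or 'cache' sections should remain
</code>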

The CEPH OSD will be recreated from scratch to force a rebuild of the OSD DB (which can be too big due to a metadata bug in a previous CEPH version).
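
The exact commands depend on the OSD id and devices. One possible sequence, once the cluster services are running again and the new ''ceph_db'' partition exists (OSD id ''0'' and the device paths below are placeholders):
<code bash>
# destroy the old OSD (placeholder id 0) and wipe its volumes
pveceph osd destroy 0 --cleanup
# recreate it with the DB on the new ceph_db partition (placeholder paths)
pveceph osd create /dev/sdX --db_dev /dev/nvme0n1p5
</code>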

Replace the NVMe device.

Recreate the partitions manually (see the sketch after the list below) or restore them from the backup: <code bash>sgdisk -l nvm.sgdisk /dev/nvme0n1</code>
  * swap
  * rpool_zil
  * hddpool_zil
  * hddpool_l2arc
  * ceph_db (for a 4 GB CEPH OSD DB, create a 4096 MB + 4 MB partition)
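
If the ''nvm.sgdisk'' backup is not usable, the partitions from the list above can be created manually. A sketch only; the sizes are placeholders and must be taken from the old layout (the ''sgdisk -p'' output saved earlier):
<code bash>
sgdisk /dev/nvme0n1 --new 1:0:+16G   -t 1:8200 -c 1:swap           # placeholder size
sgdisk /dev/nvme0n1 --new 2:0:+8G    -t 2:BF01 -c 2:rpool_zil      # placeholder size
sgdisk /dev/nvme0n1 --new 3:0:+8G    -t 3:BF01 -c 3:hddpool_zil    # placeholder size
sgdisk /dev/nvme0n1 --new 4:0:+64G   -t 4:BF01 -c 4:hddpool_l2arc  # placeholder size
sgdisk /dev/nvme0n1 --new 5:0:+4100M -t 5:8300 -c 5:ceph_db        # 4096 MB + 4 MB
sgdisk /dev/nvme0n1 --new 6:0:0      -t 6:BF01 -c 6:nvmpool        # rest of the device (assumption)
</code>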

Add the ZILs and L2ARCs back and re-enable swap, as sketched below.
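
A sketch for re-adding them, assuming the partitions were created with the GPT names from the list above (adjust pool and label names as needed):
<code bash>
# re-attach the log (ZIL) and cache (L2ARC) devices
zpool add rpool   log   /dev/disk/by-partlabel/rpool_zil
zpool add hddpool log   /dev/disk/by-partlabel/hddpool_zil
zpool add hddpool cache /dev/disk/by-partlabel/hddpool_l2arc

# re-enable swap and restore the /etc/fstab entry removed earlier
mkswap /dev/disk/by-partlabel/swap
swapon /dev/disk/by-partlabel/swap
</code>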

Import ''nvmpool'': <code bash>zpool import nvmpool</code>

Move ''nvmpool'' to the new NVMe partition:
<code bash>
zpool attach nvmpool zd16 426718f1-1b1e-40c0-a6e2-1332fe5c3f2c
# wait for the resilvering to finish (zpool status nvmpool) before detaching
zpool detach nvmpool zd16
</code>
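
Once ''zpool status nvmpool'' shows the pool healthy on the new partition, the temporary zvol can be removed:
<code bash>
zfs destroy hddpool/nvmtemp
</code>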

===== Replace rpool device =====

The Proxmox rpool ZFS is located on the 3rd partition (1st is the GRUB boot partition, 2nd is EFI, 3rd is ZFS).
To replace a failed device, the partition layout has to be replicated first.

With a new device of equal or greater size, simply replicate the partitions:
<code bash>
# replicate layout from SDA to SDB
sgdisk /dev/sda -R /dev/sdb
# generate new UUIDs:
sgdisk -G /dev/sdb
</code>

To replicate the layout on a smaller device, the partitions have to be created manually:
<code bash>
sgdisk -p /dev/sda

Number  Start (sector)    End (sector)  Size       Code  Name
   1                34            2047   1007.0 KiB  EF02
   2              2048         1050623   512.0 MiB   EF00
   3           1050624       976773134   465.3 GiB   BF01

sgdisk --clear /dev/sdb
sgdisk /dev/sdb -a1 --new 1:34:2047      -t0:EF02
sgdisk /dev/sdb     --new 2:2048:1050623 -t0:EF00
sgdisk /dev/sdb     --new 3:1050624      -t0:BF01
</code>
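
To confirm the new layout matches the old one, print it again:
<code bash>
sgdisk -p /dev/sdb
</code>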

Restore the bootloader:
<code bash>
proxmox-boot-tool format /dev/sdb2   # format the new ESP
proxmox-boot-tool init /dev/sdb2     # install the boot loader and register the ESP
proxmox-boot-tool clean              # drop ESPs that no longer exist
</code>
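
''proxmox-boot-tool status'' can be used to verify that the new ESP is known and synced:
<code bash>
proxmox-boot-tool status
</code>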

<code bash>
# mirror rpool onto the new device's 3rd partition
zpool attach rpool ata-SPCC_Solid_State_Disk_XXXXXXXXXXXX-part3 /dev/disk/by-id/ata-SSDPR-CL100-120-G3_XXXXXXXX-part3
# after resilvering completes, remove the failed device
zpool offline rpool ata-SSDPR-CX400-128-G2_XXXXXXXXX-part3
zpool detach rpool ata-SSDPR-CX400-128-G2_XXXXXXXXX-part3
</code>
  
===== Migrate VM from dead node =====