Due to a certain motherboard manufacturer having no quality control of their firmware updates, my previous home lab server had 4 of its 12 DIMM slots fail. This necessitated a rapid replacement of the machine, as I use it heavily for work.
While frustrating, rather than lift my existing OpenSUSE Leap install onto the new machine, I decided to reinstall - this time with ZFS as the root filesystem. I have had enough issues with BTRFS performance and reliability to finally commit to moving away from it, even if that means losing access to snapper and transactional-update. I'd rather my systems be reliable than flaky and featureful.
But the catch here is that ZFS is not a supported root filesystem in OpenSUSE Leap. This means doing things the hard way. As a reformed Gentoo Ricer, I'm not one to shy away from this challenge.
Existing Literature
I am not the first to attempt this process - I was heavily inspired by the OpenSUSE Leap root-on-ZFS guide hosted on the OpenZFS site.
The steps for formatting NVMe drives with the correct LBA size came from here.
Setup
BootEnv
Currently it's not possible to set up OpenSUSE with ZFS from a live image. The best way to proceed is to install OpenSUSE Leap onto a separate disk first and use that as your install environment.
SecureBoot
I will be using EFI in this process. To avoid issues with kernel module signing, Secure Boot should be disabled. If you feel like getting the module signatures to work, then I'd love to hear how you did it.
Warning
WARNING: A lot of the commands that follow involve partitioning, formatting devices, and generally mucking about with your system in a way that can cause data loss. Ensure that any data you care about is backed up to external media and disconnected from the system during the install process.
Preparing the BootEnv
Boot your install environment, and if you prefer, SSH into it.
Enable the filesystems repo which has the ZFS modules.
zypper addrepo -f https://download.opensuse.org/repositories/filesystems/15.6/filesystems.repo
To prevent zypper doing silly things, we need to unfuck it.
# /etc/zypp/zypp.conf
## If you don't set this, zypper dl's one package at a time and it's so slow.
commit.downloadMode = DownloadInAdvance
## drpms are slower than full rpms.
download.use_deltarpm = false
# /etc/zypp/zypper.conf
[solver]
# You don't need these unless you want all of Xorg dragged in randomly
installRecommends = no
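If you'd rather not hand-edit the two files, the tweaks can be appended in one go. This is a sketch using a hypothetical helper function; the directory is parameterised purely so you can dry-run it against a scratch location first.

```shell
# apply_zypp_tweaks DIR: append the download and solver tweaks to
# DIR/zypp.conf and DIR/zypper.conf. Pass /etc/zypp on the real system.
apply_zypp_tweaks() {
  cat >> "$1/zypp.conf" << 'EOF'
## Download everything before committing; one-at-a-time is slow.
commit.downloadMode = DownloadInAdvance
## drpms are slower than full rpms.
download.use_deltarpm = false
EOF
  cat >> "$1/zypper.conf" << 'EOF'
[solver]
installRecommends = no
EOF
}

# On the install environment:
#   apply_zypp_tweaks /etc/zypp
```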
Refresh your repos and ensure you're fully up to date - if your running kernel doesn't match what the ZFS kernel module packages were built against, ZFS won't play nicely.
zypper ref --force
zypper dup
reboot
Getting Started
Boot your install environment, and if you prefer, SSH into it.
OPTIONAL: Set up tmux to shield yourself from connection dropouts.
zypper in tmux
tmux
Install the ZFS kernel module and partitioning tools
zypper in zfs zfs-kmp-default gptfdisk nvme-cli
Load the ZFS module
modprobe zfs
Partitioning
Identify your disks to ensure you are using the correct ones for the install.
lsblk
Identify your disk long names. These are used by ZFS.
ls -al /dev/disk/by-id/
# You need to look for the name -> dev in the output here
Set up an env var to make things easier for now - use the short device name in a VM, or the by-id long name on real hardware.
export DISK=/dev/vda
export DISK=/dev/disk/by-id/nvme-WD_BLACK_XXXXX....
If the disk is an NVMe device, format it to use the 4K LBA size if possible.
nvme id-ns /dev/nvme1n1 | grep lbaf
# nlbaf : 1
# nulbaf : 0
# lbaf 0 : ms:0 lbads:9 rp:0x2
# lbaf 1 : ms:0 lbads:12 rp:0x1 (in use)
- lbaf - an entry in the LBA format table.
- lbads - log2 of the sector size, i.e. the "ashift". 9 == 512b, 12 == 4k, 13 == 8k
In this example lbaf 1 is the 4k format. To select it:
nvme format /dev/nvme1n1 -l 1
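The ashift you pass to zpool create later must match this, so it's handy to pull the in-use lbads value straight out of the id-ns output. A sketch, using a hypothetical helper (the device name in the comment is an example):

```shell
# parse_ashift: read `nvme id-ns` output on stdin and print the lbads
# of the in-use LBA format. lbads is log2 of the sector size, which is
# exactly the value zpool's ashift expects (9 -> 512b, 12 -> 4k).
parse_ashift() {
  awk '/in use/ {
    for (i = 1; i <= NF; i++)
      if ($i ~ /^lbads:/) { sub("lbads:", "", $i); print $i }
  }'
}

# On a real system:
#   nvme id-ns /dev/nvme1n1 | parse_ashift
```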
Destroy the partitions on the disk.
sgdisk --zap-all $DISK
Create EFI part
sgdisk -n1:1M:+512M -t1:EF00 $DISK
Create an LVM partition for /boot and swap. Allow at least 1G for /boot, then choose what you need for swap, plus about 1G of headroom for LVM RAID metadata. This example creates an 8G PV, allowing a 1G /boot and 6G of swap.
sgdisk -n2:0:+8G -t2:8E00 $DISK
Create the partition for the root pool. The final 0 uses all remaining space; replace it with +XG to limit the size.
sgdisk -n3:0:0 -t3:BF01 $DISK
HINT: Repeat this with another $DISK if you want to set up RAID1
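When partitioning several pool members identically, it can help to print the full command list for every disk and review it before anything destructive runs. A sketch with a hypothetical helper; pipe its output to `sh` once you're happy:

```shell
# print_partition_cmds DISK...: print the sgdisk commands for each pool
# member so the whole layout can be reviewed (or piped to `sh`) at once.
print_partition_cmds() {
  for disk in "$@"; do
    echo "sgdisk --zap-all $disk"
    echo "sgdisk -n1:1M:+512M -t1:EF00 $disk"  # EFI system partition
    echo "sgdisk -n2:0:+8G -t2:8E00 $disk"     # LVM PV for /boot + swap
    echo "sgdisk -n3:0:0 -t3:BF01 $disk"       # remainder for the ZFS pool
  done
}

# Example with two hypothetical by-id names:
#   print_partition_cmds /dev/disk/by-id/nvme-disk1 /dev/disk/by-id/nvme-disk2 | sh
```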
Format Time
Set up env vars for your disks. I did a mirrored install; add or remove mirrors to suit your tastes.
export DISK1_EFI=/dev/vda1
export DISK2_EFI=/dev/vdb1
export DISK1_BOOT=/dev/vda2
export DISK2_BOOT=/dev/vdb2
export DISK1_POOL=/dev/vda3
export DISK2_POOL=/dev/vdb3
export DISK1_EFI=/dev/disk/by-id/nvme-WD_BLACK_SN850X_XXX-part1
export DISK2_EFI=/dev/disk/by-id/nvme-WD_BLACK_SN850X_XXX-part1
export DISK1_BOOT=/dev/disk/by-id/nvme-WD_BLACK_SN850X_XXX-part2
export DISK2_BOOT=/dev/disk/by-id/nvme-WD_BLACK_SN850X_XXX-part2
export DISK1_POOL=/dev/disk/by-id/nvme-WD_BLACK_SN850X_XXX-part3
export DISK2_POOL=/dev/disk/by-id/nvme-WD_BLACK_SN850X_XXX-part3
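Before formatting anything, it's worth confirming that every variable actually points at a block device - a typo'd by-id name fails much more pleasantly here than halfway through mkfs. A sketch with a hypothetical helper:

```shell
# check_parts PATH...: report whether each given path is a real block
# device, catching typos before any formatting happens.
check_parts() {
  for part in "$@"; do
    if [ -b "$part" ]; then
      echo "ok: $part"
    else
      echo "MISSING: $part"
    fi
  done
}

# On the real system:
#   check_parts "$DISK1_EFI" "$DISK2_EFI" "$DISK1_BOOT" "$DISK2_BOOT" \
#               "$DISK1_POOL" "$DISK2_POOL"
```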
Create /boot/efi
mkfs.fat -s 1 -F 32 $DISK1_EFI
mkfs.fat -s 1 -F 32 $DISK2_EFI
Setup LVM PV
pvcreate $DISK1_BOOT
pvcreate $DISK2_BOOT
Create LVM VG
vgcreate vg_system $DISK1_BOOT $DISK2_BOOT
Create /boot
lvcreate -n lv_boot --type raid1 -L 1G vg_system
mkfs.xfs /dev/vg_system/lv_boot
Create swap
lvcreate -n lv_swap --type raid1 -L 6G vg_system
mkswap /dev/vg_system/lv_swap
Create ZFS pool
HINT: recordsize defines the maximum record size for a dataset. For random I/O we'll set a smaller size; for anything that changes infrequently, where we want better sequential read performance, we'll set a larger value.
IMPORTANT: Ensure you set the ashift value to match your LBA size. 9 == 512b, 12 == 4k, 13 == 8k.
NOTE: We have to set grub2 compatibility here, because grub is stupid and won't allow you to set up EFI if your root pool isn't grub2 compatible - even though we don't even have /boot on ZFS!!!
Thanks Grub. Thrub.
zpool create \
-o cachefile=/etc/zfs/zpool.cache \
-o ashift=12 \
-o compatibility=grub2 \
-O atime=off \
-O xattr=sa -O mountpoint=none \
-O acltype=posixacl -O canmount=off -O compression=lz4 \
-O normalization=formD \
-R /mnt \
rpool mirror $DISK1_POOL $DISK2_POOL
Create all our ZFS filesystems within the pool
zfs create -o recordsize=128k -o canmount=noauto -o mountpoint=/ rpool/ROOT
zfs mount rpool/ROOT
zfs create -o com.sun:auto-snapshot=false rpool/ROOT/tmp
chmod 1777 /mnt/tmp
zfs create -o mountpoint=/home rpool/home
zfs create -o mountpoint=/root rpool/home/root
chmod 700 /mnt/root
zfs create -o mountpoint=/var -o recordsize=4k rpool/var
zfs create rpool/var/lib
zfs create rpool/var/log
zfs create rpool/var/spool
zfs create -o com.sun:auto-snapshot=false rpool/var/cache
zfs create -o com.sun:auto-snapshot=false rpool/var/tmp
zfs create -o recordsize=128k rpool/var/lib/docker
Create /boot mountpoints
mkdir /mnt/boot
# Mount our /boot filesystem
mount /dev/vg_system/lv_boot /mnt/boot
# Make the efi mount points
mkdir /mnt/boot/efi
# Only if you have a second disk
mkdir /mnt/boot/efi-alt
# Mount the efi partitions
mount ${DISK1_EFI} /mnt/boot/efi
mount ${DISK2_EFI} /mnt/boot/efi-alt
Copy in the ZFS pool cache to the new system
mkdir /mnt/etc/zfs -p
cp /etc/zfs/zpool.cache /mnt/etc/zfs/
Begin the Install
Add zypper repos to the new chroot
zypper --root /mnt ar -f http://download.opensuse.org/distribution/leap/15.6/repo/non-oss non-oss
zypper --root /mnt ar -f http://download.opensuse.org/distribution/leap/15.6/repo/oss oss
zypper --root /mnt ar -f http://download.opensuse.org/update/leap/15.6/oss update-oss
zypper --root /mnt ar -f http://download.opensuse.org/update/leap/15.6/non-oss update-non-oss
Refresh repo data
zypper --root /mnt refresh --force
Install the basesystem to the chroot
zypper --root /mnt install -t pattern base
Install basic system tools
zypper --root /mnt refresh --force
zypper --root /mnt install zypper tmux vim zsh nmap iputils iproute2 tcpdump bridge-utils sudo htop ntpd-rs less grep findutils system-group-wheel tuned
Setup Your New System
Set a hostname
echo my.sweet.hostname > /mnt/etc/hostname
Create your filesystem mount table
cat << EOF > /mnt/etc/fstab
/dev/vg_system/lv_boot /boot xfs defaults 0 0
/dev/vg_system/lv_swap swap swap defaults 0 0
UUID=$(blkid -s UUID -o value ${DISK1_EFI}) /boot/efi vfat defaults 0 0
UUID=$(blkid -s UUID -o value ${DISK2_EFI}) /boot/efi-alt vfat defaults 0 0
EOF
Copy DNS config from host
rm /mnt/etc/resolv.conf
cp /etc/resolv.conf /mnt/etc/
Setup bind mounts for the chroot
mount --make-private --rbind /dev /mnt/dev
mount --make-private --rbind /proc /mnt/proc
mount --make-private --rbind /sys /mnt/sys
mount -t tmpfs tmpfs /mnt/run
mkdir /mnt/run/lock
hacker voice "I'm in".
chroot /mnt zsh --login
Unfuck zypper again
# Unfuck zypper again
# /etc/zypp/zypp.conf
commit.downloadMode = DownloadInAdvance
download.use_deltarpm = false
# /etc/zypp/zypper.conf
[solver]
installRecommends = no
Ensure the filesystems repo exists
zypper addrepo -f https://download.opensuse.org/repositories/filesystems/15.6/filesystems.repo
Fix the releasever variable in repos
sed -i -E 's/15\.6/$releasever/g' /etc/zypp/repos.d/*.repo
zypper refresh --force
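The point of that sed is that repo files with a literal 15.6 would stay pinned forever; rewriting it to the $releasever variable lets future distribution upgrades track the release you request. The same substitution as a hypothetical stdin/stdout helper, for checking on a sample line first:

```shell
# pin_releasever: rewrite hard-coded 15.6 version strings to the
# $releasever zypp variable, mirroring the sed invocation above.
pin_releasever() {
  sed -E 's/15\.6/$releasever/g'
}
```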
Set a locale
# List of locales available.
locale -a
# Set the locale
localectl set-locale LANG=en_AU.UTF-8
Reinstall some base-packages that need it
zypper install -f permissions-config iputils ca-certificates ca-certificates-mozilla pam shadow libutempter0 suse-module-tools util-linux
Install Your New Kernel and Bootloader
Install your kernel
zypper install kernel-default kernel-firmware lsb-release zfs zfs-kmp-default lvm2 grub2-x86_64-efi dosfstools man dmraid grub2-x86_64-efi-extras memtest86+
Setup your ZFS hostid
zgenhostid
hostid
Configure the kernel modules
echo 'zfs' >> /etc/modules-load.d/zfs.conf
echo "allow_unsupported_modules 1" > /etc/modprobe.d/10-unsupported-modules.conf
# If needed, configure options for modprobe in /etc/modprobe.d/zfs.conf
Configure GRUB2
grub2-probe /boot
# Must show xfs
Edit grub options
# /etc/default/grub
SUSE_REMOVE_LINUX_ROOT_PARAM=true
GRUB_CMDLINE_LINUX_DEFAULT="root=ZFS=rpool/ROOT preempt=full "
GRUB_TERMINAL=console
GRUB_DISABLE_OS_PROBER=false
Create the grub configuration
grub2-mkconfig -o /boot/grub2/grub.cfg
Check that /boot/grub2/grub.cfg has root=ZFS=rpool/ROOT on the kernel command line.
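That eyeball check can be scripted so it's impossible to skip. A sketch using a hypothetical helper:

```shell
# check_zfs_root FILE: warn loudly if the generated grub config lacks
# the ZFS root parameter on the kernel command line.
check_zfs_root() {
  if grep -q 'root=ZFS=rpool/ROOT' "$1"; then
    echo "ok: $1 boots from rpool/ROOT"
  else
    echo "WARNING: root=ZFS= missing from $1"
  fi
}

# On the real system:
#   check_zfs_root /boot/grub2/grub.cfg
```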
Install EFI
grub2-install --target=x86_64-efi --efi-directory=/boot/efi \
--bootloader-id=opensuse --recheck --no-floppy
grub2-install --target=x86_64-efi --efi-directory=/boot/efi-alt \
--bootloader-id=opensuse --recheck --no-floppy
Final Configuration
mkdir /etc/zfs/zfs-list.cache
touch /etc/zfs/zfs-list.cache/rpool
zed -F &
While zed is running, we need to force it to update the rpool cache. Setting any property - even a no-op like the one below - triggers the update.
# NO-OP
zfs set canmount=noauto rpool/ROOT
Check that the rpool file now is populated
cat /etc/zfs/zfs-list.cache/rpool
Cancel zed by foregrounding it and terminating it
fg
# press Ctrl-C
Clean up the rpool cache content, rewriting the temporary /mnt paths back to the real mountpoints
sed -Ei "s|/mnt/?|/|" /etc/zfs/zfs-list.cache/*
cat /etc/zfs/zfs-list.cache/rpool
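If you want to convince yourself the substitution does the right thing before editing the cache files in place, the same expression works on stdin. A sketch with a hypothetical helper (note it deliberately replaces only the first match per line, like the sed above):

```shell
# strip_mnt: rewrite the temporary /mnt prefix in a zfs-list.cache
# mountpoint back to its real path, mirroring the in-place sed above.
strip_mnt() {
  sed -E 's|/mnt/?|/|'
}
```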
Setup firstboot network
HINT: Check your network device name with
ip link
# /etc/sysconfig/network/ifcfg-eth0
BOOTPROTO='dhcp'
STARTMODE='auto'
Set the root password
passwd
Reboot Into the System
Exit the chroot
exit
Cleanly unmount
mount | grep -v zfs | tac | awk '/\/mnt/ {print $3}' | \
xargs -i{} umount -lf {}
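That one-liner is dense: it takes the `mount` table, drops the ZFS entries (zpool export handles those), reverses the order so the deepest mounts go first, and keeps only targets under /mnt. The filter stage pulled out as a hypothetical function, for inspection:

```shell
# reverse_mnt_targets: given `mount` output on stdin, print the non-ZFS
# /mnt mount targets deepest-first - the order they must be unmounted in.
reverse_mnt_targets() {
  grep -v zfs | tac | awk '/\/mnt/ {print $3}'
}

# On the real system:
#   mount | reverse_mnt_targets | xargs -i{} umount -lf {}
```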
zpool export -a -f
# If export fails, rm the zpool cache to prevent the root importing on the
# bootenv host.
rm /etc/zfs/zpool.cache
Reboot
reboot
First run
Enable and run anything you want!