OpenSUSE on ZFS

Due to a certain motherboard manufacturer having no quality control of their firmware updates, my previous home lab server had 4 of its 12 DIMM slots fail. This necessitated a rapid replacement of the machine, as I use it heavily for work.

While frustrating, rather than lift my existing OpenSUSE Leap install onto the new machine, I decided to reinstall - this time with ZFS as the root filesystem. I have had enough issues with BTRFS performance and reliability to finally commit to moving away from it, even if that means I won't get to use snapper or transactional-update. I'd rather my systems be reliable than flaky and featureful.

But the catch here is that ZFS is not a supported root filesystem in OpenSUSE Leap. This means doing things the hard way. As a reformed Gentoo ricer, I'm not one to shy away from this challenge.

Existing Literature

I am not the first to attempt this process - I was heavily inspired by the guide that is hosted on the OpenZFS site for OpenSUSE Leap installs.

The steps for formatting NVMe drives with the correct LBA size came from here

Setup

BootEnv

Currently it's not possible to set up OpenSUSE+ZFS from a live image. The best way to proceed is to set up an OpenSUSE Leap install on a separate disk first and use that as your install environment.

SecureBoot

I will be using EFI in this process. To avoid issues with kernel module signing, Secure Boot should be disabled. If you feel like getting the module signatures to work, then I'd love to hear how you did it.

Warning

WARNING: A lot of the commands that follow involve partitioning, formatting devices, and generally mucking about with your system in a way that can cause data loss. Ensure that any data you care about is backed up to external media and disconnected from the system during the install process.

Preparing the BootEnv

Boot your install environment, and if you prefer, SSH into it.

Enable the filesystems repo, which has the ZFS modules.

zypper addrepo -f https://download.opensuse.org/repositories/filesystems/15.6/filesystems.repo

To prevent zypper doing silly things, we need to unfuck it.

# /etc/zypp/zypp.conf
## If you don't set this, zypper downloads one package at a time, and it's painfully slow.
commit.downloadMode = DownloadInAdvance
## Delta RPMs are slower to apply than full RPMs.
download.use_deltarpm = false

# /etc/zypp/zypper.conf
[solver]
# You don't need these unless you want all of Xorg dragged in randomly
installRecommends = no

Refresh your repos and ensure you're up to date; if the kernel and the ZFS kernel module packages are out of sync, ZFS won't play nicely.

zypper ref --force
zypper dup
reboot

Getting Started

Boot your install environment, and if you prefer, SSH into it.

OPTIONAL: Set up tmux to shield yourself from connection dropouts.

zypper in tmux
tmux

Install the ZFS kernel module and partitioning tools

zypper in zfs zfs-kmp-default gptfdisk nvme-cli

Load the ZFS module

modprobe zfs

Partitioning

Identify your disks to ensure you are using the correct ones for the install.

lsblk

Identify your disk long names. These are used by ZFS.

ls -al /dev/disk/by-id/
# You need to look for the name -> dev in the output here

Set up an env var to make the following steps easier.

# For a VM / testing:
export DISK=/dev/vda
# For real hardware, use the by-id long name:
export DISK=/dev/disk/by-id/nvme-WD_BLACK_XXXXX....

If the disk is an NVMe device, format it to use the 4K LBA if possible.

nvme id-ns /dev/nvme1n1 | grep lbaf
# nlbaf   : 1
# nulbaf  : 0
# lbaf  0 : ms:0   lbads:9  rp:0x2
# lbaf  1 : ms:0   lbads:12 rp:0x1 (in use)
  • lbaf - the LBA format table entry number.
  • lbads - log2 of the LBA data size; this is what ZFS calls the "ashift". 9 == 512b, 12 == 4k, 13 == 8k
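Since lbads is a power-of-two exponent, converting it to a sector size is simple arithmetic; a quick sanity check in the shell (the helper name here is my own):

```shell
# lbads is log2 of the sector size - the same exponent ZFS calls "ashift".
lbads_to_bytes() { echo $((1 << $1)); }
lbads_to_bytes 9    # 512
lbads_to_bytes 12   # 4096
```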

In this example lbaf 1 is the 4k format. To select it:

nvme format /dev/nvme1n1 -l 1

Destroy the partitions on the disk.

sgdisk --zap-all $DISK

Create EFI part

sgdisk -n1:1M:+512M -t1:EF00 $DISK

Create an LVM partition for /boot and swap. Allow at least 1G for /boot, then choose what you need for swap, plus about 1G extra for LVM RAID metadata. This example creates an 8G PV, allowing a 1G /boot and 6G swap.

sgdisk -n2:0:+8G -t2:8E00 $DISK

Create the partition for the root pool. The size of 0 uses all remaining space.

sgdisk -n3:0:0 -t3:BF01 $DISK

HINT: Repeat this with another $DISK if you want to set up RAID1

Format Time

Set up env vars for your disks. I did a mirrored install; add or remove disks to suit your tastes.

# For a VM / testing:
export DISK1_EFI=/dev/vda1
export DISK2_EFI=/dev/vdb1

export DISK1_BOOT=/dev/vda2
export DISK2_BOOT=/dev/vdb2

export DISK1_POOL=/dev/vda3
export DISK2_POOL=/dev/vdb3

# For real hardware, use the by-id names:
export DISK1_EFI=/dev/disk/by-id/nvme-WD_BLACK_SN850X_XXX-part1
export DISK2_EFI=/dev/disk/by-id/nvme-WD_BLACK_SN850X_XXX-part1

export DISK1_BOOT=/dev/disk/by-id/nvme-WD_BLACK_SN850X_XXX-part2
export DISK2_BOOT=/dev/disk/by-id/nvme-WD_BLACK_SN850X_XXX-part2

export DISK1_POOL=/dev/disk/by-id/nvme-WD_BLACK_SN850X_XXX-part3
export DISK2_POOL=/dev/disk/by-id/nvme-WD_BLACK_SN850X_XXX-part3
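If typing out the -partN suffixes by hand feels error-prone, they can be derived from the base by-id name; a small sketch (the helper name and EXAMPLE path are hypothetical, not from any tool):

```shell
# Hypothetical helper: append a partition suffix to a by-id base path.
part_path() { printf '%s-part%s\n' "$1" "$2"; }
part_path /dev/disk/by-id/nvme-EXAMPLE 3   # /dev/disk/by-id/nvme-EXAMPLE-part3
```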

Create /boot/efi

mkfs.fat -s 1 -F 32 $DISK1_EFI
mkfs.fat -s 1 -F 32 $DISK2_EFI

Setup LVM PV

pvcreate $DISK1_BOOT
pvcreate $DISK2_BOOT

Create LVM VG

vgcreate vg_system $DISK1_BOOT $DISK2_BOOT

Create /boot

lvcreate -n lv_boot --type raid1 -L 1G vg_system
mkfs.xfs /dev/vg_system/lv_boot

Create swap

lvcreate -n lv_swap --type raid1 -L 6G vg_system
mkswap /dev/vg_system/lv_swap

Create ZFS pool

HINT: recordsize here defines the maximum record size. For random IO we'll set a smaller size; for anything that changes infrequently, where we want better read performance, we'll set a larger value.

IMPORTANT: Ensure you set the ashift value to match your LBA size. 9 == 512b, 12 == 4k, 13 == 8k.

NOTE: We have to set up grub2 compatibility here, because grub is stupid and won't allow you to set up EFI if your root pool isn't grub2 compatible - even though we don't even have /boot on ZFS!!!

Thanks Grub. Thrub.

zpool create \
    -o cachefile=/etc/zfs/zpool.cache \
    -o ashift=12 \
    -o compatibility=grub2 \
    -O atime=off \
    -O xattr=sa -O mountpoint=none \
    -O acltype=posixacl -O canmount=off -O compression=lz4 \
    -O normalization=formD \
    -R /mnt \
    rpool mirror $DISK1_POOL $DISK2_POOL

Create all our ZFS filesystems within the pool

zfs create -o recordsize=128k -o canmount=noauto -o mountpoint=/ rpool/ROOT
zfs mount rpool/ROOT

zfs create -o com.sun:auto-snapshot=false  rpool/ROOT/tmp
chmod 1777 /mnt/tmp

zfs create  -o mountpoint=/home rpool/home
zfs create  -o mountpoint=/root rpool/home/root
chmod 700 /mnt/root

zfs create -o mountpoint=/var -o recordsize=4k  rpool/var
zfs create                                 rpool/var/lib
zfs create                                 rpool/var/log
zfs create                                 rpool/var/spool

zfs create -o com.sun:auto-snapshot=false  rpool/var/cache
zfs create -o com.sun:auto-snapshot=false  rpool/var/tmp

zfs create -o recordsize=128k                rpool/var/lib/docker

Create /boot mountpoints

mkdir /mnt/boot
# Mount our /boot filesystem
mount /dev/vg_system/lv_boot /mnt/boot
# Make the efi mount points
mkdir /mnt/boot/efi
# Only if you have a second disk
mkdir /mnt/boot/efi-alt

# Mount the efi partitions
mount ${DISK1_EFI} /mnt/boot/efi
mount ${DISK2_EFI} /mnt/boot/efi-alt

Copy in the ZFS pool cache to the new system

mkdir /mnt/etc/zfs -p
cp /etc/zfs/zpool.cache /mnt/etc/zfs/

Begin the Install

Add zypper repos to the new chroot

zypper --root /mnt ar -f http://download.opensuse.org/distribution/leap/15.6/repo/non-oss  non-oss
zypper --root /mnt ar -f http://download.opensuse.org/distribution/leap/15.6/repo/oss oss
zypper --root /mnt ar -f http://download.opensuse.org/update/leap/15.6/oss  update-oss
zypper --root /mnt ar -f http://download.opensuse.org/update/leap/15.6/non-oss update-non-oss

Refresh repo data

zypper --root /mnt refresh --force

Install the basesystem to the chroot

zypper --root /mnt install -t pattern base

Install basic system tools

zypper --root /mnt refresh --force
zypper --root /mnt install zypper tmux vim zsh nmap iputils iproute2 tcpdump bridge-utils sudo htop ntpd-rs less grep findutils system-group-wheel tuned

Setup Your New System

Set a hostname

echo my.sweet.hostname > /mnt/etc/hostname

Create your filesystem mount table

cat << EOF > /mnt/etc/fstab
/dev/vg_system/lv_boot /boot xfs defaults 0 0
/dev/vg_system/lv_swap swap swap defaults 0 0
UUID=$(blkid -s UUID -o value ${DISK1_EFI}) /boot/efi vfat defaults 0 0
UUID=$(blkid -s UUID -o value ${DISK2_EFI}) /boot/efi-alt vfat defaults 0 0
EOF
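Before rebooting into the new system, it's worth catching malformed entries early. A minimal sketch (my own check, not part of the guide's tooling) that verifies every non-comment line has the six fstab fields, run here against a sample file:

```shell
# Build a sample fstab and verify every non-comment line has six fields.
cat > /tmp/fstab.sample << 'EOF'
/dev/vg_system/lv_boot /boot xfs defaults 0 0
/dev/vg_system/lv_swap swap swap defaults 0 0
EOF
awk 'NF && $1 !~ /^#/ && NF != 6 {bad=1} END {exit bad}' /tmp/fstab.sample \
    && echo "fstab OK"
```

Point the awk at /mnt/etc/fstab to check the real file.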

Copy DNS config from host

rm /mnt/etc/resolv.conf
cp /etc/resolv.conf /mnt/etc/

Setup bind mounts for the chroot

mount --make-private --rbind /dev  /mnt/dev
mount --make-private --rbind /proc /mnt/proc
mount --make-private --rbind /sys  /mnt/sys
mount -t tmpfs tmpfs /mnt/run
mkdir /mnt/run/lock

hacker voice "I'm in".

chroot /mnt zsh --login

Unfuck zypper again

# /etc/zypp/zypp.conf
commit.downloadMode = DownloadInAdvance
download.use_deltarpm = false

# /etc/zypp/zypper.conf
[solver]
installRecommends = no

Ensure the filesystems repo exists

zypper addrepo -f https://download.opensuse.org/repositories/filesystems/15.6/filesystems.repo

Fix the releasever variable in repos

sed -i -E 's/15\.6/$releasever/g' /etc/zypp/repos.d/*.repo
zypper refresh --force
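The substitution can be previewed on a sample baseurl line before touching the real repo files. Note that $releasever is a literal zypper variable, so it must stay single-quoted in the sed replacement:

```shell
# Preview the rewrite on a sample line; $releasever is emitted literally.
echo 'baseurl=http://download.opensuse.org/distribution/leap/15.6/repo/oss/' \
    | sed -E 's/15\.6/$releasever/g'
# baseurl=http://download.opensuse.org/distribution/leap/$releasever/repo/oss/
```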

Set a locale

# List of locales available.
locale -a
# Set the locale
localectl set-locale LANG=en_AU.UTF-8
# If localectl fails in the chroot (it needs a running systemd), write the
# setting directly:
# echo 'LANG=en_AU.UTF-8' > /etc/locale.conf

Reinstall some base-packages that need it

zypper install -f permissions-config iputils ca-certificates  ca-certificates-mozilla pam shadow libutempter0 suse-module-tools util-linux

Install Your New Kernel and Bootloader

Install your kernel

zypper install kernel-default kernel-firmware lsb-release zfs zfs-kmp-default lvm2 grub2-x86_64-efi dosfstools man dmraid grub2-x86_64-efi-extras memtest86+

Setup your ZFS hostid

zgenhostid
hostid

Configure the kernel modules

echo 'zfs' >> /etc/modules-load.d/zfs.conf

echo "allow_unsupported_modules 1" > /etc/modprobe.d/10-unsupported-modules.conf

# If needed, configure options for modprobe in /etc/modprobe.d/zfs.conf

Configure GRUB2

grub2-probe /boot
# Must show xfs

Edit grub options

# /etc/default/grub

SUSE_REMOVE_LINUX_ROOT_PARAM=true
GRUB_CMDLINE_LINUX_DEFAULT="root=ZFS=rpool/ROOT preempt=full  "
GRUB_TERMINAL=console
GRUB_DISABLE_OS_PROBER=false

Create the grub configuration

grub2-mkconfig -o /boot/grub2/grub.cfg

Check that /boot/grub2/grub.cfg has root=ZFS=rpool/ROOT on the kernel commandline

Install EFI

grub2-install --target=x86_64-efi --efi-directory=/boot/efi \
    --bootloader-id=opensuse --recheck --no-floppy

grub2-install --target=x86_64-efi --efi-directory=/boot/efi-alt \
    --bootloader-id=opensuse --recheck --no-floppy

Final Configuration

mkdir /etc/zfs/zfs-list.cache
touch /etc/zfs/zfs-list.cache/rpool
zed -F &

While zed is running, we need to force it to update the rpool cache. Setting a property to its existing value is a no-op, but still triggers the cache write.

# NO-OP
zfs set canmount=noauto rpool/ROOT

Check that the rpool file now is populated

cat /etc/zfs/zfs-list.cache/rpool

Cancel zed by foregrounding it and terminating it

fg
# Press Ctrl+C

Clean up the rpool content

sed -Ei "s|/mnt/?|/|" /etc/zfs/zfs-list.cache/*
cat /etc/zfs/zfs-list.cache/rpool
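What the sed does can be sanity-checked on sample mountpoints first: the cache records mountpoints under the altroot, so the /mnt prefix must be stripped for the running system.

```shell
# /mnt/<path> becomes /<path>, and a bare /mnt becomes /.
echo '/mnt/home' | sed -E 's|/mnt/?|/|'   # /home
echo '/mnt'      | sed -E 's|/mnt/?|/|'   # /
```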

Setup firstboot network

HINT: Check your device name with ip link

# /etc/sysconfig/network/ifcfg-eth0
BOOTPROTO='dhcp'
STARTMODE='auto'

Set the root password

passwd

Reboot Into the System

Exit the chroot

exit

Cleanly unmount

mount | grep -v zfs | tac | awk '/\/mnt/ {print $3}' | \
    xargs -i{} umount -lf {}
zpool export -a -f
# If export fails, rm the zpool cache to prevent the root importing on the
# bootenv host.
rm /etc/zfs/zpool.cache
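The umount pipeline above can be previewed against fake mount output - tac reverses the lines so nested mounts are unmounted before their parents:

```shell
# Simulated 'mount' output: tac reverses the order (deepest mount first),
# then awk picks the mountpoint column ($3) for anything under /mnt.
printf '%s\n' \
    '/dev/vda2 on /mnt/boot type xfs (rw)' \
    '/dev/vda1 on /mnt/boot/efi type vfat (rw)' \
    | tac | awk '/\/mnt/ {print $3}'
# /mnt/boot/efi
# /mnt/boot
```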

Reboot

reboot

First run

Enable and run anything you want!