Storage Administration Guide
This guide will help you understand, configure, and maintain storage on Linux servers. The content is optimised for reliability and accessibility, and is based not only on my own experience but also on observing the experiences of enterprise customers over many years.
⚠️ Warnings ⚠️
Making changes to storage entails risk. Linux and its storage tools have no safety barriers. Mistakes can result in COMPLETE LOSS OF ALL YOUR DATA.
DO NOT COPY-PASTE COMMANDS FROM THIS DOCUMENT WITHOUT UNDERSTANDING THEM.
CAREFULLY PLAN YOUR COMMANDS.
HAVE BACKUPS THAT YOU HAVE TESTED.
Almost all commands in this document require root privileges.
This document is a work in progress!
General Advice
Before executing commands that will change your storage, analyse what you have, make notes, and prepare your commands in a notepad first. This gives you the chance to review them before making changes.
Understanding Your Storage
Before changing your storage configurations you need to understand what you have and what you may want to achieve.
List Storage On Your System
lsblk is the most important command in your toolbox. It allows you to understand the layout and state of your storage before and after making changes. It also allows you to check which disks are in use so that you can avoid them while making changes.
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sr0 11:0 1 372K 0 rom
vda 254:0 0 10G 0 disk
├─vda1 254:1 0 2M 0 part
├─vda2 254:2 0 33M 0 part /boot/efi
└─vda3 254:3 0 10G 0 part /
vdb 254:16 0 50G 0 disk
vdc 254:32 0 50G 0 disk
vdd 254:48 0 50G 0 disk
vde 254:64 0 50G 0 disk
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sr0 11:0 1 372K 0 rom
vda 254:0 0 10G 0 disk <---------- this is a whole disk.
/-- these partitions exist on the disk.
├─vda1 254:1 0 2M 0 part - <- this partition is not mounted
├─vda2 254:2 0 33M 0 part /boot/efi | <- this partition is mounted at /boot/efi
└─vda3 254:3 0 10G 0 part / - <- this partition is mounted on /
vdb 254:16 0 50G 0 disk
vdc 254:32 0 50G 0 disk <---------- these disks have no partitions
vdd 254:48 0 50G 0 disk
vde 254:64 0 50G 0 disk
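lsblk can also show filesystem types, labels and UUIDs alongside the layout, which is handy when writing fstab entries later. Output is omitted here since it depends entirely on your system:
## Include filesystem type, label and UUID columns in the output
# lsblk -f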
View Filesystems That Mount At Boot
Filesystems are mounted at boot from the FileSystem TABle, which is stored in /etc/fstab.
# cat /etc/fstab
UUID=ef76b8e7-6017-4757-bd51-3e0e662d408b / xfs defaults 0 1
UUID=F6F5-05FB /boot/efi vfat defaults 0 0
This is arranged as a whitespace-separated table.
/- The path or identifier of the device to mount
|
| /- where to mount the filesystem
| |
| | /- the type of filesystem on the device
| | |
| | | /- mount options for the filesystem to be mounted
| | | |
| | | | /- whether the dump(8) backup tool should back up this filesystem. set to 0
| | | | |
| | | | | /- order of filesystem checks at boot.
| | | | | | 0 = do not check
| | | | | | 1 = check first
| | | | | | 2 = check second ...
v v v v v v
UUID=ef76b8e7-6017-4757-bd51-3e0e662d408b / xfs defaults 0 1
/dev/disk/by-id/wwn-0x5001b448bd8e7de2 /mnt xfs defaults 0 0
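If you edit this file, it is worth verifying the result before the next boot. One way to do this is findmnt (part of util-linux), which can check fstab for obvious problems:
## Parse /etc/fstab and report issues such as unknown filesystems or bad mountpoints
# findmnt --verify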
Show Disk Path Or Identifiers
Linux allows disks to be referenced by different aliases that can be more accessible or unique, which helps prevent mistakes. For example, vda and sda are very similar, but virtio-pci-0000:04:00.0 or wwn-0x5000cca0bbefc231 are distinct and uniquely identify the device. In addition, if you move the device between different ports (e.g. changing SATA ports, or SAS ports) then some of these identifiers will stay stable across those changes.
Show disks by identifiers
Disk by-id paths are stable identifiers that should not change between systems or with how the device is connected.
These are commonly used in ZFS pools or administration commands.
# ls -l /dev/disk/by-id
lrwxrwxrwx 1 root root 9 Sep 7 08:51 scsi-1ATA_WDC_WDS200T1R0A-68A4W0_223609A005A5 -> ../../sdg
lrwxrwxrwx 1 root root 10 Sep 7 08:51 scsi-1ATA_WDC_WDS200T1R0A-68A4W0_223609A005A5-part1 -> ../../sdg1
lrwxrwxrwx 1 root root 10 Sep 7 08:51 scsi-1ATA_WDC_WDS200T1R0A-68A4W0_223609A005A5-part9 -> ../../sdg9
...
lrwxrwxrwx 1 root root 9 Sep 7 08:51 wwn-0x5001b448bd8e7de2 -> ../../sdg
lrwxrwxrwx 1 root root 10 Sep 7 08:51 wwn-0x5001b448bd8e7de2-part1 -> ../../sdg1
lrwxrwxrwx 1 root root 10 Sep 7 08:51 wwn-0x5001b448bd8e7de2-part9 -> ../../sdg9
Show disks by UUID
Generally only partitions with filesystems will have a UUID - this will not show whole "devices". UUIDs are stable identifiers that should not change between systems or with how the device is connected. These are commonly used in /etc/fstab for mounting filesystems, with the fstab syntax UUID=< ... >
# ls -l /dev/disk/by-uuid
lrwxrwxrwx 1 root root 15 Sep 7 08:51 DA2C-4E2B -> ../../nvme1n1p1
lrwxrwxrwx 1 root root 10 Sep 7 08:51 e3967de9-6cab-4387-8a58-aa6b34dba39f -> ../../dm-9
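If a filesystem you expect to see is missing from this listing, blkid can report the UUID (and filesystem type) directly from a device. For example, against the root partition from the earlier lsblk output:
## Print the UUID and filesystem type stored on the device
# blkid /dev/vda3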
Show disks by their physical attachment path
These are commonly used to identify the type of device and where it is physically attached on a system.
# ls -l /dev/disk/by-path
lrwxrwxrwx 1 root root 9 Sep 7 08:51 pci-0000:00:17.0-ata-8.0 -> ../../sdg
lrwxrwxrwx 1 root root 10 Sep 7 08:51 pci-0000:00:17.0-ata-8.0-part1 -> ../../sdg1
lrwxrwxrwx 1 root root 10 Sep 7 08:51 pci-0000:00:17.0-ata-8.0-part9 -> ../../sdg9
lrwxrwxrwx 1 root root 13 Sep 7 08:51 pci-0000:01:00.0-nvme-1 -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 Sep 7 08:51 pci-0000:01:00.0-nvme-1-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Sep 7 08:51 pci-0000:01:00.0-nvme-1-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 15 Sep 7 08:51 pci-0000:01:00.0-nvme-1-part3 -> ../../nvme0n1p3
What Storage Setup Do I Want?
Here are some questions that may help you to decide how to configure the storage in your system.
I'm Installing My Favourite Distro On A Laptop/Workstation
- You should use GPT for partitioning (a quick verification sketch follows this list).
- You should use LVM to allow resizing partitions, creating raid or changing disks in the future.
- Split Home vs Root OR combined Home + Root is a personal preference.
- If you want fast and highly reliable storage -> Choose XFS
- If you want features like snapshots -> Choose BTRFS
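If you want to check what an installer actually produced, a quick sketch of a post-install check, assuming the OS disk is /dev/vda (adjust to your system):
## Confirm the disk uses a GPT partition table
# parted -s /dev/vda print
## Show the layout, filesystem types and mountpoints to confirm the LVM and filesystem choices
# lsblk -f /dev/vda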
I'm Adding Non-Root Disks To My Workstation/Server
I Want Highly Reliable, Fault Tolerant Storage
- Use ZFS
I Can Not Afford To Lose Data Ever.
- Use ZFS
I Want One Giant Pool Of Storage That I Will Expand In Future
- Use ZFS
I Plan To Add/Remove/Change Disks Again When Ever I Feel Like
- Use LVM+RAID
I Just Want Disks Mounted Like A Chaos Goblin
- You should use GPT for partitioning (a minimal single-disk sketch follows this list).
- You should use LVM to allow resizing partitions, creating raid or changing disks in the future.
- If you want fast and highly reliable storage -> Choose XFS
- If you want features like snapshots -> Choose BTRFS
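If you really do skip LVM and just want a disk formatted and mounted, a minimal sketch might look like this, assuming /dev/vdb is an unused disk (double-check with lsblk before running anything like this):
## Label the disk GPT and create one partition spanning it
# parted -s /dev/vdb mklabel gpt mkpart data xfs 1MiB 100%
## Put a filesystem on the new partition and mount it
# mkfs.xfs /dev/vdb1
# mkdir -p /mnt/data
# mount /dev/vdb1 /mnt/data
## For a persistent mount, prefer the UUID reported by blkid in /etc/fstab
# blkid /dev/vdb1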
Managing Single Disks
Reload partition tables
After making changes to partition tables, you may need to force the kernel to re-read them so that they are reflected in commands like lsblk.
# partprobe
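partprobe can also be pointed at a single disk if you only changed one (the device below is just an example):
## Re-read the partition table of one device only
# partprobe /dev/vdb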
ZFS
todo!();
LVM
LVM is the Logical Volume Manager for Linux. It allows storage to be changed dynamically without downtime, and can warp and shape it into complex and weird disk layouts and geometries. It has excellent observability into the state of storage, and you should consider it a "must have" on all systems.
LVM's single limitation is that you can not use it for /boot or EFI System Partitions. These must remain as "true" partitions. If in doubt, trust your installer.
LVM combines a set of physical volumes (PVs) into a volume group (VG). A volume group contains a pool of available storage. Logical volumes (LVs) can then be created within that volume group, consuming that storage. Each logical volume can have its own characteristics, like raid levels. A logical volume may span multiple physical volumes, since an LV consumes space from the VG, which can allocate it anywhere across the available PVs. This can be visualised as:
┌──────┐ ┌──────────────────────┐
│ LV │ │ LV │
└──────┘ └──────────────────────┘
┌──────────────────────────────────────────┐
│ VG │
└──────────────────────────────────────────┘
┌────────────┐ ┌────────────┐ ┌────────────┐
│ PV │ │ PV │ │ PV │
└────────────┘ └────────────┘ └────────────┘
Here we have two LVs within a VG. The VG is made from 3 PVs. The storage of the LVs may be anywhere within the pool of PVs. This allows PVs to be added or removed dynamically, as the LV content can be moved at any time.
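To make the concepts concrete, here is a minimal sketch of the PV -> VG -> LV flow. The devices and the names vg_data and lv_data are hypothetical; check lsblk before running anything like this:
## Turn the raw disks into PVs
# pvcreate /dev/vdb /dev/vdc
## Pool the PVs into a VG
# vgcreate vg_data /dev/vdb /dev/vdc
## Carve a 20G LV out of the pool and put a filesystem on it
# lvcreate -n lv_data -L 20G vg_data
# mkfs.xfs /dev/vg_data/lv_data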
General LVM Administration
Read The Man Pages
LVM has some of the best man pages ever put onto a Linux system. They are worth reading to understand the options you have for commands!
HINT: Some flavours of OpenSUSE may not have man pages. To fix this:
# $EDITOR /etc/zypp/zypp.conf
...
rpm.install.excludedocs = no
Ensure man is installed, and reinstall lvm2 to make sure its man pages are present.
# zypper install -f man lvm2
OpenSUSE - Install LVM Tools
# zypper in lvm2
# reboot
NOTE: In some cases you may need to install kernel-default so that the dm-raid kernel module is available, because kernel-default-base may lack the module.
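To check whether the dm-raid module is actually available for your running kernel, something like this should work:
## If this prints module information, dm-raid is available; an error means it is missing
# modinfo dm-raid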
Show all Physical Volumes
# pvs
PV VG Fmt Attr PSize PFree
/dev/vdb vg00 lvm2 a-- 50.00g 50.00g
/dev/vdc vg00 lvm2 a-- 50.00g 50.00g
/dev/vdd vg00 lvm2 a-- 50.00g 50.00g
/dev/vde vg00 lvm2 a-- 50.00g 50.00g
Show all Volume Groups
# vgs
VG #PV #LV #SN Attr VSize VFree
vg00 4 0 0 wz--n- 199.98g 199.98g
Show all Logical Volumes
# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
lv_raid vg00 rwi-a-r--- 80.00g
Show the internal details of LVs
# lvs -a
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
lv_raid vg00 rwi-a-r--- 80.00g 18.80
[lv_raid_rimage_0] vg00 Iwi-aor--- 40.00g
[lv_raid_rimage_1] vg00 Iwi-aor--- 40.00g
[lv_raid_rimage_2] vg00 Iwi-aor--- 40.00g
[lv_raid_rmeta_0] vg00 ewi-aor--- 4.00m
[lv_raid_rmeta_1] vg00 ewi-aor--- 4.00m
[lv_raid_rmeta_2] vg00 ewi-aor--- 4.00m
Show the internal details of LVs and which devices are backing their storage.
# lvs -a -o +devices
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
lv_raid vg00 rwi-a-r--- 80.00g 21.35 lv_raid_rimage_0(0),lv_raid_rimage_1(0),lv_raid_rimage_2(0)
[lv_raid_rimage_0] vg00 Iwi-aor--- 40.00g /dev/vdb(1)
[lv_raid_rimage_1] vg00 Iwi-aor--- 40.00g /dev/vdc(1)
[lv_raid_rimage_2] vg00 Iwi-aor--- 40.00g /dev/vdd(1)
[lv_raid_rmeta_0] vg00 ewi-aor--- 4.00m /dev/vdb(0)
[lv_raid_rmeta_1] vg00 ewi-aor--- 4.00m /dev/vdc(0)
[lv_raid_rmeta_2] vg00 ewi-aor--- 4.00m /dev/vdd(0)
Using LVM For Raid
First you need to select the devices that will become the PVs.
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sr0 11:0 1 372K 0 rom
vda 254:0 0 10G 0 disk
├─vda1 254:1 0 2M 0 part
├─vda2 254:2 0 33M 0 part /boot/efi
└─vda3 254:3 0 10G 0 part /
vdb 254:16 0 50G 0 disk ----
vdc 254:32 0 50G 0 disk | <-- I will use these 4 devices.
vdd 254:48 0 50G 0 disk |
vde 254:64 0 50G 0 disk ----
Create PVs on each member device.
# pvcreate /dev/vdb
# pvcreate /dev/disk/by-path/virtio-pci-0000\:09\:00.0
...
Create a VG containing all the PVs
vgcreate <name of vg> <path to pv> [<path to pv> ...]
# vgcreate vg00 /dev/vdb /dev/vdc /dev/vdd /dev/vde
Create a new LV at your preferred raid level.
- If you have two PVs, choose raid 1 (a raid1 sketch follows the examples below).
- If you have three or more, choose between raid 5 and raid 10.
In each case, LVM will make sure that the data of the LV is correctly split across PVs to ensure redundancy.
- Raid 1 mirrors all data; reads can be served from any copy, writes go to every mirror.
- Raid 5/6 give better usable capacity, but writes are slower because parity must be recalculated.
- Raid 10 has better write and random read performance than raid 5/6, at the cost of usable capacity.
lvcreate [options] -n <name of LV> --type <type> [-L|-l] <size of lv> --raidintegrity y <VG to create the LV in>
## Create an lv that consumes all space in the VG
## NOTE: you may need to reduce this from 100% with raidintegrity to allow LVM the space
## to create the LV.
# lvcreate -n lv_raid --type raid10 -l 100%FREE --raidintegrity y vg00
## Create an lv that provides 80G of raid storage, but consumes more of the VG underneath
# lvcreate -n lv_raid --type raid5 -L 80G --raidintegrity y vg00
HINT: The raidintegrity flag enables checksums of extents to allow detection of potential disk corruption
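For the two-PV case mentioned above, a raid1 mirror is the equivalent sketch. The name lv_mirror is hypothetical and the vg00 name is reused purely for illustration; adjust the size to your VG:
## Create a two-way mirror (one extra copy) with integrity checksums
# lvcreate -n lv_mirror --type raid1 -m 1 -L 40G --raidintegrity y vg00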
You can then use lvs to show the state of the sync process of the LV.
Once created you can make a new filesystem on the volume.
mkfs.<fs name> /dev/<vg name>/<lv name>
# mkfs.xfs /dev/vg00/lv_raid
This can then be added to /etc/fstab to mount on boot.
HINT: /dev/<vg name>/<lv name> paths will never change and can be used reliably in fstab.
# $EDITOR /etc/fstab
...
/dev/vg00/lv_raid /mnt/raid xfs defaults 0 0
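After adding the entry you can test it without rebooting. A sketch of that check (on systemd distributions a daemon-reload is needed so the generated mount units pick up the new fstab):
## Create the mountpoint, reload systemd's view of fstab, then mount everything not yet mounted
# mkdir -p /mnt/raid
# systemctl daemon-reload
# mount -a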
Managing LVM Raid
Replacing a Working Disk
If you want to expand your array, or just replace an old piece of disk media, you can do this live.
Attach the new disk and locate it with lsblk. Note its name, path or other identifier. Then locate the disk you want to replace by its path or identifier.
Create a new PV on the new disk.
# pvcreate /dev/vdf
Extend the VG with the new PV
# vgextend vg00 /dev/vdf
Check the pv was added to the vg
# pvs
Move the extents from the original device, to the new device.
## This blocks and monitors the move. If you ctrl-c it continues in the background.
# pvmove /dev/vde /dev/vdf
## Run the move in the background
# pvmove -b /dev/vde /dev/vdf
If backgrounded, you can monitor the move with:
# lvs -a
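Once the move has finished, the old disk is still an (empty) PV inside the VG. A sketch of detaching it so it can be removed or repurposed, assuming /dev/vde was the disk being replaced:
## Remove the now-empty PV from the VG
# vgreduce vg00 /dev/vde
## Wipe the LVM label from the disk
# pvremove /dev/vde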
Replacing a Corrupted/Missing Disk
When running lvs, a corrupted or missing disk will produce warnings such as:
# lvs
WARNING: Couldn't find device with uuid weDidc-rBve-EL25-NqTG-X2n6-fGmW-FGCs9l.
WARNING: VG vg00 is missing PV weDidc-rBve-EL25-NqTG-X2n6-fGmW-FGCs9l (last written to /dev/vdd).
WARNING: Couldn't find all devices for LV vg00/lv_raid_rimage_2 while checking used and assumed devices.
WARNING: Couldn't find all devices for LV vg00/lv_raid_rmeta_2 while checking used and assumed devices.
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
lv_raid vg00 rwi-a-r-p- 99.98g 100.00
Create a new PV on a replacement disk, and add it to the vg.
# pvcreate /dev/path/to/disk
# vgextend vg00 /dev/path/to/disk
The logical volume must be active to initiate the replacement:
# lvchange -ay vg00/lv_raid
Replace the failed device, allocating the needed extents.
## To any free device in the VG
# lvconvert --repair vg00/lv_raid
## To a specific PV
# lvconvert --repair vg00/lv_raid /dev/vde
You can view the progress of the repair with lvs
Instruct the vg to remove the missing device metadata now that the replacement is complete.
# vgreduce --removemissing vg00
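Once the repair and cleanup are complete, it may be worth confirming the warnings are gone and watching the resync finish:
## The missing-device warnings should no longer appear, and Cpy%Sync shows resync progress
# pvs
# lvs -a -o +devices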