Look at how distros do ZFS. sudo zpool history on an installed system might be good.
rincebrain
Ubuntu 24.04 Desktop supports (optionally encrypted) ZFS on root. For server, see https://discourse.ubuntu.com/t/zfs-root-in-24-04/42274/9. It still uses zsys with the bpool/rpool split. Jim Salter suggests doing a manual debootstrap with ZFSBootMenu (which supports zpool compatibility=openzfs-2.1-linux) instead of GRUB (compatibility=grub2).
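A minimal sketch of what that choice looks like at pool creation (pool names and devices here are placeholders, not from any guide):
# Feature sets live in /usr/share/zfs/compatibility.d/.
# ZFSBootMenu: a single pool restricted to features its bundled OpenZFS understands.
zpool create -o compatibility=openzfs-2.1-linux rpool /dev/sda2
# GRUB: a separate boot pool restricted to features GRUB can read.
zpool create -o compatibility=grub2 bpool /dev/sda2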
If we don’t go the ZFSBootMenu path, Dracut may be an improvement over Debian’s default initramfs-tools. Presumably ZFSBootMenu can only boot straight to ZFS and can only deal with ZFS encryption. If we want stuff like a YubiKey to work, ZFSBootMenu might be out. ACTUALLY, ZFSBootMenu boots into a ZFS filesystem; it doesn’t need to be the final root, it can be an initrd?
Actually, there is support for different initrds in zfs/contrib:
zfs-initramfs
zfs-dracut
Installers (probably do not use directly, but look at the code)
Encryption (keys, and send/recv of encrypted datasets/snapshots)
zfs-send?
zfs allow
https://github.com/openzfs/zfs/issues/12997#issuecomment-1248291866
https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/index.html
Swap/hibernation
https://github.com/openzfs/openzfs-docs/issues/157#issuecomment-829841490
OpenZFS Field Notes by Rob Norris, full-time ZFS contributor.
for package in zfs-dkms zfs-initramfs zfs-zed zfsutils-linux
do
    echo "$package"
    apt-file list -x "$package" \
        | awk '{print $2}' \
        | sed -n \
            -e 's#^/usr/share/\(doc\)/[^/]*/\(.*README.*\)# \1 \2#p' \
            -e 's#^/usr/share/\(man\)/[^/]*/\([^.]*\)\.\(.*\)\.gz# \1 \2(\3)#p' \
        | sort
done
find '/usr/share/man' -name 'z*concepts*':
Concepts
zpoolconcepts(8)
zfsconcepts(8)
Options
zpool-features(5)
extensible_dataset
lz4_compress
zpoolprops(8)
zfsprops(8)
https://zewaren.net/freebsd-zfs-encryption-with-open-zfs.html
TODO: GRUB lacks support for many desirable ZFS features, which more or less forces you to use a separate boot pool and makes things complicated. zfsbootmenu (which is also a full-fledged, UEFI-only boot manager/loader) does not require this and may be a very nice alternative. This also allows us to easily boot into many separate operating systems (distros) without the prohibitively expensive (logistics/admin and to some degree storage) requirement of several boot pools (i.e. several partitions, which have to be carved out ahead of time).
ZFSBootMenu had a guide for Debian Bookworm (the latest one on openzfs-docs is only Bullseye as of the time of writing): https://docs.zfsbootmenu.org/en/latest/guides/debian/bookworm-uefi.html
The guide is a little unclear (to the uninitiated) about where the encryption passphrase file is stored (is it on the unencrypted EFI partition!? no). See https://old.reddit.com/r/zfs/comments/lkd10u/still_confused_as_to_how_zfsbootmenu_handles/ for a clarification.
The guide suggests dmesg | grep -i efivars to detect EFI support, but test -d /sys/firmware/efi is less brittle (the dmesg ring buffer might have dropped the boot-time message, or some other unrelated log line might contain the grepped string).
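A minimal sketch of that check (the messages are just illustrative):
if test -d /sys/firmware/efi
then
    echo 'booted via UEFI'
else
    echo 'booted via legacy BIOS/CSM'
fi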
The guide mentions a hostid, which to me was unfamiliar. As far as I can tell, this is used by ZFS to mark which system currently “owns” a pool; the mark is cleared on pool export. gethostid is standardized by POSIX. ZFS uses it through the Solaris Porting Layer (which is now a part of OpenZFS). It is documented in the spl-module-parameters man page.
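A small sketch of inspecting and pinning the hostid (using the current value is just one choice; zgenhostid ships with OpenZFS):
# Show the hostid the SPL will use.
hostid
# Persist it to /etc/hostid so the initramfs and the installed system agree
# (use -f to overwrite an existing /etc/hostid).
zgenhostid "$(hostid)"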
The ZFSBootMenu testing scripts could be a good reference when setting up a system: https://github.com/zbm-dev/zfsbootmenu/tree/master/testing, in particular the helpers directory.
TODO: It would probably be good to write a guide starting from the beginning:
The below assumes GRUB:
When whole disks are given to zpool, it automatically partitions them to allow some slack when replacing disks with slightly smaller ones.
https://wiki.archlinux.org/title/ZFS#GRUB-compatible_pool_creation
Note that you cannot e.g. export and import a pool which contains the dataset that is mounted on / (or other important locations). If there is a pool that is expected to receive a lot of administrative action, maybe putting the root dataset on it is not a good idea.
Sector size: zpool create -o ashift=12
TLER:
Cache: l2arc_write_max?
Note that a dataset is created automatically when a pool is created.
Encryption (see the sketch after this list):
encryption=off|on|aes-128-ccm|aes-192-ccm|aes-256-ccm|aes-128-gcm|aes-192-gcm|aes-256-gcm. Cannot be changed after dataset creation.
keyformat=raw|hex|passphrase. Must be provided when encryption is set; can be changed later with zfs-change-key.
keylocation=prompt|file://</absolute/file/path>. Defaults to prompt; can be changed later with zfs-change-key.
pbkdf2iters=iterations
encryptionroot
keystatus
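A minimal sketch of creating and then inspecting an encrypted dataset (pool/dataset names are placeholders):
zfs create \
    -o encryption=aes-256-gcm \
    -o keyformat=passphrase \
    -o keylocation=prompt \
    rpool/secure
zfs get encryption,keyformat,keylocation,encryptionroot,keystatus rpool/secure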
openzfs pull requests
openzfs issues
April 2022 OpenZFS Leadership Meeting: encryptionroot only refers to the wrapping key.
zfs-change-key(8):
If the user’s key is compromised, zfs change-key does not necessarily protect existing or newly-written data from attack. Newly-written data will continue to be encrypted with the same master key as the existing data. The master key is compromised if an attacker obtains a user key and the corresponding wrapped master key. Currently, zfs change-key does not overwrite the previous wrapped master key on disk, so it is accessible via forensic analysis for an indeterminate length of time.
In the event of a master key compromise, ideally the drives should be securely erased to remove all the old data (which is readable using the compromised master key), a new pool created, and the data copied back. This can be approximated in place by creating new datasets, copying the data (e.g. using zfs send | zfs recv), and then clearing the free space with zpool trim --secure if supported by your hardware, otherwise zpool initialize.
zpool-trim(8):
-d, --secure Causes a secure TRIM to be initiated. When performing a secure TRIM, the device guarantees that data stored on the trimmed blocks has been erased. This requires support from the device and is not supported by all SSDs.
Kind of the point of encryption is to protect you if they have [physical] access to your disk.
Let’s not give the users the false idea that changing your passphrase actually does anything. Changing your passphrase is for “I don’t like typing that old thing anymore, I’m going to type some new thing”, not for “somebody knows my old passphrase, let me change it to one that people don’t know”.
Someone else coming to the same conclusion: https://old.reddit.com/r/zfs/comments/wk4t14/safely_remove_old_encryption_keys_and_some_other/
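A sketch of the “approximate in place” remediation from the man-page excerpt above (dataset names are placeholders; keys must be loaded, and --secure only works if the devices support it):
# Create a new encryptionroot with a fresh master key and copy the data into it.
zfs create -o encryption=on -o keyformat=passphrase rpool/secure-new
zfs snapshot rpool/secure@rekey
zfs send rpool/secure@rekey | zfs receive rpool/secure-new/data
# Destroy the old datasets, then scrub the freed space.
zfs destroy -r rpool/secure
zpool trim --secure rpool   # otherwise: zpool initialize rpool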
Compression
zpool create -O compression=on at pool creation / compression=on per dataset (look at the lz4_compress pool feature).
Maybe worth looking at zstd for slow pools of spinning disks on file servers? https://github.com/openzfs/zfs/pull/10278 https://github.com/openzfs/zfs/commit/10b3c7f5e424f54b3ba82dbf1600d866e64ec0a0 https://news.ycombinator.com/item?id=23210491 lz4 implements an early abort which is currently not implemented for zstd, so there might be a higher than expected performance difference for incompressible data. This is only relevant during writes though, so if they occur infrequently enough the difference may not matter. Also, if the data is compressed with lz4 specifically, it will stay compressed in the ARC (decompression is so fast that this is not a problem), effectively giving you a bigger ARC, which is very nice. Standardizing on a compression algorithm also avoids (is this implemented?) recompressing on compressed send/receive.
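A small sketch (pool/dataset names are placeholders) of mixing algorithms and checking the result:
# zstd with an explicit level for a cold archive on spinning disks, lz4 elsewhere.
zfs set compression=zstd-3 tank/archive
zfs set compression=lz4 tank/scratch
# After some data has been written, see how well it compressed.
zfs get compression,compressratio tank/archive tank/scratch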
Auto-expand
zpool create -o autoexpand=on
Note that default filesystem properties can be set when creating a pool with -O (pool properties are supplied with -o).
TRIM
autotrim=on pool option (device support is detected automatically)
l2arc_trim_ahead kernel module option
Access time
zfs create -o atime=off (is relatime=on better?)
Other filesystem options
mountpoint=none
canmount=off
devices=off
acltype=posix
xattr=sa (recommended when acltype=posix)
dnodesize=legacy (default on OpenZFS 2.0.6)
normalization=formD
quota=10GB, reservation=1GB
Names
# set -- raidz2 sda sdb
codename="$(lsb_release -sc)"
pool='zpool'
options_pool='ashift=12 autotrim=on'
# TODO: Some of these are default?
options_dataset='compression=on acltype=posix xattr=sa relatime=on normalization=formD dnodesize=auto sync=disabled'
zpool create "$pool" \
    $(echo "$options_pool" | xargs -n1 printf '-o %s ') \
    $(echo "$options_dataset" | xargs -n1 printf '-O %s ') \
    "$@"
zpool export "$pool"
zpool import -d '/dev/disk/by-id/' "$pool"
Special datasets (a sketch follows below):
/home/
/var/log/
/var/cache/
zfs create -o secondarycache=metadata -o recordsize=1m
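A minimal sketch of splitting these out (pool/dataset names are placeholders; com.sun:auto-snapshot is the user property used by zfs-auto-snapshot):
zfs create -o canmount=off -o mountpoint=/var rpool/var
zfs create rpool/var/log
zfs create -o com.sun:auto-snapshot=false rpool/var/cache
zfs create -o mountpoint=/home rpool/home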
Permissions (a delegation sketch follows below):
Look at the zfs allow -d/-l and -c flags.
sudo zfs allow -u "$USER" snapshot,send "$HOME"
properties="compression"
sudo zfs allow -u "$USER" create,receive "$HOME"
sudo zfs allow -u "$USER" "$properties" "$HOME"
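A sketch of a delegated, non-root backup flow built on such permissions (user, dataset, and host names are placeholders):
# Let the user snapshot and send their own dataset...
sudo zfs allow -u alice snapshot,send,hold rpool/USERDATA/alice
# ...and receive it on the backup machine.
ssh backup sudo zfs allow -u alice create,mount,receive backuppool/alice
# Review what has been delegated.
zfs allow rpool/USERDATA/alice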
Hibernation
libpam-zfs?
sanoid (includes syncoid)
zsnapd, zsnapd-rcmd
simplesnap
zfs-auto-snapshot
zfsnap
People often like to use a third-party tool to manage their snapshots, some sort of automated thing. Do you have a favorite such tool?
I don’t have any favorite snapshot or replication management tool.
– Matt Ahrens (founding member of OpenZFS, one of the main architects of ZFS), 2018-04-04. Note that he doesn’t question the need for such a tool (and indeed their Delphix software may be one such tool).
Amazon Glacier?
https://jrs-s.net/2016/09/15/zfs-snapshots-and-cold-storage/
This looks like a good reference: https://www.rodsbooks.com/efi-bootloaders/index.html.
${esp:-/boot/efi}/EFI/BOOT/BOOTX64.EFI is used for “ad-hoc” booting (e.g. from removable storage).
UEFI (Unified Extensible Firmware Interface)
The UEFI firmware decides which boot manager on the ESP (EFI system partition) to load
UEFI variables are stored in firmware NVRAM and can be accessed from an operating system with the following programs:
efibootmgr
efivar --list
bcdedit /enum FIRMWARE (bcd stands for “Boot Configuration Data”)
distro="$(lsb_release --short --id)"
codename="$(lsb_release --short --codename)"
case "$distro" in
Debian)
# https://openzfs.github.io/openzfs-docs/Getting%20Started/Debian/index.html
sudo sh -c "cat > /etc/apt/sources.list.d/$codename-backports.list" << EOF
deb http://deb.debian.org/debian $codename-backports main contrib
EOF
sudo sh -c "cat > /etc/apt/preferences.d/90_zfs" << EOF
Package: src:zfs-linux
Pin: release n=$codename-backports
Pin-Priority: 990
EOF
sudo apt update
sudo apt install dpkg-dev linux-headers-generic linux-image-generic
sudo apt install zfs-dkms zfsutils-linux
;;
Ubuntu)
# https://openzfs.github.io/openzfs-docs/Getting%20Started/Ubuntu/index.html
sudo sh -c "cat > /etc/apt/sources.list.d/$codename-universe.list" << EOF
deb http://archive.ubuntu.com/ubuntu $codename main universe
EOF
sudo apt-get update
sudo apt-get install zfsutils-linux
;;
esac
sudo apt install zfs-zed
sudo modprobe zfs
sudo systemctl start zfs-zed.service
# TODO: Check systemd targets at <https://www.youtube.com/watch?v=ELdvHS3jtoY&t=6m48s>.
# TODO: Also install `samba`?
# sudo zfs set sharesmb=on "$pool/$dataset"
# sudo smbpasswd -a "$USER"
TODO: Look at zsys-setup for inspiration.
We want:
fat
ext4 or zfs-member (bpool) with some features turned off
zfs-member (rpool)
)# https://www.youtube.com/watch?v=7F7Ch-ZkiQU
# https://en.wikipedia.org/wiki/GUID_Partition_Table#Partition_type_GUIDs
EFI_SYSTEM_PARTITION='C12A7328-F81F-11D2-BA4B-00A0C93EC93B'
LINUX_BOOT='BC13C2FF-59E6-4262-A352-B275FD6F7172'
LINUX_ROOT_X86_64='4F68BCE3-E8CD-4DB1-96E7-FBCAF984B709'
LINUX_FILESYSTEM_DATA='0FC63DAF-8483-4772-8E79-3D69D8477DE4'
SOLARIS_BOOT='6A82CB45-1DD2-11B2-99A6-080020736631'
SOLARIS_ROOT='6A85CF4D-1DD2-11B2-99A6-080020736631'
# TODO: Set the types for all partitions (`fdisk` can list the types with `L`
# at the `t` prompt).
sudo fdisk "$dev" << EOF
g
n


+512M
t
1
n


+1G
n


+2G
w
EOF
mkfs.fat -F32 "${dev}1"
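One possible way to address the TODO above, reusing the GUIDs defined earlier (which GUID belongs on which partition, and the partition numbering, are assumptions here):
sudo sfdisk --part-type "$dev" 1 "$EFI_SYSTEM_PARTITION"
sudo sfdisk --part-type "$dev" 2 "$SOLARIS_BOOT"
sudo sfdisk --part-type "$dev" 3 "$SOLARIS_ROOT"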
zfsutils-linux
zfs-zed
zfs-initramfs
zsys
ubiquity
cryptsetup-initramfs
/usr/share/doc/cryptsetup-initramfs/README.initramfs.gz
Persistent naming:
SMART
Installation
Topology: dataset (filesystem, can be mounted), zvol (device, can contain another filesystem), multiple of which live in a zpool, which is a collection of vdevs, which is any of: a single disk (or file), a mirror, or a RAIDZ (1/2/3) group (plus special-purpose vdevs: log, cache, spare, special, dedup).
Maintenance considerations:
zpools are only as redundant as their least redundant vdev.
vdevs can generally not be removed from a zpool (zpool remove can evacuate mirror/single-disk vdevs, but not RAIDZ).
vdevs can (only) be grown by replacing all the drives in them, one at a time.
A RAIDZ vdev cannot grow in number of drives (unlike traditional md RAID).
Use /dev/disk/by-id/ names when adding drives.
RAIDZ best practice:
lsblk -p -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT
Layout conventions (as used in e.g. Ubuntu 20.04 LTS; a sketch of creating this layout follows below):
id="$(lsb_release --id --short | tr '[:upper:]' '[:lower:]')"
or id="$(. '/etc/os-release'; echo "$ID")"
bpool
    BOOT
        ${id}_${rand1}: /boot
rpool
    ROOT
        ${id}_${rand1}: /
            $dir: /$dir (there are quite a few, nested)
    USERDATA
        root_${rand2}: /root
        ${user}_${rand2}: /home/${user}
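A minimal sketch of creating that structure by hand (random suffixes, the user name, and the property choices are placeholders; zsys-setup does this properly):
zfs create -o canmount=off -o mountpoint=none bpool/BOOT
zfs create -o canmount=off -o mountpoint=none rpool/ROOT
zfs create -o canmount=off -o mountpoint=/ rpool/USERDATA
zfs create -o mountpoint=/boot "bpool/BOOT/${id}_abc123"
zfs create -o mountpoint=/ -o canmount=noauto "rpool/ROOT/${id}_abc123"
zfs create -o mountpoint=/root "rpool/USERDATA/root_def456"
zfs create -o mountpoint=/home/alice "rpool/USERDATA/alice_def456"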
/etc/fstab (example entries for both files are sketched below):
UUID=... (EFI partition) -> /boot/efi (vfat)
/boot/efi/grub -> /boot/grub (bind)
/dev/mapper/cryptoswap -> none (swap)
/etc/crypttab:
/dev/nvme0n1p2 -> cryptoswap (/dev/urandom, swap, initramfs)
??
/dev/zvol/rpool/keystore -> /dev/zd0 -> /dev/mapper/keystore-rpool -> /run/keystore/rpool
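Example entries matching the description above (the UUID, device paths, and extra options are made up):
# /etc/fstab
UUID=ABCD-1234          /boot/efi   vfat  umask=0077  0 1
/boot/efi/grub          /boot/grub  none  bind        0 0
/dev/mapper/cryptoswap  none        swap  sw          0 0
# /etc/crypttab
cryptoswap  /dev/nvme0n1p2  /dev/urandom  swap,initramfs,cipher=aes-xts-plain64,size=256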
References:
didrocks
zfsutils-linux
zfs-initramfs
zfs-auto-snapshot (note that this use case is also covered by zsys)
/usr/share/ubiquity/zsys-setup, ubiquity/scripts/zsys-setup
phoronix.com
zsys-setup.
hibernate*memory + 2**round(log2(sqrt(memory)))
hibernate*memory + round(sqrt(memory))
8*(hibernate + hungry_fs)
send/recv (on storage servers the recommendation of ZFS is all but unanimous, and you do backup, right?)
zsys
TODO: Move stuff here from other places in this document?
TODO: Make zsys-layout more robust, test it, and create a repo for it:
#!/bin/sh
set -euC
# https://web.archive.org/web/20200922112105/https://didrocks.fr/2020/06/16/zfs-focus-on-ubuntu-20.04-lts-zsys-dataset-layout/#why-so-many-system-datasets
host_id="$(TODO)" # TODO: derive from the installed system (see the article above).
action="${1:?usage: zsys-layout server|desktop}"
datasets="$(printf '%s\n' \
    'srv' \
    'usr/local' \
    'var/lib' \
    'var/games' \
    'var/log' \
    'var/mail' \
    'var/snap' \
    'var/spool' \
    'var/www' \
)"
rename() {
    # $datasets is intentionally unquoted: one dataset per whitespace-separated word.
    for dataset in $datasets
    do
        zfs rename "$1/$dataset" "$2/$dataset"
    done
}
case "$action" in
    'server')  rename "rpool/ROOT/$host_id" 'rpool' ;;
    'desktop') rename 'rpool' "rpool/ROOT/$host_id" ;;
esac
zfs-send -w: raw
zfs-send -i: incremental
zfs-send -n: dry-run (a sketch combining these follows below)
zed (ZFS Event Daemon)
zrepl
httm
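A small sketch of those send flags together (dataset, snapshot, and host names are placeholders): a raw incremental send keeps the data encrypted on the wire and on the target, which never needs the keys.
zfs snapshot rpool/secure@2024-02-01
# Dry run first to see what would be sent.
zfs send -w -n -v -i rpool/secure@2024-01-01 rpool/secure@2024-02-01
zfs send -w -i rpool/secure@2024-01-01 rpool/secure@2024-02-01 \
    | ssh backup zfs receive -u backuppool/secure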
You are the “family admin” tasked (by yourself, let’s face it) to set up a storage solution. You cross-backup to/from several locations that are not adversarial but (technically) untrusted.
There are several ways in which the disks can leave your control:
Controlled.
Decommissioning. The drives have reached end of life and you want to get rid of them in a secure manner. This should include selling to recuperate the cost of upgraded storage. You are not prepared to trust simply overwriting the bulk storage.
Mitigation: Encryption. Keys stored on media other than the bulk storage which is trusted to do secure erase, or is cheap enough that physical destruction is acceptable (can be repurposed for the same task for the new storage until one of these two options is ultimately carried out).
Uncontrolled.
Slipping through the cracks. Non-technical people tend to associate physical drives entirely with data availability and forget about data confidentiality. Once the data is copied to some upgraded storage, the old drives are simply (insecurely) discarded. There are plenty of ways in a family setting that drives slip out of the hands of the security-conscious admin (“I thought I didn’t need them anymore, so I sold them”).
Theft.
Mitigation: Passphrases on keys. Preferably strong, even more preferably hardware-based (on hardware unlikely to be lost together with hardware holding data). User ergonomics matter here.
Destruction.
Natural disasters or other events that leave data/key/passphrase-holding hardware unrecoverable.
Mitigation: Off-site backups. Including intermediary keys and passphrases (and their key derivation parameters). To offline media such as paper or USB-sticks. If hardware is used for passphrases, these can additionally be backed up to identical hardware for faster recovery.
See also the Linux fscrypt threat model.