Storage

TODO: Modify the partition table after writing the ISO image to the USB to include the other partitions. https://www.debian.org/releases/stable/amd64/ch04s03.en.html#usb-copy-isohybrid

Place any “elastic” partition at the beginning so that it can easily be expanded later if the other partitions turn out to be no longer needed.

Portable

Wishlist:

  • Free space at the beginning to write temporary bootable ISO images.

    Reserve 8 GB at the beginning.

    GPT keeps a backup at the end of the drive. For this reason, prefer GPT. See Saving and restoring the GPT.

  • Big, portable storage partition.

    Reserve all space left over when other considerations have been taken care of.

    Some versions of some operating systems (e.g. Windows 10 before version 1703) only consider the first partition of removable USB mass storage devices. Make sure this is the first partition.

    This partition needs to be readable and writable by as many operating systems as possible while supporting large files. For this reason, prefer exFAT.

  • Bootable persistent operating system partition.

    Reserve 8 GB. On a 32 GB drive, this leaves 16 GB for the storage partition (32 - 8 - 8).

    This partition does not need to be readable and writable by operating systems other than the one residing on it.
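
The size arithmetic in the wishlist can be sketched as a small helper. The function name storage_gb is made up for illustration, and the 8 GB defaults are simply the two reservations listed above.

```shell
# storage_gb TOTAL_GB [ISO_GB] [OS_GB]
# Print the space (in GB) left for the storage partition.
# Hypothetical helper; the 8 GB defaults are the wishlist reservations.
storage_gb() (
    total=$1
    iso=${2:-8}
    os=${3:-8}
    echo "$(( total - iso - os ))"
)

storage_gb 32    # prints 16
storage_gb 64    # prints 48
```

These are nominal sizes; the usable space will be slightly smaller because of partition alignment and the GPT structures themselves.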

WARNING: The commands described in this document can easily cause data loss if run without thinking.

To create this layout, run

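lsblk          # list block devices to find the USB drive
read -r dev    # device name without /dev/, e.g. sdb
read -r name   # name for the drive, used for partition names and labels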

sudo wipefs --all "/dev/${dev}"

sudo parted --script --align optimal "/dev/${dev}" -- \
    unit GB \
    mklabel gpt \
    mkpart primary 8 -8 \
    name 1 "$name-storage" \
    set 1 msftdata on \
    mkpart primary -8 -0 \
    name 2 "$name" \
    set 2 boot on \
    set 2 legacy_boot on \
    print free \
    quit

sudo mkfs.exfat "/dev/${dev}1"
sudo mkfs.ext4 -L "$name" "/dev/${dev}2"
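
The mkfs commands above append the partition number directly to the device name, which matches names such as sdb. Devices whose names end in a digit (for example mmcblk0 or nvme0n1) get a "p" before the partition number. A minimal sketch that handles both cases, using a hypothetical helper called part_path:

```shell
# part_path DEV N
# Print the device path of partition N on disk DEV (DEV without /dev/).
# Hypothetical helper: disk names ending in a digit get a "p" separator.
part_path() {
    case $1 in
        *[0-9]) printf '/dev/%sp%s\n' "$1" "$2" ;;
        *)      printf '/dev/%s%s\n'  "$1" "$2" ;;
    esac
}

part_path sdb 1        # prints /dev/sdb1
part_path mmcblk0 2    # prints /dev/mmcblk0p2
```

With this in place, the file system commands could be written as sudo mkfs.exfat "$(part_path "$dev" 1)" and so on.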

Saving and restoring the GPT

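lsblk          # list block devices to find the drive
read -r dev    # device name without /dev/, e.g. sdb
read -r name   # base name of the backup file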

To save the GPT to a backup file, run

sudo sgdisk --backup "$name.gpt" "/dev/${dev}"
sudo chown "$USER:" "$name.gpt"

To restore the GPT from a backup file, run

sudo sgdisk --load-backup "$name.gpt" "/dev/${dev}"

If no backup file is available, restore the main GPT from the on-disk backup GPT by running

sudo gdisk "/dev/${dev}" << EOF
r
b
w
y
EOF
sudo sgdisk --verify "/dev/${dev}"
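
To make the on-disk backup more concrete: the backup GPT header sits in the last sector of the drive, with the backup partition entry array directly before it. The sketch below computes those positions. It assumes the common layout of 128 partition entries of 128 bytes each, and the function name gpt_backup_lbas is made up for illustration.

```shell
# gpt_backup_lbas SECTORS [SECTOR_SIZE]
# Print "<first LBA of backup entry array> <LBA of backup header>".
# Assumption: 128 partition entries of 128 bytes each (the usual layout).
gpt_backup_lbas() (
    sectors=$1
    sector_size=${2:-512}
    entry_sectors=$(( 128 * 128 / sector_size ))
    header=$(( sectors - 1 ))
    entries=$(( header - entry_sectors ))
    echo "$entries $header"
)

gpt_backup_lbas 62500000         # prints 62499967 62499999
gpt_backup_lbas 7812500 4096     # prints 7812495 7812499
```

This is why anything that overwrites the end of the drive, or a drive image copied to a larger drive, can leave the backup GPT missing or misplaced.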

Drive health

NVMe

The NVMe specification says (page 184):

Percentage Used: Contains a vendor specific estimate of the percentage of NVM subsystem life used based on the actual usage and the manufacturer’s prediction of NVM life. A value of 100 indicates that the estimated endurance of the NVM in the NVM subsystem has been consumed, but may not indicate an NVM subsystem failure. The value is allowed to exceed 100. Percentages greater than 254 shall be represented as 255. This value shall be updated once per power-on hour (when the controller is not in a sleep state).

dev='/dev/nvme0n1'

# sudo nvme smart-log "$dev" \
sudo smartctl -a "$dev" \
| grep -i '\(critical.warning\|available.spare\|percentage.used\)\s*:'

sudo nvme error-log "$dev"
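
The quoted definition can be turned into a rough check. The sketch below extracts the number from smartctl -a style output on standard input and classifies it according to the quote above. Both function names are hypothetical, and the sample line only illustrates how smartctl formats this field.

```shell
# percentage_used: read `smartctl -a` output on stdin, print the number only.
percentage_used() {
    sed -n 's/^Percentage Used:[[:space:]]*\([0-9][0-9]*\)%.*$/\1/p'
}

# endurance_status VALUE: interpret the value per the quoted definition.
endurance_status() {
    if [ "$1" -ge 255 ]; then
        echo 'more than 254% (value is capped at 255)'
    elif [ "$1" -ge 100 ]; then
        echo 'estimated endurance consumed'
    else
        echo 'within estimated endurance'
    fi
}

# Illustrative input line, not real output from any drive.
printf 'Percentage Used:                    3%%\n' | percentage_used   # prints 3
endurance_status 3      # prints: within estimated endurance
```

On a real system the input would come from sudo smartctl -a "$dev" | percentage_used. Note that, per the quote, a value of 100 or more does not by itself indicate a failure.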

Secure erase

NVMe sector size

Some NVMe drives can be formatted with different sector sizes, which can have different performance characteristics. You can query these with

# dev=/dev/nvme0n1
read -r dev
sudo nvme id-ns "$dev" --human-readable | grep '^LBA Format *[0-9][0-9]* *:'

The output lists the supported sector formats. Each entry includes a “Relative Performance” metric (e.g. 0 Best, 1 Better, 2 Good), and the format currently in use is marked with “(in use)”.
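
As a sketch, the index of the format currently in use can be picked out of that listing with sed. The function name lbaf_in_use is hypothetical, and the sample lines are illustrative only; they assume the line layout that the grep pattern above matches.

```shell
# lbaf_in_use: read `nvme id-ns --human-readable` output on stdin and
# print the index of the LBA format marked "(in use)".
lbaf_in_use() {
    sed -n 's/^LBA Format *\([0-9][0-9]*\) *:.*(in use).*$/\1/p'
}

# Illustrative sample, not taken from a real drive.
sample='LBA Format  0 : Metadata Size: 0   bytes - Data Size: 512 bytes - Relative Performance: 0x2 Good (in use)
LBA Format  1 : Metadata Size: 0   bytes - Data Size: 4096 bytes - Relative Performance: 0x1 Better'

printf '%s\n' "$sample" | lbaf_in_use    # prints 0
```

On a real system the input would come from sudo nvme id-ns "$dev" --human-readable | lbaf_in_use.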

To change the format, note the integer index after “LBA Format” and use it in the command below:

NOTE: THIS MAY DESTROY ALL DATA ON THE DEVICE!

read -r format
sudo nvme format "$dev" --lbaf="$format" --reset

NVMe SMART error log

Apparently it is fairly common to get the following mail to root (/var/mail/root):

This message was generated by the smartd daemon running on:

   host name:  rcrnstn-laptop
   DNS domain: rcrnstn.net

The following warning/error was logged by the smartd daemon:

Device: /dev/nvme0, number of Error Log entries increased from 233 to 234

Device info:
Micron 2200S NVMe 256GB, S/N:192823573D1D, FW:22001010

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Wed Oct 26 18:08:15 2022 CEST
Another message will be sent in 24 hours if the problem persists.

As the mail suggests, the warnings also appear in the system log:

journalctl | less +/'Device: .*, number of Error Log entries increased'

However, the system log doesn’t give much more information. Likewise, I was unable to make smartctl print anything more useful:

sudo smartctl -l error /dev/nvme0

=== START OF SMART DATA SECTION ===
Error Information (NVMe Log 0x01, 16 of 256 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0        408     0  0xb00a  0x8004  0x000            0     0     -

However, the following is insightful:

sudo nvme error-log /dev/nvme0 | less

Error Log Entries for device:nvme0 entries:64
.................
 Entry[ 0]
.................
error_count     : 408
sqid            : 0
cmdid           : 0xb00a
status_field    : 0x4002(INVALID_FIELD: A reserved coded value or an unsupported value in a defined field)
phase_tag       : 0
parm_err_loc    : 0
lba             : 0
nsid            : 0
vs              : 0
trtype          : The transport type is not indicated or the error is not transport related.
cs              : 0
trtype_spec_info: 0
.................

That is, the NVMe drive received a command from the kernel containing a field value it does not support, and logged that as an error. This is not a problem with the physical storage.
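
The two tools show different status values for the same entry (0x8004 from smartctl, 0x4002 from nvme). The numbers are consistent with smartctl having printed the raw 16-bit field, whose lowest bit is the phase tag, while nvme printed the status with that bit shifted out. This is an inference from the outputs above, not something either tool states. A minimal sketch of the decoding under that assumption, with a hypothetical function name:

```shell
# decode_status RAW
# Decode a raw 16-bit NVMe status field, assuming bit 0 is the phase tag.
# After shifting: bits 7:0 = status code, bits 10:8 = status code type,
# bit 14 = Do Not Retry.
decode_status() (
    raw=$(( $1 ))
    status=$(( raw >> 1 ))
    sc=$(( status & 0xff ))
    sct=$(( (status >> 8) & 0x7 ))
    dnr=$(( (status >> 14) & 0x1 ))
    printf 'status=0x%04x sct=%d sc=0x%02x dnr=%d\n' "$status" "$sct" "$sc" "$dnr"
)

decode_status 0x8004    # prints status=0x4002 sct=0 sc=0x02 dnr=1
```

Status code type 0 with status code 0x02 is the generic "Invalid Field in Command" status, matching the INVALID_FIELD text printed by nvme.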

This was reported to smartmontools and fixed.