ZFS alert to avoid nasty surprises

Published by Oliver on

ZFS is a great filesystem for your server. Until something goes wrong. I wrote a small ZFS alert script that sends me a popup notification if my ZFS pool ever goes offline.

The ZFS filesystem

ZFS is a very robust and versatile filesystem that includes many features like snapshots, RAID and deduplication. I have used it to build my own home file and application server. ZFS storage is based on so-called pools: combinations of HDDs (or SSDs or any other storage media) into one big “pool” of storage. Depending on your setup, the pool distributes the data across its disks.

While ZFS has some built-in tools to rectify errors, it is still very important to know about them, for example so that you can provide a replacement disk in time. The status of the pool(s) can be checked with the zpool status command.

zpool status output for a raidZ1 pool

Scrubbing the pool

The first important task for any ZFS pool owner is scrubbing. No, you do not need to go outside and start cleaning your pool 😉 Every data block in a ZFS filesystem has a checksum. Faulty hardware or even cosmic radiation (no joke!) can cause errors in your data over time. A ZFS scrub goes over all that data and finds those errors (to a certain degree) by checking the data against the checksum. Errors are then repaired automatically, as long as the pool has enough redundancy (for example a mirror or raidZ) to supply a good copy.

This process is very important and should run regularly, otherwise you risk data loss like a certain YouTuber. If you install OpenZFS on an Ubuntu-based system, it automatically creates a cron job that runs a scrub once a month (in my case in /etc/cron.d/zfsutils-linux). If you want to do it manually, you can always run zpool scrub poolName.
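To make the manual route a bit more concrete, here is a minimal sketch of a helper that starts a scrub and prints the pool status right afterwards. The function name start_scrub and the overridable ZPOOL variable are my own inventions for illustration. The only ZFS commands used are zpool scrub and zpool status.

```shell
# Command used to talk to ZFS; override ZPOOL if the binary lives elsewhere.
ZPOOL="${ZPOOL:-zpool}"

# Start a scrub on the given pool, then print the pool status.
# The "scan:" line of the status output reports the scrub progress.
start_scrub() {
    pool="$1"
    "$ZPOOL" scrub "$pool" || return 1
    "$ZPOOL" status "$pool"
}
```

You would call it as start_scrub poolName with root privileges. A scrub runs in the background, so the command returns immediately and you can run zpool status poolName again later to see how far it has come.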

Building a ZFS alert – a healthcheck with popup messages

Even if you regularly run a ZFS scrub, there are still things that can go wrong. Even though I have backups for my server, and even built an alerting system for those backups, I still wanted to make sure that my ZFS pool is always in good condition. If it ever is not, I want to know as quickly as possible.

I am already using a service called Pushover for alerts sent directly to my smartphone. I have used it in the past for smart home alerts from Home Assistant, for the backup alerts and for notifications from my 3D printer. Now to the monitoring of the pool health. The status command has a simple switch, -x, to check the pool health.

zpool status -x # checking all pools
all pools are healthy

zpool status -x dataPool # checking only the pool called dataPool
pool 'dataPool' is healthy

Because the output of this command is so precise, it is simple to write a script around it in combination with the Pushover API. My attempt looks like this.

#!/bin/bash

# pool to check
POOL="dataPool"

# needed paths
LOGFILE="/var/log/poolStatusCheck.log"
ZPOOL="/sbin/zpool" # cron often has no /sbin in its PATH; adjust if zpool lives elsewhere

# pushover data
PO_TOKEN="pushOverTokenFromYourAccount"
PO_UK="pushOverUserKeyFromYourAccount"

# -------------- program, don't change ---------------

# zpool status -x prints exactly this line when the pool has no problems
if [ "$("$ZPOOL" status -x "$POOL")" != "pool '$POOL' is healthy" ]; then
    echo "$(date) - Alarm - zPool $POOL is not healthy" >> "$LOGFILE"
    # send the alert through the Pushover messages API
    curl -s -F "token=$PO_TOKEN" \
        -F "user=$PO_UK" \
        -F "title=Zpool status unhealthy!" \
        -F "message=The status of pool $POOL does not seem to be healthy" \
        https://api.pushover.net/1/messages.json
else
    echo "$(date) - Zpool $POOL status is healthy" >> "$LOGFILE"
fi
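
The script above checks one pool by name. If you have several pools, a possible variation is to let zpool list print the health of all of them and filter out everything that is not ONLINE. This is only a sketch: the function name unhealthy_pools and the pool name backupPool in the example are made up, and it assumes the tab-separated output of zpool list -H -o name,health.

```shell
# Print "name health" for every pool whose health is not ONLINE.
# Expects lines of "name<TAB>health" on stdin, which is the format
# produced by: zpool list -H -o name,health
unhealthy_pools() {
    awk -F'\t' '$2 != "ONLINE" { print $1 " " $2 }'
}
```

Used as zpool list -H -o name,health | unhealthy_pools, it prints nothing while all pools are fine. For a pool list containing dataPool (ONLINE) and backupPool (DEGRADED) it would print only "backupPool DEGRADED", which you could then put into the Pushover message.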

I also uploaded the full script to my ZFS home server repository and will keep it up to date there. To use the script, adapt the values of the variables at the top. First change the name of the pool to the one you want to check. The paths for the log file and the ZFS binary can stay unchanged in most cases. To send a Pushover message you also need to provide a Pushover token and user key. The process to get those is described in the official documentation.

Now you can test the script by running it directly. If all works well, you can add it as a cron job to run it regularly.

sudo cp zpoolStatusCheck.sh /usr/local/bin/zpoolStatusCheck.sh # copy the script
sudo crontab -e # go to the cron editor
# and add a cron job that runs the script every 5 minutes
*/5 * * * * bash /usr/local/bin/zpoolStatusCheck.sh
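
One thing to keep in mind with a 5 minute interval: as long as the pool stays unhealthy, every run sends another message. If that gets too noisy, one option is to alert only when the status changes. The following is a rough sketch and not part of my script. The function name status_changed and the state file path are assumptions chosen for illustration.

```shell
# Hypothetical location for remembering the status seen by the last run.
STATE_FILE="${STATE_FILE:-/var/tmp/poolStatusCheck.state}"

# Return 0 (true) only if the given status differs from the one stored
# by the previous run, and remember the new status for the next run.
status_changed() {
    new_status="$1"
    old_status=""
    if [ -f "$STATE_FILE" ]; then
        old_status="$(cat "$STATE_FILE")"
    fi
    printf '%s\n' "$new_status" > "$STATE_FILE"
    [ "$new_status" != "$old_status" ]
}
```

In the health check you could then wrap the curl call in something like if status_changed "unhealthy"; then ... fi and call status_changed "healthy" in the else branch. Note that the very first run always counts as a change, because no state file exists yet.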

That’s it. A simple script that might just save the precious data on your server. If you prefer other ways of being notified, you should know that ZFS has a built-in notification tool (search for ZED, the ZFS Event Daemon). Pushover support was even added recently, but it will take quite some time until those versions are available on all operating systems. Until then a small script like this can help a lot.
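
For reference, this is roughly what the ZED route looks like. ZED reads its settings from a file of shell-style variables, usually /etc/zfs/zed.d/zed.rc. The excerpt below is a sketch with placeholder values. The variable names are the ones I know from the zed.rc shipped with recent OpenZFS releases, so treat them as an assumption and compare them with the commented defaults in your own file, especially the Pushover ones.

```shell
# /etc/zfs/zed.d/zed.rc (excerpt, all values are placeholders)

# Email address that receives ZED notifications.
ZED_EMAIL_ADDR="root"

# Minimum number of seconds between notifications for the same kind of event.
ZED_NOTIFY_INTERVAL_SECS=3600

# Pushover credentials; only used by ZED versions that ship Pushover support.
ZED_PUSHOVER_TOKEN="pushOverTokenFromYourAccount"
ZED_PUSHOVER_USER="pushOverUserKeyFromYourAccount"
```

After changing the file, the daemon has to be restarted to pick up the new values (on Ubuntu the service is, as far as I know, called zfs-zed).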
