Preventing accidental data loss with ZFS

I have recently converted the file system of some of our Samba servers from ext4 to ZFS. The idea was to use ZFS’s compression feature to reduce the space consumed by some excessively large files that compress to less than a tenth of their original size. This works fine.
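For reference, compression is enabled per dataset by setting a property; thepool/thedataset is a placeholder and lz4 is just one of the available algorithms:

zfs set compression=lz4 thepool/thedataset

The compressratio property then shows how effective it is:

zfs get compressratio thepool/thedataset

Note that compression only applies to data written after the property has been set.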

Today we found that a directory with data from yesterday was missing. Unfortunately, the last backup from yesterday evening did not contain it either. The midday backup from yesterday did contain it, but of course that data was half a day old, so we lost about 3 hours of work (luckily only of a single person).

So, I came back to two other features ZFS offers:

  • Snapshots
  • Clones

There is a small Ubuntu tutorial on these, but I found it a bit confusing, so I am putting down some notes on the topic here myself.

Snapshots and clones are related: you take snapshots of a dataset, and to access the data in a snapshot, you create a clone based on it.

The process is very simple. To create a snapshot you use the zfs snapshot command:

zfs snapshot thepool/thedataset@thesnapshot

where thepool is a ZFS pool, thedataset is a dataset in that pool and thesnapshot is the name of the snapshot. That name could e.g. be the date and time when you create the snapshot.
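For example, to name the snapshot after the current date and time (thepool/thedataset stands for one of your own datasets):

zfs snapshot thepool/thedataset@$(date +%Y-%m-%d_%H-%M)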

Creating a snapshot takes basically no time, even with large datasets. Snapshots also use next to no space until you make changes to the original dataset, and you can have many of them. You could automatically create a new snapshot every minute, but that would probably be excessive for most setups.

Once a snapshot has been created, it preserves the state the file system had at that moment.

To access the data of a snapshot, you create a clone of that snapshot with the zfs clone command:

zfs clone thepool/thedataset@thesnapshot thepool/theclone

where thepool is a ZFS pool, thedataset is a dataset in that pool, thesnapshot is the name of the snapshot and theclone is the name of the clone.

After that the content of the snapshot is automatically mounted as

/zfs/thepool/theclone

(the exact path depends on the mountpoint settings of your pool and datasets).
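To recover the lost directory from the scenario above, you would then simply copy it out of the clone and back into the live dataset. The paths below are hypothetical and assume the dataset is mounted at /zfs/thepool/thedataset:

cp -a /zfs/thepool/theclone/lost-directory /zfs/thepool/thedataset/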

While a snapshot never changes, a clone can be modified. Any changes you make to a clone will of course take up some space.

So, how does that help with the scenario above?

Simple: Since snapshots are cheap, we can create them regularly, e.g. every 10 minutes or so. We keep them for a reasonable time (e.g. a week?) and then destroy them to release the space they have taken up.
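What follows is a minimal sketch of such a rotation, assuming a dataset called thepool/thedataset, an @auto- naming prefix and a retention period of one week (all of these are assumptions you would adapt to your own setup). It creates one time-stamped snapshot and destroys automatic snapshots older than the retention period:

#!/bin/sh
# Sketch of a snapshot rotation script; dataset name, prefix and retention are placeholders.
DATASET="thepool/thedataset"
RETENTION=$((7 * 24 * 3600))   # keep automatic snapshots for one week
NOW=$(date +%s)

# Create a new snapshot named after the current date and time.
zfs snapshot "${DATASET}@auto-$(date +%Y-%m-%d_%H-%M)"

# Destroy automatic snapshots that are older than the retention period.
zfs list -H -t snapshot -o name -r "${DATASET}" | grep "@auto-" | while read -r SNAP; do
    CREATED=$(zfs get -Hp -o value creation "${SNAP}")
    if [ $((NOW - CREATED)) -gt "${RETENTION}" ]; then
        zfs destroy "${SNAP}"
    fi
done

Run from cron, a line like the following (the script path is hypothetical) would take a snapshot every 10 minutes:

*/10 * * * * /usr/local/sbin/zfs-rotate-snapshots.sh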

Some additional commands related to snapshots:

To get a list of all existing snapshots, use the zfs list command:

zfs list -t snapshot
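To restrict the list to the snapshots of one dataset (and of any datasets below it), add -r together with the dataset name; thepool/thedataset is a placeholder again:

zfs list -t snapshot -r thepool/thedataset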

To rename a snapshot, use the zfs rename command:

zfs rename thepool/thedataset@thesnapshot thepool/thedataset@thenewsnapshot

To delete a snapshot, use the zfs destroy command:

zfs destroy thepool/thedataset@thesnapshot

You cannot delete a snapshot from which you have made a clone; you will have to delete that clone first.
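If you are not sure which clones were created from a snapshot, the read-only clones property lists them (names are placeholders):

zfs get clones thepool/thedataset@thesnapshot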

To roll back a dataset to a snapshot, use the zfs rollback command:

zfs rollback thepool/thedataset@thesnapshot

DANGER: You will lose all the changes made to the dataset after that snapshot was taken! Also note that by default zfs rollback can only roll back to the most recent snapshot; rolling back to an older one requires the -r option, which destroys all snapshots taken after it.

Some additional commands related to clones:

To delete a clone, use the zfs destroy command:

zfs destroy thepool/theclone

This will only delete the clone, not the snapshot it was based on. But you will lose any changes you made to the clone.

It is also possible to replace a dataset with a clone, but I will not explain this here because I am still not quite sure how that works. If I ever need it, I will have to do some more research.