personal data backup

1. Intro

Creating backups is such a fundamental concept that it really needs no introduction. Things break and when they do you want to make sure you have copies of anything important you lost.

Over the years I've developed different strategies to handle backups. Some strategies mitigate the risk of system failure. Others handle the more common risk of user error. This article covers each of them and how they came to be.

2. Portable Memory

TLDR: A physical copy of data covers most backup scenarios.

One of my first experiences with computing was with Arch Linux. Years ago I was gifted an old laptop that ran Windows. It was awesome but incredibly slow. Opening a web browser could take up to a minute and booting could take up to 5 minutes. An attempt to fix this eventually led me to Arch Linux.

Arch Linux is a bare-bones Linux distribution that ships with only the minimum needed to start a machine. Everything else is installed and customized by the user. This makes it fun and rewarding to run, but also really easy to break, and that's exactly what I did, many times! Each time I broke it I had to reinstall the system. I quickly learned the importance of separating your data from your system and started storing a copy of my home directory on a USB drive.

Separating your data from your system allows you to easily back up and restore data. Just copy and paste. Now, I use a portable SSD to back up important data. For example, when I install a new OS I simply copy my latest backup onto it.
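
The copy-and-paste workflow can also be scripted. Here is a minimal sketch in Python, assuming the portable drive is mounted as an ordinary folder. The function names and the timestamped folder layout are hypothetical choices for illustration, not a description of my exact setup.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def backup_directory(source: Path, drive: Path, now: Optional[datetime] = None) -> Path:
    """Copy `source` into a new timestamped folder on `drive` and return its path."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H%M%S")
    target = drive / f"{source.name}_{stamp}"
    # copytree refuses to write into an existing folder, so an older
    # backup is never overwritten by accident.
    shutil.copytree(source, target, symlinks=True)
    return target


def restore_directory(backup: Path, destination: Path) -> None:
    """Copy a backup folder back onto a freshly installed system."""
    # dirs_exist_ok lets the restore merge into a directory that already exists.
    shutil.copytree(backup, destination, symlinks=True, dirs_exist_ok=True)
```

Keeping every backup in its own dated folder uses more space than overwriting a single copy, but it means a bad backup never replaces a good one.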

3. Cloud Storage

TLDR: Copying data to the cloud allows you to access data from anywhere and it creates redundancy with your physical backups.

When I was a student my assignments needed to be printed out before they were turned in. This wasn't easy because I didn't own a printer. But my schools did. So I would copy my assignment to a USB drive and then print it at school. This worked well, but USB drives are pretty easy to lose. Luckily cloud-based storage like Dropbox took off and I no longer needed to have physical access to my data.

Cloud-based storage is a complementary strategy to backing up data on a physical drive, with the added benefit of having access to your data from anywhere. Backing up to the cloud also serves as a backup in case the data on your physical drive becomes corrupt. Similarly, your physical drive acts as a backup for your cloud backup in case your cloud provider has issues.
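
Two copies only protect each other if you notice when they drift apart. As a rough sketch, the Python below compares checksums of the two copies. It assumes the cloud copy is available as a locally synced folder, and all names in it are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each file's path relative to `root` to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_backups(drive_copy: Path, cloud_copy: Path) -> Dict[str, List[str]]:
    """Report files missing from one copy or differing between the two."""
    drive, cloud = tree_digests(drive_copy), tree_digests(cloud_copy)
    shared = drive.keys() & cloud.keys()
    return {
        "only_on_drive": sorted(drive.keys() - cloud.keys()),
        "only_in_cloud": sorted(cloud.keys() - drive.keys()),
        "different": sorted(name for name in shared if drive[name] != cloud[name]),
    }
```

A file listed under "different" is not necessarily corrupt. It may simply have been edited since the last backup, so the report is a prompt to look, not a verdict.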

One thing to keep in mind is that data in the cloud lives on someone else's servers, so a leaked login or a breach at the provider could expose it. This is not a good strategy for things like passwords.

4. Version Control

TLDR: Version Control provides a suite of backup solutions.

So far, we've covered backup strategies to handle system failures. But what about the more common case of user error? How many times have you made a change that Ctrl-Z won't undo or accidentally deleted a file with no way to recover it?

Most people are already familiar with version control, so we won't spend much time on it, but it does deserve a mention. Version control lets you take snapshots of data sets (git commit). It also lets you create alternative data sets (git branch). Here's an example of using git to automate backups of org files.
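
Separately from that example, here is a minimal sketch of what an automated git snapshot could look like in Python. It assumes the folder is already a git repository and that git is installed. The snapshot and run_git helpers are hypothetical names of my own, and the commit message is whatever you pass in.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def run_git(repo: Path, args: Sequence[str]) -> subprocess.CompletedProcess:
    """Run one git command inside `repo` and capture its output."""
    return subprocess.run(
        ["git", "-C", str(repo), *args],
        capture_output=True,
        text=True,
        check=True,  # raise if git reports an error
    )


def snapshot(repo: Path, message: str, git: Callable = run_git) -> bool:
    """Commit every pending change; return False when there is nothing to commit."""
    git(repo, ["add", "--all"])
    status = git(repo, ["status", "--porcelain"])
    if not status.stdout.strip():
        return False  # working tree is clean, so skip the empty commit
    git(repo, ["commit", "-m", message])
    return True
```

Calling snapshot from a scheduler such as cron would turn it into a periodic backup. The git parameter exists only so the logic can be exercised without touching a real repository.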

5. Time-Based Snapshots

TLDR: Automated time-based snapshots of your file system can undo most data loss caused by user error.

I used to work at a research institute that published studies on huge sets of data. To crunch the numbers they built a cluster of computers. The cluster, as it was called, had a shared file system that was used for storing data and various documents.

The infra team that ran this cluster did a fantastic job. They ran a system that would take time-based snapshots of the file system. So, if you accidentally deleted some data you could recover it by finding a hidden .snapshot directory. This directory would contain copies of the original directory at different points in time.

If you've ever used a time-based snapshot system like this you know how great it is. If you haven't, imagine how nice it would be to know you can always restore a snapshot of your system. Accidentally delete a file that wasn't version controlled? Just look at the snapshot from an hour ago.

I've been happily using rsnapshot (rsnapshot.org) to do this. It allows you to configure a backup strategy. Mine runs every hour. After 24 hours a daily snapshot is taken and new hourly snapshots start to overwrite the oldest ones. After 7 daily snapshots are taken a weekly snapshot is taken and new daily snapshots start to overwrite the oldest ones. The same pattern applies to monthly snapshots.
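
To make the rotation easier to picture, here is a small simulation in Python. It is a simplified model of the schedule described above, not rsnapshot's actual implementation. The 24 hourly and 7 daily slots come from my setup, while the 4 weekly and 12 monthly slots and the 4-week month are assumptions made for illustration.

```python
from collections import deque

# Snapshots kept per tier (weekly and monthly counts are assumed).
RETAIN = {"hourly": 24, "daily": 7, "weekly": 4, "monthly": 12}
# Hours between runs of each tier (a month is modelled as 4 weeks).
INTERVAL_HOURS = {"hourly": 1, "daily": 24, "weekly": 24 * 7, "monthly": 24 * 7 * 4}
# The slowest tier runs first so it can claim a snapshot before it is rotated out.
ORDER = ["monthly", "weekly", "daily", "hourly"]


def new_store():
    # deque(maxlen=n) drops the oldest entry once it already holds n items.
    return {tier: deque(maxlen=RETAIN[tier]) for tier in ORDER}


def tick(store, hour):
    """Advance the simulated clock to `hour` and rotate every tier that is due."""
    for index, tier in enumerate(ORDER):
        if hour % INTERVAL_HOURS[tier] != 0:
            continue
        if tier == "hourly":
            store[tier].appendleft(f"snapshot@{hour}h")  # newest goes first
            continue
        lower = store[ORDER[index + 1]]
        # Promote the oldest snapshot of the faster tier once that tier is full.
        if len(lower) == lower.maxlen:
            store[tier].appendleft(lower.pop())


store = new_store()
for hour in range(1, 49):
    tick(store, hour)
print(list(store["daily"]))  # ['snapshot@24h']
```

After two simulated days the hourly tier holds the last 24 hours and the snapshot from hour 24 has been promoted to the daily tier, which is the overwrite-and-promote behaviour described above.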

Date: 9.17.23

Author: Zach Dingels
