Peer-to-peer backups

How to set up encrypted, de-duplicated backups with Duplicacy and a Raspberry Pi.

Introduction

Local backups are good. Off-site backups are better. Local backups plus off-site backups are best.

This article describes how to set up automatic off-site backups in a secure fashion. Some benefits of frequent local backups are quick access to older versions of a file and protection against loss of data after a disk failure. In addition, off-site backups add protection against theft, fire, and ransomware. For a lifetime of work, family photos, and personal archives, anything less is … irresponsible at best.

Automated local backups are easily set up with built-in tools such as Time Machine in macOS, or similar ones for Windows and Linux.

Off-site backups are slightly more involved. One option is to put your files at the mercy of a cloud service, such as Dropbox, iCloud, or similar. Another option is to sign up with a remote backup service such as CrashPlan or Backblaze. These all charge a monthly fee, and you end up with your private backups on some server in some country - although presumably still on this planet.

But what if you don’t want that? What if you want a system which backs up to your own server at a friend’s place (perhaps reciprocating with a similar setup at your place for him or her to use)?

Pre-Requisites

The solution described here is based on the following seven ingredients:

  • a Raspberry Pi - any model or any other similar Linux setup will do
  • a USB hard disk - large enough to store all backups, including old versions
  • an SD card - 1 GB is plenty, since it will only be used during initial boot
  • Duplicacy is the software used to make it all happen
  • DietPi will be used as trivial-to-manage Linux distribution
  • SSH will be used for all transfers, in the form of file-only SFTP requests
  • a reasonably fast internet connection - 30 Mbit/sec will be fine

(on the left: USB power, on the right: USB drive and LAN cables, underneath: 2.5” USB HD)

In this example, a very old Raspberry Pi B v1 has been used, with only 256 MB of RAM. This is more than enough, but a dual- or quad-core model would probably be a bit faster - with this single-core board, the average backup speed is only 3 MByte/sec (and definitely CPU-bound).

Choose the hard disk as large as possible to allow for future storage needs - the largest available 2.5” drives were 5 TB at the time of this writing. Multiple drives could probably be combined using LVM.

The SD card will be used to install DietPi, but after that the system will be transferred to the USB HD, with the SD card only used while booting up.

The internet connection speed limits the rate at which data can be transferred during backups and restores. Given that all backups after the first one are incremental, top speed is not that critical. And the large initial backups can be made while the Raspberry Pi is still connected to your local network.

Duplicacy

There are numerous backup tools. Duplicacy is particularly interesting due to its design choices. It offers open-source command-line builds, written in Go, as well as (proprietary) GUI wrappers for Mac and Windows (these require a small fee and phone home to check their license status).

Duplicacy is client-side only. The server can be anything that supports a file system, from local disks to file servers, as well as a large variety of cloud services.

The backups can (and should) be encrypted before being sent out, using a password which never needs to leave the client site(s) - just be sure to have it in reach after a complete disk loss.

But the main attraction is the built-in deduplication feature, which means that even if you have copies of files, only one copy needs to be kept in the backup. This is not merely a convenience for duplicate files; it also saves a lot of disk space when you move files around (or even rename them): the file will not be backed up again, only the index will change - in contrast to Time Machine, for example.
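To make the idea concrete, here is a minimal sketch of content-addressed storage in shell. It is not how Duplicacy works internally: Duplicacy splits data into variable-sized chunks and encrypts them, whereas this toy version hashes whole files. The store function and the scratch directories are made up for illustration, and sha256sum is assumed to be available (it is on most Linux systems). What it does show is why a copied or renamed file costs no extra space: identical content always maps to the same object name.

```shell
#!/bin/sh
# Toy content-addressed store: whole files, no chunking, no encryption.

# store FILE STORE_DIR: copy FILE into STORE_DIR under the name of its
# SHA-256 digest, unless that object already exists. Prints the digest.
store() {
    digest=$(sha256sum "$1" | cut -d' ' -f1)
    [ -e "$2/$digest" ] || cp "$1" "$2/$digest"
    echo "$digest"
}

# Scratch area with three files, two of which have identical content.
work=$(mktemp -d)
mkdir "$work/src" "$work/objects"
echo "family photo" >"$work/src/photo.jpg"
cp "$work/src/photo.jpg" "$work/src/photo-renamed.jpg"   # same content, new name
echo "tax return" >"$work/src/taxes.txt"

for f in "$work"/src/*; do
    store "$f" "$work/objects" >/dev/null
done

# Count the distinct objects that ended up in the store.
object_count=$(ls "$work/objects" | wc -l | tr -d ' ')
echo "source files: 3, objects stored: $object_count"
```

Three source files go in, but the renamed copy adds nothing to the store; only the mapping from names to digests (the "index") would need to record it.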

With deduplication, the key is to properly clean up (“prune”) data which is no longer in any snapshot. Duplicacy deals with this “fossil collection” in a very elegant way, using a lock-free approach, so that multiple clients sharing the same storage do not cause trouble or lead to massive locking slow-downs.

Lastly, Duplicacy stores data in variable-sized chunks (1..16 MB by default), so that lots of small files, as well as huge ones, are always saved in manageable fragments. As with rsync, successive backups of ever-growing log files only need to send the changed tail end of the file.

The Big Picture

The point of this exercise is to end up with a self-contained setup which requires virtually no maintenance, has lots of storage, and acts as storage backend for any number of backup “clients”.

To get there, the following steps need to be taken:

  1. installing DietPi on the Raspberry Pi
  2. adding the USB disk and transferring the system to it
  3. setting up a new user with (only) secure SFTP access
  4. performing the first (large) backups while still inside the LAN
  5. moving the setup off-site and adjusting access settings

After that, it’s simply a matter of keeping all the Duplicacy clients running, with proper settings for backup frequency and snapshot cleanup. With occasional restores whenever needed …

The remainder of this (long) story will be about going through each of the steps outlined above.

DietPi on Raspberry Pi

Download the DietPi image for your board and save it to your SD card, as described in Step 2.

Also see that page for details if you intend to use WiFi instead of wired LAN.

Important! - in DietPi 6.9, there was a problem with automatic swap file settings on SD cards smaller than 4 GB - the easiest workaround is to make one change to the file dietpi.txt in the “boot” partition on the SD card before inserting it into the Raspberry Pi:

  • change AUTO_SETUP_SWAPFILE_SIZE from 1 to 100 (MB), or even to 0 to turn it off
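If you prefer to script this edit, a sketch along the following lines should do. It assumes the setting appears in dietpi.txt as a plain AUTO_SETUP_SWAPFILE_SIZE=<value> line; set_swapfile_size is a hypothetical helper name, and the demonstration runs on a scratch file rather than on a real boot partition.

```shell
#!/bin/sh
# set_swapfile_size FILE SIZE_MB: rewrite the AUTO_SETUP_SWAPFILE_SIZE line
# in FILE. Uses a temporary file instead of "sed -i", which behaves
# differently on GNU and BSD/macOS systems.
set_swapfile_size() {
    file=$1
    size=$2
    tmp=$(mktemp) || return 1
    sed "s/^AUTO_SETUP_SWAPFILE_SIZE=.*/AUTO_SETUP_SWAPFILE_SIZE=$size/" "$file" >"$tmp" &&
        cat "$tmp" >"$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Demonstration on a scratch file standing in for dietpi.txt.
demo=$(mktemp)
printf 'AUTO_SETUP_SWAPFILE_SIZE=1\n' >"$demo"
set_swapfile_size "$demo" 100
cat "$demo"
```

On a real card, point the function at the dietpi.txt on the mounted boot partition; where that partition is mounted depends on your operating system.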

Now, power up the Raspberry Pi with the SD card and network plugged in, but leave the USB hard disk off for now. The first goal is simply to get DietPi working off the SD card.

DietPi can be configured entirely via the network. It does not need a monitor, keyboard, or mouse, since this setup is going to be used as a “headless” server.

Find the IP address of your setup and use SSH to log in - the username is “root”, password “dietpi”.

The first time DietPi boots, it will resize the disk and reboot itself, and then it’ll go through a number of steps to set things up. All prompts should be self-explanatory. Always log in as “root” at this stage.

You may also be taken through a number of updates of DietPi itself, as it brings itself in line with the latest and greatest release version (6.10 at the time of this writing).

Once all the preliminaries are out of the way, you’ll end up in the DietPi-Software screen:

┌──────────────────────────┤ DietPi-Software ├───────────────────────────┐
│                                                                        │
│ Help!                            Links to online guides, docs and   ↑  │
│ DietPi-Config                    Feature-rich configuration tool f  ▒  │
│                                  ─── Select Software ─────────────  ▒  │
│ Search                           Find a software title for install  ▒  │
│ Software Optimized               Select DietPi optimized software   ▒  │
│ Software Additional              Select additional Linux software   ▒  │
│ SSH Server                       : Dropbear                         ▒  │
│ File Server                      : None                             ▒  │
│ Log System                       : DietPi-Ramlog #1                 ▒  │
│ Webserver Preference             : Lighttpd                         ▒  │
│ User Data Location               : SD/EMMC | /mnt/dietpi_userdata   ▒  │
│                                  ─── Install or Remove Software ──  ▒  │
│ Uninstall                        Select installed software for rem  ▮  │
│ Install                          Go >> Start installation for sele  ↓  │
│                                                                        │
│                   <Ok>                       <Exit>                    │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘

This is one of a couple of central configuration utilities of DietPi. They are all text-based, and can be launched from the command line (as root) at any time, from anywhere.

You need to make a few small but essential changes:

  • in DietPi-Config => Display Options => Change Resolution => select “Headless” (last one)

  • go back a bit, then in Language/Regional Options => adjust as needed

  • go back a bit, then in Security Options => adjust as needed

At this point, you’re advised to reboot. Do so and log back in.

  • in DietPi-Software => SSH Server => select “OpenSSH” - this is essential for SFTP use!

  • go back a bit, then in Log System => select “DietPi-Ramlog #2”

  • go back, then select Install to complete the first-time installation

That “Install” is a crucial final step, as it switches DietPi into normal-use mode once done:

┌──────────────────────────┤ DietPi-Software ├───────────────────────────┐
│                                                                        │
│ DietPi is now ready to install your software choices:                  │
│  - DietPi-Ramlog: minimal, optimized logging                           │
│                                                                        │
│ Software details, usernames, passwords etc:                            │
│  - https://dietpi.com/software                                         │
│                                                                        │
│ NB: Software services will be temporarily controlled (stopped) by      │
│ DietPi during this process. Please inform connected users, before      │
│ continuing. SSH is not affected.                                       │
│                                                                        │
│ Would you like to begin?                                               │
│                                                                        │
│                   <Ok>                       <Cancel>                  │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘

You will again be asked to reboot. Note that the SSH “Host Key” will have changed due to the switch from Dropbear to OpenSSH, so your SSH client will refuse to connect and report an error. The fix is to edit your $HOME/.ssh/known_hosts file and remove the offending line (its line number is mentioned in the error).
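As an alternative to editing the file by hand, OpenSSH’s ssh-keygen can remove the stale entry for you. Below is a small wrapper around that; forget_host is a made-up name, and the known_hosts path defaults to the usual location.

```shell
#!/bin/sh
# forget_host HOST [KNOWN_HOSTS_FILE]: drop every key recorded for HOST, so
# that the next connection offers the server's new host key for confirmation.
forget_host() {
    ssh-keygen -R "$1" -f "${2:-$HOME/.ssh/known_hosts}"
}
```

Call it with whatever name or address you used to connect, for example forget_host <raspi-ip-address>. ssh-keygen keeps the previous contents in a file with the same name plus “.old”.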

That’s it. You have a working DietPi setup, ready to be taken to the next level.

Switching to the USB disk

At this point, a small Debian-based system is running off the SD card. You now need to switch it over to run from a USB disk drive and create a dedicated secure-and-restricted login for Duplicacy clients.

This is a tricky operation, but DietPi has it all covered in its menu system - albeit a bit convoluted:

  • power down with shutdown -h now, connect the USB disk, power back up, and log in as user “dietpi” (password “dietpi”, unless you changed it in DietPi-Config, which is highly recommended)

  • sudo dietpi-software => User Data Location => Drive - select your USB disk (it will be called something like /mnt/36BED0B0BED069BF with device name /dev/sdaX)

  • if there is an entry Unmount, then you need to unmount it first

  • then select Transfer RootFS to start the switch-over to the USB disk

As part of the transfer, the disk will be reformatted. Make sure the Partition Type is GPT and the Filesystem Type is EXT4. This is a destructive operation, as DietPi will tell you:

┌────────────────────────┤ DietPi-Drive_Manager ├────────────────────────┐
│                                                                        │
│ This process will move RootFS data to another location. This may       │
│ increase filesystem performance when using a USB drive over SD card,   │
│ however, there are some limitations:                                   │
│                                                                        │
│  - The SD/EMMC card is still required for the boot process             │
│  - ALL data on the target PARTITION will be deleted                    │
│                                                                        │
│ NB: As this feature is still in testing, we recommend you use this     │
│ feature on a fresh installation only.                                  │
│                                                                        │
│ Do you wish to continue?                                               │
│                                                                        │
│                   <Ok>                       <Cancel>                  │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘

Once it reboots, the new disk will be the root disk, as you can see with the “df -H .” command:

Filesystem      Size  Used Avail Use% Mounted on
/dev/root       5.0T  614M  5.0T   1% /

Secure SFTP Access

Duplicacy supports SFTP as storage backend. Despite the name, SFTP is not FTP: it is the SSH File Transfer Protocol, which runs entirely over an SSH connection, i.e. secure password-less logins and encrypted file transfers.

It’s easy to set up, but the trick is to tighten the noose a bit, to limit access to just sending and fetching files (and listing/renaming/deleting them). With a bit of preparation, a dedicated user login can be created, which only allows SFTP access to its home directory (using chroot) and which disables shell access. In the worst case, a compromised Duplicacy client could mess with the (encrypted) backups and even delete them, but it cannot escalate and compromise the storage backend server.

Then again, if there’s a security breach on one of the client machines, all its data is already up for grabs anyway - there’s not much point going to its backup server for that.

Note that when multiple clients share the same storage backend (which is great for deduplicating between them), then access to the server provides access to the data of all clients. If this is a concern, different backup areas can be used - i.e. still the same login but with different storage encryption keys. For maximum separation, different logins will completely “sandbox” all backup accesses and prevent any breach from carrying over to any other backup clients.

Here is how to set up a new “save” user, with home directory /home/save/ and in it a backups/ area which will be used to store all backup data files. The following commands require root access (use sudo -i if you logged in via dietpi):

# useradd -m -s /bin/false save
# cd /home/save/
# ls -la
total 20
drwxr-xr-x 2 save save 4096 Jun 28 13:54 .
drwxr-xr-x 4 root root 4096 Jun 28 13:54 ..
-rw-r--r-- 1 save save  220 May 15  2017 .bash_logout
-rw-r--r-- 1 save save 3523 Mar 13 21:55 .bashrc
-rw-r--r-- 1 save save  675 May 15  2017 .profile
# rm .bash* .profile
# chown root:root .
# mkdir .ssh backups
# chown save:save .ssh backups
# ls -la
total 16
drwxr-xr-x 3 root root 4096 Jun 28 13:56 .
drwxr-xr-x 4 root root 4096 Jun 28 13:54 ..
drwxr-xr-x 2 save save 4096 Jun 28 13:55 .ssh
drwxr-xr-x 2 save save 4096 Jun 28 13:55 backups
#

This creates a new user account, but it can’t do anything at the moment, neither SFTP nor log in.

First of all, you need to set up password-less access via SSH. This is done via private/public RSA key pairs - as nicely documented on this DigitalOcean page. Just be sure to leave the passphrase empty.
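For reference, generating such a key pair on the client comes down to a single ssh-keygen call. The sketch below wraps it in a hypothetical make_backup_key function. The 4096-bit size is an assumption rather than a requirement of this setup, and -N '' is what leaves the passphrase empty. The target directory is assumed to exist already.

```shell
#!/bin/sh
# make_backup_key [FILE]: create an RSA key pair without a passphrase.
# Writes the private key to FILE and the public key to FILE.pub, and
# refuses to overwrite an existing key.
make_backup_key() {
    keyfile=${1:-$HOME/.ssh/id_rsa}
    if [ -e "$keyfile" ]; then
        echo "$keyfile already exists, leaving it alone" >&2
        return 1
    fi
    ssh-keygen -q -t rsa -b 4096 -N '' -f "$keyfile"
}
```

Running make_backup_key without arguments uses the default location, which is also the one Duplicacy will be pointed at later on.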

The private key will usually never need to leave the machine on which it was generated. The public key ($HOME/.ssh/id_rsa.pub) is the one you can copy around to allow SSH logins. This is done by adding the public key to a file called $HOME/.ssh/authorized_keys - on the SSH host side:

# cat >>.ssh/authorized_keys
ssh-rsa AAAAB3NzPTAH[etc...]
# chown -R save:save .ssh/
#

The second change is to adjust SSH to grant SFTP access to user “save” (and disallow anything else). Enter nano /etc/ssh/sshd_config and change the file so the last lines look exactly like this:

# Allow client to pass locale environment variables
AcceptEnv LANG LC_*

# Override default of no subsystems
Subsystem sftp internal-sftp

# Disallow root login over SSH
PermitRootLogin no

Match User save
    ChrootDirectory %h
    ForceCommand internal-sftp
    AllowTcpForwarding no

To let these changes take effect immediately, type /etc/init.d/ssh reload.
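A typo in sshd_config can lock you out of a headless box, so it is worth validating the file before reloading. sshd -t is OpenSSH’s built-in test mode, which checks the configuration and exits. In the sketch below, reload_sshd_if_valid and the two overridable variables are hypothetical names, and the sshd path is an assumption for a Debian-based system such as DietPi. Run it as root.

```shell
#!/bin/sh
# Both commands can be overridden from the environment.
SSHD=${SSHD:-/usr/sbin/sshd}
RELOAD=${RELOAD:-/etc/init.d/ssh reload}

# reload_sshd_if_valid: reload the SSH service only when "sshd -t" accepts
# the current configuration; otherwise leave the running service untouched.
reload_sshd_if_valid() {
    if "$SSHD" -t; then
        $RELOAD    # unquoted on purpose: command plus its argument
    else
        echo "sshd_config has errors, not reloading" >&2
        return 1
    fi
}
```

Either way, keep your current SSH session open until a fresh login from a second terminal has been confirmed to work.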

Let’s check that everything works as expected by trying out some commands on the client machine:

$ ssh save@blah
This service allows sftp connections only.
Connection to blah closed.
$

Aha - shell access is disallowed. Good. Now let’s try SFTP:

$ sftp save@blah
Connected to blah.
sftp> ls -la
drwxr-xr-x    4 0        0            4096 Jun 28 13:04 .
drwxr-xr-x    4 0        0            4096 Jun 28 13:04 ..
drwxr-xr-x    2 1001     1001         4096 Jul  1 19:19 .ssh
drwxr-xr-x    4 1001     1001         4096 Jun 28 13:59 backups
sftp> cd ..
sftp> ls -l
drwxr-xr-x    4 1001     1001         4096 Jun 28 13:59 backups
sftp> ls -l /
drwxr-xr-x    4 1001     1001         4096 Jun 28 13:59 backups
sftp>

Excellent - SFTP works and won’t allow access to anything outside the home directory.

Note that user save has no valid password (and does not need one). The only way to get into the Raspberry Pi on this account is with a valid private key on the client machine.

That’s it. The Raspberry Pi server has been set up. Apart from occasional checks and security updates, there should be very little need to log into this box from now on. In fact, the root password can be disabled altogether, and the dietpi password should be set to a unique and secure password, known only to the administrator (better still: disable all passwords, only allow access via an SSH key).

The First Backup

With the server in place, the “only” remaining task is to set up Duplicacy on all the machines you want to periodically back up.

There are three client types: a command-line tool, available for several platforms (including Linux ARM, so this can also be very useful for backing up Raspberry Pis and such), and GUI-based wrappers for Mac and Windows. These latter require a small fee (which presumably keeps Duplicacy’s main developer well-fed, good-humoured, and highly motivated to support and extend the code forever).

Note that the GUI builds are essentially convenience wrappers around the command-line tool - they make life easier for day-to-day use, but you can set up and manage Duplicacy backups without them.

There is (currently) no GUI wrapper for Linux, and since the command-line tool is at the heart of everything Duplicacy does anyway, let’s start with that one.

Linux

The use and workings of the open-source Duplicacy core are described in considerable detail in the wiki on GitHub.

There’s a useful-but-generic Quick Start page, for example, and an important section about how to manage passwords on each client.

With the Raspberry Pi as SFTP backend, you will need two essential password settings, passed to Duplicacy via environment variables:

  • DUPLICACY_SSH_KEY_FILE - this should point to the location of the private SSH key file, to allow logins to the Raspberry Pi - it’ll almost always be set to $HOME/.ssh/id_rsa
  • DUPLICACY_PASSWORD - this is used to encrypt all your data before it gets sent out and stored on the Raspberry Pi - choose this password wisely, i.e. long, complicated, and unique!

Both keys should be kept absolutely private. If you are backing up more than one client to the same shared storage backend, then they will all need to use the same Duplicacy storage password. The SSH keys can differ, as long as they are all added to the authorized_keys file on the Raspberry Pi.

When initialising the Duplicacy client, your storage backend specifier should look something like:

sftp://save@<raspi-ip-address>/backups

In addition, you need to set the environment variables as follows, to avoid being prompted for this information. This won’t be crucial at first, but later on you’ll want to automate periodic backups via cron, so you might as well do it now:

export DUPLICACY_SSH_KEY_FILE=$HOME/.ssh/id_rsa
export DUPLICACY_PASSWORD='...'

Now go ahead, and follow the instructions in the wiki, running duplicacy init ... in the main directory you want to back up (probably your home directory). If all is well, this will have connected to the storage backend via SFTP, set a few things up, and created a local .duplicacy/ folder with some files in it. It’s all documented on the wiki.
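As a rough sketch of what that initialisation could look like: duplicacy init takes a snapshot ID (a name of your choosing that identifies this client) and the storage specifier, with -e turning on encryption. The helper names storage_url and init_repo are made up for this example, and the wiki remains the authority on the available options.

```shell
#!/bin/sh
# storage_url HOST [PORT]: print the SFTP storage specifier for user "save".
# The port is only included when it differs from the SSH default of 22.
storage_url() {
    host=$1
    port=${2:-22}
    if [ "$port" = 22 ]; then
        echo "sftp://save@$host/backups"
    else
        echo "sftp://save@$host:$port/backups"
    fi
}

# init_repo SNAPSHOT_ID HOST [PORT]: initialise the current directory as an
# encrypted repository. Not called here, since it contacts the server.
init_repo() {
    id=$1
    shift
    duplicacy init -e "$id" "$(storage_url "$@")"
}
```

With the two environment variables exported, something like cd "$HOME" && init_repo my-laptop <raspi-ip-address> would then set up the repository, where my-laptop is a placeholder snapshot ID.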

Duplicacy will follow and back up the target of symlinks as well (but only in the top level, e.g. in your home dir). This offers a convenient way to include a few more areas in the same backup (if their permissions allow it).

It’s time to start the first backup. Note that this might take a (very) long time, i.e. hours or even days:

nohup duplicacy backup -stats &

By running it in the background with nohup, you can log out and come back later to check on its progress. The output will be in nohup.out and can be followed in real-time with the command tail -f nohup.out. Type CTRL-C to cancel the tail command (not the backup itself).

Once this works, you can automate periodic backups using cron. The easiest is to create a shell script which sets things up correctly, perhaps like this:

#!/bin/sh
export DUPLICACY_SSH_KEY_FILE=$HOME/.ssh/id_rsa
export DUPLICACY_PASSWORD='...'
exec duplicacy backup -stats

Then add a line like this to cron, using crontab -e to start up the editor:

0 * * * * ./backup >backups.out 2>&1

As presented here, the $HOME/backup script will be launched once an hour. Adjust as needed.
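Since a large backup can take longer than an hour while cron keeps firing, it may be worth making sure two runs never overlap. The following is one common approach, not something Duplicacy requires: a lock directory created with mkdir, which either succeeds or fails as a single atomic step. run_exclusive and LOCKDIR are made-up names.

```shell
#!/bin/sh
LOCKDIR=${LOCKDIR:-$HOME/.backup.lock}

# run_exclusive COMMAND [ARGS...]: run COMMAND unless another run holds the
# lock. Returns COMMAND's exit status, or 75 when the run was skipped.
run_exclusive() {
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 75
    fi
    "$@"
    status=$?
    rmdir "$LOCKDIR"
    return "$status"
}
```

In the backup script, the last line would then become run_exclusive duplicacy backup -stats. If the machine crashes in the middle of a backup, the stale lock directory has to be removed by hand.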

One last note: be sure to enable pruning on one of the clients (it’s easier with the GUI), so that files which are no longer referenced in any snapshot get cleaned up. For command-line use, you can add a second crontab entry for the prune command once a day (and not at the same time as the backup).
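A prune policy is expressed with repeated -keep n:m options, each meaning “keep one snapshot every n days for snapshots older than m days” (n = 0 removes them altogether), listed from the oldest age down. The retention numbers below are only an example, and prune_old_snapshots and the DUPLICACY override are hypothetical names:

```shell
#!/bin/sh
# The duplicacy binary can be overridden from the environment.
DUPLICACY=${DUPLICACY:-duplicacy}

# prune_old_snapshots: thin out old snapshots in stages.
#   older than 360 days: remove
#   older than 180 days: keep one per 30 days
#   older than  30 days: keep one per week
#   older than   7 days: keep one per day
prune_old_snapshots() {
    "$DUPLICACY" prune -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7
}
```

This needs the same two environment variables as the backup script and has to run from the same directory. With several clients sharing one storage, prune’s -all option extends the policy to every snapshot ID instead of just this client’s.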

Mac and Windows

There’s really very little to add: it’s all described in great detail in this guide.

One quirk worth mentioning is that (at least on the Mac) you can’t always use Command+V to paste a string into the GUI’s text fields, such as long URLs and tedious passwords. The workaround is to left-click and select “Paste” from the pop-up menu instead.

Moving Backups Off-Site

The final step is to take the Raspberry Pi with its disk and move it off-site, e.g. to a friend’s place. The only requirement is a reasonably fast internet connection (at your place as well as his/hers). File transfers of 3 MB/sec correspond to ≈ 10 GB/hour and connection rates of ≈ 30 Mbit/sec.
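The arithmetic behind these figures is easy to redo for your own numbers. In this sketch the 3 MB/sec rate comes from the text above, while the 1000 GB size of a first backup is a made-up example:

```shell
#!/bin/sh
rate_mb_s=3          # sustained backup speed in MB/s
initial_gb=1000      # size of a first full backup in GB (example value)

# MB/s -> GB/hour: multiply by 3600 seconds, divide by 1000 MB per GB.
gb_per_hour=$(awk -v r="$rate_mb_s" 'BEGIN { printf "%.1f", r * 3600 / 1000 }')

# MB/s -> Mbit/s: 8 bits per byte, payload only.
mbit_per_s=$(awk -v r="$rate_mb_s" 'BEGIN { printf "%d", r * 8 }')

# Hours needed to push the first backup through at that rate.
initial_hours=$(awk -v r="$rate_mb_s" -v g="$initial_gb" \
    'BEGIN { printf "%.0f", g * 1000 / r / 3600 }')

echo "$rate_mb_s MB/s is $gb_per_hour GB/hour, or $mbit_per_s Mbit/s of payload"
echo "$initial_gb GB at that rate takes about $initial_hours hours"
```

At 3 MB/sec this works out to 10.8 GB/hour and 24 Mbit/sec of raw payload, so a 30 Mbit/sec line presumably leaves some headroom for protocol overhead. The example 1000 GB backup would need about 93 hours, just under four days, which is why doing the first backup inside the LAN is attractive.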

Given that all initial backups have already completed locally, only changes to your client machines will be sent during backups - you may even be able to get away with slower connections. Full restores will of course require transferring potentially very large amounts of data, but a few files now and then would be quick. Your mileage may vary, as they say …

Note that with public internet transfers come greater security risks and more responsibility. The nightmare scenario would be a security breach on one end leading to an escalation and security breach on the other end. But with the instructions so far, there is virtually no risk of this happening:

  • the storage backend is used as dumb filesystem: regardless of what is sent to it, the data won’t jeopardise the system (other than potentially filling up the Raspberry Pi’s disk, which is harmless)

  • SSH access to the Raspberry Pi is not possible for the save or root users, since neither will allow passwords - only the dietpi user might still have a password, and it’s only used for manual administrative tasks, so that (carefully chosen) password need not be stored anywhere

  • SFTP access is only possible via pre-registered SSH key pairs, and grants access only to the save user’s home directory on the Raspberry Pi

  • in other words: a compromised client or stolen SSH key cannot turn the Raspberry Pi into a stepping stone - with shell access and TCP forwarding disabled for the save user, the friend’s local network cannot be reached through this login

  • similarly, since the Raspberry Pi is merely a “dumb” file server, it can’t mess with backup clients in any way - all that can happen is that it fails to operate properly as backend

With these concerns out of the way, it’s time to set up the off-site backups. This is simply a matter of shutting down the Raspberry Pi (or just unplugging it, if you don’t mind the extra file system check on next power-up), and hooking it back up at your friend’s place: power and a network cable, that is.

Your Raspberry Pi will presumably be located inside your friend’s LAN, behind the router / firewall, so the final step is to configure that router to open up an outside port and pass it through to port 22 on the Raspberry Pi (this is the port SSH uses). This sort of configuration is router-dependent.

Now you can return home and edit the .duplicacy/preferences files on each of your backup clients. Change the line specifying the storage backend to match the new setup - simply adjust the <raspi-ip-address> part as needed (either a domain name or a “dotted” IP address):

"storage": "sftp://save@<new-raspi-ip-address>/backups",

If the router’s outside-facing port is not 22, then add that as well:

"storage": "sftp://save@<new-raspi-ip-address>:<port>/backups",

Be careful to leave all the punctuation and commas intact, and all other lines in these preference files.
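If there are several clients to update, the change can be scripted. The sketch below assumes the storage entry sits on a single line, in the form shown above; set_storage is a hypothetical helper, and it is wise to keep a copy of the preferences file before letting any script touch it.

```shell
#!/bin/sh
# set_storage FILE URL: replace the value of every "storage" entry in FILE,
# leaving indentation, the trailing comma, and all other lines untouched.
# URL must not contain the characters | or &, which are special to sed here.
set_storage() {
    file=$1
    url=$2
    tmp=$(mktemp) || return 1
    sed "s|^\([[:space:]]*\"storage\":[[:space:]]*\)\"[^\"]*\"|\1\"$url\"|" "$file" >"$tmp" &&
        cat "$tmp" >"$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

From the backed-up directory, a call would look like set_storage .duplicacy/preferences "sftp://save@<new-raspi-ip-address>:<port>/backups", with the placeholders filled in.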

That’s it. Now all you need to do is to keep an eye on your backup clients’ log files from time to time, and check the remote disk status on the Raspberry Pi:

# if not on port 22, use: 
#   sftp -P <port> save@<new-raspi-ip-address>

$ sftp save@<new-raspi-ip-address>
Connected to [...].
sftp> df -h
    Size     Used    Avail   (root)    %Capacity
   4.5TB    1.0TB    3.5TB    3.5TB          23%
sftp> ^D
$

You’re done - enjoy the comfort of knowing that all your data is (and will remain) safely kept off-site!