The Backup Sermon

Above: People imagine something like this when they hear “cloud backup,” an admittedly celestial-sounding notion. Like the bald child in The Matrix, I must warn, “There is no cloud.” There are only servers in other countries that you rent space on. Image by Oleksiy Mark/DepositPhotos.

BitDepth#1034 for March 29, 2016

It seems oddly relevant, on the Tuesday after one of the most sacred weekends on the Christian calendar, to be talking about backup.

It’s a text I return to occasionally, sometimes preaching from a place of deep and irreplaceable loss, but like the crucifixion and resurrection, it’s a narrative that never seems to get old. Or particularly comfortable, come to think of it.

I’ve written this before, and I’ll write it again. There are only two kinds of computer users. Those who have lost data and those who will lose data.

If you think the human spirit is easily broken, have a look at the razor-thin tolerances and impossibly small mechanisms inside your hard drive.

What’s shocking about drive failures isn’t that they happen; it’s that they happen so infrequently.

You may rest assured, however, that drives, whether mechanical or solid state, will eventually fail, and you have no assurance that such failures will come with any forewarning at all. I run software on my server that monitors each drive’s SMART status continuously, but that hasn’t helped with any of the failures I’ve had.

My paranoia about these matters is pervasive, absolute and ongoing. Painful experience has taught me that backup should never be considered as a matter of duplicating files; it must begin from the perspective of recovery.

Your strategy must begin by envisioning the horror of losing everything on your computer. The effectiveness of your backup strategy will be measured by how long it takes you, first, to get back to work and, then, to restore all your data.

My own system has four active tiers.

Everything on my imaging workstation/server is continuously backed up using the built-in Macintosh incremental backup service, Time Machine.

To create a big enough drive for that, I’ve taken the risky measure of creating a striped RAID. Because losing any one drive in a stripe loses the whole volume, the result is twice as likely to fail, so it’s a safety net made of feebler strands than I’d like.
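For the arithmetically inclined, here is a minimal sketch of where "twice as likely" comes from. It assumes a two-drive stripe and independent failures, and the 3 per cent annual failure rate is a hypothetical figure chosen only for illustration, not a measurement of any real drive.

```python
def stripe_failure_probability(drive_failure_probability: float, drives: int) -> float:
    """Chance that a striped volume is lost in a given period.

    A striped volume survives only if every member drive survives,
    so it is lost when at least one drive fails.
    Assumes the drives fail independently of one another.
    """
    all_survive = (1.0 - drive_failure_probability) ** drives
    return 1.0 - all_survive


# Hypothetical annual failure rate for a single drive (an assumption).
ASSUMED_ANNUAL_FAILURE_RATE = 0.03

single = stripe_failure_probability(ASSUMED_ANNUAL_FAILURE_RATE, 1)
striped_pair = stripe_failure_probability(ASSUMED_ANNUAL_FAILURE_RATE, 2)

print(f"single drive:     {single:.4f}")
print(f"two-drive stripe: {striped_pair:.4f}")
print(f"ratio:            {striped_pair / single:.2f}")
```

For small failure rates the ratio sits just under two, and it grows with every drive added to the stripe.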

I’ve already recovered a 2TB internal drive from a Time Machine backup (the server hosts seven different drives internally), which took around a day.

The next tier is a duplicated pair of drives holding files that have moved from active use to archival status. They sit in an external drive box (usually switched off until needed) on my desktop.

The third tier is a drive holding all of my digital images to date, which sits in my sister’s house in Houston. It gets updated annually, when it’s physically swapped for a newer, more current drive.

The next two tiers have proven far more elusive.

The fourth tier was meant to be a collection of optical discs, which would have moved the images to write-once media, which has some advantages over mechanical drives.

That strategy migrated from DVD data discs to Blu-ray in short order, but while both media and drives are in place, the process demands a level of perfected organisation in filing that I haven’t been able to muster either the time or energy to hammer into place.

So I’ve skipped to the fifth tier, motivated by slowly rising upload speeds available from local broadband providers.

That solution is a cloud-based strategy, moving critical files to an overseas location over an Internet connection.

To be clear, I am currently managing a 4TB dataset of master files and derivative files with thousands of hours’ worth of corrective work invested in them. This is actually a fairly small overall dataset to work with for anyone who works with RAW files as originals and retains TIFF images as masters for corrected images.

I know wedding photographers who handle much larger collections, and once you start replicating across multiple drives for redundancy, things quickly escalate to enterprise-class data solutions.

The eternal dilemma is to get maximum redundancy for minimum cost, with ready access to critical files that are frequently brought up to date.

For online backup, I’d investigated Backblaze and CrashPlan, both popular with my colleagues, but eventually chose Amazon Glacier, the most cost-effective online data storage I’ve found so far.

To access it, I’ve been using Arq, a cross-platform backup tool created by a cloud service company. It performs incremental backups to a range of other services, adding new files to an existing data storage pool after an initial backup, with excellent interruption recovery.

The initial backup took around a month, and I can add around 50GB of data overnight.
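Those two numbers can be checked against each other with a rough sketch. The 4TB dataset and the 50GB overnight figure come from this column; the nine-hour length of "overnight" is purely an assumption, as is the idea that the initial upload ran around the clock.

```python
def days_to_upload(dataset_gb: float, gb_per_day: float) -> float:
    """Days needed to push a dataset at a steady daily upload volume."""
    return dataset_gb / gb_per_day


DATASET_GB = 4000            # the 4TB dataset, in decimal gigabytes
OVERNIGHT_GB = 50            # what one night of uploading adds
ASSUMED_OVERNIGHT_HOURS = 9  # assumption: how long "overnight" lasts

gb_per_hour = OVERNIGHT_GB / ASSUMED_OVERNIGHT_HOURS

# Uploading only at night versus leaving the upload running all day.
nights_only_days = days_to_upload(DATASET_GB, OVERNIGHT_GB)
round_the_clock_days = days_to_upload(DATASET_GB, gb_per_hour * 24)

print(f"nights only:     {nights_only_days:.0f} days")
print(f"round the clock: {round_the_clock_days:.0f} days")
```

Under those assumptions, an upload left running day and night finishes in roughly a month, while one restricted to nights would take closer to three. Change the assumed window and the estimate shifts accordingly.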

Amazon Glacier is “cool” storage. Transfer rates are slower than the more popular S3 storage services, and the company makes its money from Put, Get and List requests, charging as you add or download information from their servers.

Left alone during February, my charges for 4TB dropped to US$15 for the month, though it cost US$40 per month during the initial upload.
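A short sketch shows what those bills imply over a year. It uses only the two monthly figures quoted above, not Amazon's published price list, and it assumes one month of uploading followed by eleven quiet months with no retrievals.

```python
DATASET_GB = 4000          # the 4TB dataset, in decimal gigabytes
IDLE_MONTH_USD = 15.0      # billed for a month with no uploads
UPLOAD_MONTH_USD = 40.0    # billed per month during the initial upload

# Assumption: one upload month, then eleven idle months, no downloads.
ASSUMED_UPLOAD_MONTHS = 1
ASSUMED_IDLE_MONTHS = 11

implied_rate_per_gb = IDLE_MONTH_USD / DATASET_GB
first_year_usd = (ASSUMED_UPLOAD_MONTHS * UPLOAD_MONTH_USD
                  + ASSUMED_IDLE_MONTHS * IDLE_MONTH_USD)

print(f"implied idle rate: US${implied_rate_per_gb:.5f} per GB per month")
print(f"first-year total:  US${first_year_usd:.2f}")
```

Any restore would add retrieval charges on top of this, which the sketch deliberately leaves out because the column gives no figure for them.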

Immolation-level data loss (turns to knock wood desperately) would see me requesting the backup drive from Houston via UPS while accessing ongoing work from Glacier.

The pieces are in place and test well. I really hope they never need to be put into action.