Oct 20 2006

Amazon S3 for the Busy Worker Bee

Published by matt at 03:40

WARNING: This document assumes you are familiar with installing software and the notion of remote servers, and that you have some experience mounting remote servers and storage facilities for archiving or otherwise backing up your data. If any of that sounds completely bonkers to you, then it is possible the rest of this document will, too.

Amazon does more than sell books. Perhaps their most valuable asset is the information they hold on the trends of book buyers. Reselling access to this information is one way Amazon makes money off the fact that you like to buy books from them. That said, Amazon clearly needs lots of computers and disk drives to store all the data they have regarding books, your interactions, and so on. Given that they have already invested in this infrastructure (in the form of hardware, people to manage that hardware, etc.), they have also begun to sell access to the very infrastructure they use to run their business.

For the busy worker bee, there is one Amazon service that should be very interesting: the Amazon Simple Storage Service (S3). In the simplest possible terms, Amazon is selling disk space. Now, they’re not selling 1GB of disk space, or 10GB… they’re selling you as much disk space as you would like to use. That might be a few megabytes, or it might be a few gigabytes. That’s important, because you only pay for what you use.

Pricing

Amazon has set a very simple pricing structure: you pay for the transfer of data to their servers, you pay to copy data off of their servers, and you pay to leave data on their disks. Their prices are per gigabyte. To copy 1 gigabyte of data to Amazon, you pay $0.20. To copy that same 1GB back from Amazon, they’ll charge you another $0.20. For every month that you leave that 1GB on their disks, they’ll charge you $0.15.

To put these numbers in perspective, I’m going to give you some concrete examples:

  • A 128MB memory stick

    Let’s say you have a 128MB memory stick. It can hold (perhaps) a book you might be editing, or a dissertation you are writing, or whatever. Copying that entire 128MB to Amazon’s S3 service would cost you (0.128 * 0.20) dollars, and it would cost you (0.128 * 0.15) dollars per month to store it. That’s about 3 cents to copy the data to their server, and about 2 cents per month to store the data.

  • A CD

    A CD holds roughly 700MB of data. That would cost (0.7 * 0.20) dollars to copy to Amazon’s servers, and (0.7 * 0.15) dollars per month to store. Or, 14 cents to copy an entire CD to Amazon’s server, and roughly 10 cents per month to store it.

  • A DVD

    I include a DVD worth of data only because you might want to store photos on Amazon’s servers. A DVD is roughly 5GB. This means it costs $1.00 to copy a DVD to S3, and it costs $0.75 per month to keep the pictures there.
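
If you would rather let the computer do the arithmetic, here is a minimal sketch in Python. It assumes the prices quoted in this post ($0.20 per GB transferred, $0.15 per GB per month stored) and treats 1GB as 1000MB, as the examples above do. The function and constant names are my own invention for illustration, not anything Amazon provides.

```python
# Prices quoted in this post, in US dollars per gigabyte. Amazon may change them.
TRANSFER_PER_GB = 0.20        # charged each way: upload or download
STORAGE_PER_GB_MONTH = 0.15   # charged for every month the data sits there


def s3_cost(megabytes, months=1):
    """Return (upload_cost, storage_cost) in dollars for `megabytes` of data."""
    gigabytes = megabytes / 1000  # assumes 1GB = 1000MB, as in the examples
    upload = gigabytes * TRANSFER_PER_GB
    storage = gigabytes * STORAGE_PER_GB_MONTH * months
    return upload, storage


if __name__ == "__main__":
    for label, size_mb in [("memory stick", 128), ("CD", 700), ("DVD", 5000)]:
        upload, storage = s3_cost(size_mb)
        print(f"{label}: ${upload:.2f} to upload, ${storage:.2f} per month")
```

Pass a larger `months` value to see what a year of storage costs; copying the data back down costs the same as the upload figure.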

I’ve beaten the horse quite badly, I’m afraid. But I think my point is made: this is the cheapest way to back up your data safely. I will belabor this point just a bit. If you go out and buy a hard drive, it will be cheaper. However, that hard drive will be in your house. If your house burns down, the drive is destroyed. If your house is burgled, the drive is gone. If the drive fails, you lose the data. So, now you need two drives… in two places. Your time is too valuable to be copying data from one drive to another, and driving it back and forth between home and your safe-deposit box in the bank. Given that Amazon already has multiple, redundant servers in multiple geographic locations (your data is in more than one place)… you simply cannot go wrong with Amazon’s pricing and services.

How to use it

If you’re still reading, and you agree with me, then you’d probably like to know how to take advantage of Amazon’s services. There are a number of ways, some more realistic than others.

  • Scripting

    Technically, the service is open, so you could write your own software (in Perl, Python, Scheme, or any other language) to access Amazon’s S3 service. While you might not, it is good to know that others can. This is not a closed service. It is privately owned, but anyone in the world can write software to interact with their servers.

  • “FTP”

    It is possible to use the program Interarchy to transfer files between your computer and the Amazon S3 service. I say “FTP” because it isn’t technically FTP. However, Interarchy makes it look like FTP, and works reasonably well. Interarchy costs around $50.

  • JungleDisk

    At the moment, this last option is free, but may cost money in the future. I’ll spend more time describing it below.
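
To give a flavor of the scripting route, here is a rough sketch in Python of the one fiddly part: signing a request with your keys. It assumes the HMAC-SHA1 request-signing scheme described in Amazon’s S3 developer documentation (check the current documentation before relying on it), and it leaves out the optional x-amz-* headers that a real request might carry. The function name, the placeholder keys, and the bucket and file names are all made up for illustration.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate


def sign_request(access_key, secret_key, verb, resource, date,
                 content_md5="", content_type=""):
    """Build an Authorization header value for a simple S3 REST request.

    Assumption: no x-amz-* headers are sent, so none are folded into the
    string that gets signed.
    """
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return f"AWS {access_key}:{signature}"


if __name__ == "__main__":
    # Placeholder keys and a hypothetical bucket/file; substitute your own.
    date = formatdate(usegmt=True)
    header = sign_request("MY-ACCESS-KEY", "MY-SECRET-KEY",
                          "GET", "/my-bucket/notes.txt", date)
    print("Date:", date)
    print("Authorization:", header)
```

The two printed lines are the headers you would attach to the HTTP request. Notice that the secret key never travels over the wire; only the signature computed from it does.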

If you think that this might be the way for you to cheaply back up data, photos, and the like, then proceed. Keep in mind that, as time goes on, more and more people will be reselling Amazon’s services. This means, for example, that someone will come out with a simple way for you to do all the things I’m describing here, charge you a bit more than Amazon, and make a fortune. However, that ease of use (someday, who knows when) might be worth the extra pennies.

In the meantime, if you want to make use of Amazon’s S3 service yourself, you’ll need to take a few steps.

  1. Register with Amazon

    You first need to register for the S3 service with Amazon. You’ll need to create a Web Services account, and in particular, activate the S3 service. This shouldn’t be too impossible a task—if you do it, and decide it was very hard, please drop me a note, and I’ll expand this section.

    You will need a credit card handy at this point, as Amazon wants a way to bill you every month for services rendered. Remember, if you register, and don’t use it, you won’t be billed… so you are simply promising to pay for the service you use, not some flat rate, etc.

  2. Obtain your Access and Secret Keys

    When you are subscribed, you’ll be able to view your Access Key and Secret Key for the S3 service. FOR THE LOVE OF ALL THINGS GOOD, DON’T LOSE THESE OR GIVE THEM AWAY. Really. Don’t send them in an email to a friend in plain text, don’t keep them in plain text on your laptop… if you want your data on Amazon’s servers to be even partially secure, don’t lose that secret key or release it into the wild. Really. Honest.

  3. Download JungleDisk

    Now download JungleDisk for your platform (Windows or Mac). After downloading, install.

    NOTE: This program is currently free, but might become non-free sometime in the future. Be prepared: if you like this solution to backing things up, you may have to pay for the software. (Personally, I’m all about supporting independent software developers, so if the author wants $20 or so for his software, and it makes my life better, great.)

  4. Follow the instructions

    There are instructions to help you out in terms of installation and use. Read them.

Once you have JungleDisk installed and working, you should have a new drive icon somewhere in your system. Under Windows, you might have a new drive mapped in the Explorer; under Mac OS X, it mounts as a new network drive (much like .Mac or other remote servers).

NOTE: Security? You can enable security in JungleDisk, and it will encrypt your data before shipping it to Amazon. This is not terribly robust, in the long run, but it is good enough that someone snooping the data in-between cannot read the data. I recommend you enable this, even for the casual user. However, in doing so, MAKE SURE YOUR SECRET KEY IS SAFE. If you lose it, you won’t be able to decrypt the data.

Backup for Brainiacs

Note: this section is Mac specific. I don’t own a Windows computer, and I don’t intend to ever own one again. If you need help, at this point, with Windows-specific concerns, I’ll happily obtain a machine and do the work to help you… but I will charge you significant amounts of money in consulting and training fees.

At this point, you have a lot of options with respect to how you use your S3 account.

One thing you can do is simply drag files and folders onto the JungleDisk manually. By doing this, it will copy them to Amazon’s servers. This is OK; I mean, you get to selectively decide where things go, and what to back up. But what most people don’t know is that irregular, manual backups are usually no good. What you want is something mostly automatic, so you can (with the press of a button) back up everything that matters to you.

What I recommend is that you investigate one of the following programs:

  1. PsyncX, David Baker

    PsyncX is free/open-source software. It builds on UNIX tools for scheduling and performing backups. It will allow you to choose source and destination folders (like MimMac), and run them (potentially) on a regular basis. For example, you can have PsyncX automatically back up your important documents every night. The nice thing is, it will synchronize, so if a file hasn’t changed, it won’t copy it. Again, you’ll need to investigate it more, on your own, to decide if this is for you.

  2. MimMac, Ascendant Softworks

    I’ve not used MimMac before, but it is in the space of programs that you might consider. It lets you define source folders and destination folders, and it will back up or synchronize those folders for you. You’ll need to read more about it specifically to see if it works for you.

The point of programs like these is that you should be able to (1) press a single button, and (2) have one or more directories on your computer automagically copied to your S3 account by way of JungleDisk. Software like PsyncX will check whether the version on your computer is newer than (or otherwise different from) the version on your S3 account, and only copy things over if they have changed. This way, you don’t have to go digging and say “What files did I change today?” With almost 100% certainty, you’d miss one.
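
To make the “only copy what changed” idea concrete, here is a minimal sketch in Python. It is not how PsyncX or MimMac work internally (I haven’t read their source); it simply assumes that a file needs copying when the destination copy is missing, is a different size, or is older than the source. The function names are hypothetical.

```python
import os
import shutil


def needs_copy(src, dst):
    """True if dst is missing, or src is a different size or newer than dst."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or s.st_mtime > d.st_mtime


def sync_folder(source, destination):
    """Copy new or changed files from source to destination.

    Returns the list of files copied, as paths relative to source.
    """
    copied = []
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        target_dir = os.path.normpath(os.path.join(destination, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_dir, name)
            if needs_copy(src, dst):
                shutil.copy2(src, dst)  # copy2 also carries over the modification time
                copied.append(os.path.relpath(src, source))
    return copied
```

You would call `sync_folder` with your documents folder as the source and a folder on the mounted JungleDisk volume as the destination; the exact paths depend on your machine. Note that this sketch never deletes anything from the destination, which is the cautious choice for a backup.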

There are other backup programs out there. You may find the ones I’ve listed are not to your liking, or you might not like the idea of backup at all. However, if your backups are not (a) one-touch, or (b) automatic, they’re not good. Or, put another way, they’re unreliable, because you are unreliable. I know, I know, you have a mind like a steel trap… but trust me, your trap might become rusted, and then you can’t be trusted.

That’s all, folks

I’ve moved quickly through some bits, slowly through others, but in all I’ve tried to cover things end-to-end. Good backups are hard to come by, and this is something that, if it matters to you, you’ll need to work at. I’ve spent hours working on a robust backup strategy for a server I help manage, and I know that every hour I spend on that backup strategy is ten hours saved in recovery. In fact, some data I have can’t be recovered, no matter how hard I work. Photos? Old documents? Email? These things need to be backed up regularly, off-site, and with care if they’re going to be available to you after machine theft or failure.

Please feel free to leave comments and feedback below; I’ll evolve this note as people point out flaws, omissions, or desirable additions.

4 Responses to “Amazon S3 for the Busy Worker Bee”

  1. elephant.org.il on 02 Nov 2006 at 17:31

    Other News and Articles…

    Data Storage and Backup Solution for Pennies not $s Matt
    Jadud describes where you can store and recover large amounts of data
    over the Internet, and pay only for the space you use. It sounds like a
    great solution for a freelance Tech Writer that do…

  2. [...] I previously wrote about Amazon S3 and its use for people who just need a way to cheaply/safely backup and archive content off-site. This came up at a CE-L dinner I was attending with my wife some time ago; people were discussing the relative costs of backup solutions for their work. I’ll be updating this, as JungleDisk has recently been updated with some new features that I think make it a complete no-brainer for use as a backup solution for Normal People Like Us. [...]

  3. [...] I previously wrote about Amazon S3 and its use for people who just need a way to cheaply/safely backup and archive content off-site. This came up at a CE-L dinner I was attending with my wife some time ago; people were discussing the relative costs of backup solutions for their work. I’ll be updating this, as JungleDisk has recently been updated with some new features that I think make it a complete no-brainer for use as a backup solution for Normal People Like Us. [...]

  4. Sub Ubi » The Busy Writer: Backups on 13 Feb 2007 at 04:14

    [...] Another way to do online backup is with Jungle Disk. I’ve written about this previously. You can follow those instructions to get Amazon S3 setup. And, you can use it with automatic backup software just like Bingo!. The difference is that you only pay for what you use with Amazon S3, whereas you pay for a big chunk of space on Bingo, and get charged for it whether there is data in it or not. [...]
