
Archiving With AWS Glacier

I recently spent the better part of my holiday “archiving”. In my case this meant first deleting about 80% of the photos I’d taken over the last 8 years and then figuring out a smart way to not lose those remaining “good” photos.

Backup solutions I’ve tried in the past include burning photos to DVDs, backing up locally with an Apple Time Capsule and, most recently, syncing to THE CLOUD (I tried ZipCloud).

Any local solution is just not durable enough. Cloud services like ZipCloud are cool and very easy to configure, but they offer features I don’t need for a price I’d rather not pay.

So why not roll your own solution with Amazon’s insanely cheap Glacier deep storage solution?

Amazon Glacier is an extremely low-cost storage service that provides secure and durable storage for data archiving and backup. In order to keep costs low, Amazon Glacier is optimized for data that is infrequently accessed and for which retrieval times of several hours are suitable.

That’s perfect. I’d gladly trade accessibility for lower cost. At the time I’m writing this, Glacier storage in the us-east region costs $0.01 per GB per month, so my 20 GB of photos works out to 20 GB × $0.01 × 12 months = $2.40 per year.

Overview

We’ll set up a directory locally that we want to back up daily. A cron job running the aws s3 sync command is sufficient to keep an s3 bucket up-to-date with your local directory. Once you configure a lifecycle rule on the s3 bucket, it’ll automatically migrate new contents over to Glacier.

I’m using a Raspberry Pi with an external hard drive for local storage. I configured the external hard drive as a network drive accessible via Samba, which lets me connect to it from my laptop and easily transfer photos to the archival directory.
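For reference, a minimal Samba share for this setup might look something like the following in /etc/samba/smb.conf (the share name and path are assumptions; adjust them to match your mount point):

[archive]
    # export the external drive so photos can be copied over from a laptop
    path = /mnt/archive
    valid users = archives
    read only = no
    browseable = yes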

Setup

First, create a new user to own the archive data and run the sync job.

# create the archives user with a home directory
sudo useradd archives -m -G users
sudo passwd archives
# give it ownership of the archive directory
sudo chown archives:archives /mnt/archive/images

Then, install the AWS client.

sudo pip install awscli
sudo pip install --upgrade awscli
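
You can confirm the client installed correctly before going any further:

aws --version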

Log in to your AWS console and create a new IAM user.
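
If you’d rather stay on the command line and already have admin credentials configured, the equivalent can be sketched with the CLI (ARCHIVEUSER is a placeholder):

# create the user and generate an access key / secret pair for it
aws iam create-user --user-name ARCHIVEUSER
aws iam create-access-key --user-name ARCHIVEUSER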

Next, create an s3 bucket and grant your IAM user access with a Bucket Policy like the following:

{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111111111:user/ARCHIVEUSER"
            },
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::BUCKETNAME"
        },
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111111111:user/ARCHIVEUSER"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::BUCKETNAME/*"
        },
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111111111:user/ARCHIVEUSER"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::BUCKETNAME/*"
        }
    ]
}

Remember to replace 111111111 with your account ID, ARCHIVEUSER with the name of your new IAM user, and BUCKETNAME with the name of your new s3 bucket.

Also, add a lifecycle rule to your bucket so that its contents are migrated to Glacier. You can set this up in the s3 console, or from the command line as sketched below.
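This is roughly what that rule looks like as a CLI call: a minimal sketch that transitions every object to Glacier as soon as possible (the rule ID is arbitrary, and BUCKETNAME is the same placeholder as above):

aws s3api put-bucket-lifecycle-configuration --bucket BUCKETNAME --lifecycle-configuration '{
    "Rules": [
        {
            "ID": "archive-to-glacier",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "Transitions": [{"Days": 0, "StorageClass": "GLACIER"}]
        }
    ]
}'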

Now you’re ready to configure your AWS client. Have the access key and secret for your new IAM user handy. Note that since the cron job will run as the archives user, the credentials need to end up in that user’s home directory.

aws configure
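
One way to do that, assuming you’re logged in as a different user, is to run the configure step as the archives user (-H makes sure the credentials file lands in its home directory):

sudo -H -u archives aws configure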

Test out the sync command with the --dryrun option.

aws s3 sync /mnt/archive/images/ s3://BUCKETNAME/ --dryrun --region us-east-1

You may want to include the --delete option in your command if you’d like files deleted locally to also be deleted from s3.

You may also want to consider excluding hidden files or files with a certain extension. I use the following exclude options for hidden files:

--exclude ".*" --exclude "*/.*"
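
Putting those options together, the full command might look something like this (a sketch; adjust the path and bucket name to match your setup):

aws s3 sync /mnt/archive/images/ s3://BUCKETNAME/ --delete --exclude ".*" --exclude "*/.*" --region us-east-1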

When you’re happy with your command options, add a line to your /etc/crontab file:

0  8    * * *   archives aws s3 sync /mnt/archive/images/ s3://BUCKETNAME/ --region us-east-1

This runs the sync job at 08:00 UTC every day (midnight PST).
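If you want a record of what each run synced, you can also redirect the job’s output to a log file. The path here is just an example; it needs to be writable by the archives user:

0  8    * * *   archives aws s3 sync /mnt/archive/images/ s3://BUCKETNAME/ --region us-east-1 >> /home/archives/sync.log 2>&1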
