Back Up Your Dropbox Files with rdiff-backup

The Problem

Teresa and I use a single Dropbox account to share files between our computers. I also use the same account to store (and sync) plain-text notes on my iPad and iPhone (I use the apps PlainText and iA Writer). In case things go wrong with these apps, with the syncing, or with Dropbox itself, I want to back up my Dropbox files and keep past snapshots of the backups so I can go back in time.

The Solution

rdiff-backup can do this. It is a command-line tool written in Python that:

…backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup.

To make this all happen, I have Dropbox installed, signed-in, and running on my Linux desktop/server, which runs Ubuntu 11.04 Natty with Gnome 2.

Install rdiff-backup thusly:

    # aptitude install rdiff-backup

I use the directory /backup/ to hold all my backup targets, so I can run rdiff-backup like this:

    $ rdiff-backup  \
        --exclude $HOME/Dropbox/.dropbox \
        --exclude $HOME/Dropbox/.dropbox.cache \
        $HOME/Dropbox /backup/Dropbox

Every time I run rdiff-backup like this, it creates a new snapshot of my Dropbox files. Old snapshots are kept until I decide to purge them (if at all). To purge any snapshots older than two months, for example, I run this command:

    $ rdiff-backup --force --remove-older-than 2M /backup/Dropbox

I run the above two commands in an @hourly crontab script to keep this all happening automatically.
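
The crontab entry itself is nothing fancy; something like this (the script path is just an example):

    @hourly /home/me/bin/backup-dropbox.sh

where backup-dropbox.sh is a small shell script wrapping the two commands above:

    #!/bin/sh
    # Hourly Dropbox snapshot, then purge increments older than two months.
    rdiff-backup \
        --exclude "$HOME/Dropbox/.dropbox" \
        --exclude "$HOME/Dropbox/.dropbox.cache" \
        "$HOME/Dropbox" /backup/Dropbox
    rdiff-backup --force --remove-older-than 2M /backup/Dropbox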

Browsing Past Snapshots

rdiff-backup has its own commands for digging into the files stored in past snapshots, but they require knowing the exact filenames and backup times. Another tool, rdiff-backup-fs, solves this problem by mounting the rdiff-backup target directory as a FUSE filesystem, allowing me to grep and find my way through a directory tree of all snapshots.
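
For the record, rdiff-backup's native interface looks something like this (the date and file name here are just examples):

    $ rdiff-backup --list-increments /backup/Dropbox
    $ rdiff-backup --restore-as-of 2013-01-06 \
        /backup/Dropbox/notes/todo.txt /tmp/todo.txt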

After installing FUSE and rdiff-backup-fs, I mount my Dropbox snapshot tree with this command:

    $ rdiff-backup-fs ~/mnt /backup/Dropbox

Note that the argument order (mount point first, then source) is the reverse of the canonical mount command.

A long listing of the oldest few snapshots looks like this:

    $ ls -lF ~/mnt/ | head -10
    total 0
    dr-xr-xr-x 1 root root 4096 2013-03-10 13:52 2013-01-06T05:00:01/
    dr-xr-xr-x 1 root root 4096 2013-03-10 13:52 2013-01-06T06:00:01/
    dr-xr-xr-x 1 root root 4096 2013-03-10 13:52 2013-01-06T07:00:01/
    dr-xr-xr-x 1 root root 4096 2013-03-10 13:52 2013-01-06T08:00:01/
    dr-xr-xr-x 1 root root 4096 2013-03-10 13:52 2013-01-06T09:00:01/
    dr-xr-xr-x 1 root root 4096 2013-03-10 13:52 2013-01-06T10:00:01/
    dr-xr-xr-x 1 root root 4096 2013-03-10 13:52 2013-01-06T11:00:01/
    dr-xr-xr-x 1 root root 4096 2013-03-10 13:52 2013-01-06T12:00:01/
    dr-xr-xr-x 1 root root 4096 2013-03-10 13:52 2013-01-06T13:00:01/

I can then explore all my snapshots at once with any tools I wish.
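
For example, to see which snapshots of a given note mention a particular phrase (the file name and phrase here are made up):

    $ grep -l "travel plans" ~/mnt/*/notes/todo.txt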

When done, I unmount the rdiff-backup-fs filesystem with:

    $ /bin/fusermount -u ~/mnt

Ubuntu + Python + BitTorrent Trouble

I’m wondering if anyone else has experienced trouble with the Python that comes installed with Ubuntu when trying to use the official Python BitTorrent client. (Note that Ubuntu ships its own Python BitTorrent client, but it’s completely different from the official one; it seems to be a rewrite, and I don’t know its origin.) Anyway, the official btdownloadcurses.py eventually hits some error that screws up the screen drawing, and btlaunchmany.py eventually stops working and starts sucking up 100% of the CPU.

I’ve seen lots of discussion about 100% CPU usage with other clients and other OSes, but I don’t think it’s related. My intuition tells me that the fault lies with Ubuntu’s Python installation. The above problems occur under both Hoary and Breezy, using the Python 2.4.2 from the Ubuntu base installation and BitTorrent 4.0.4 downloaded from the BitTorrent website.

As an experiment last night, I grabbed the sources from www.python.org and compiled my own Python 2.4.2. So far, the official BitTorrent client hasn’t cracked or croaked when running under this Python, which suggests there is something weird about Ubuntu’s Python installation. I wonder what’s different about it.

Update: Well, I spoke too soon. After two days of running soundly, the client gave up the ghost just like before. So, now what? I guess I’ll go back to FreeBSD.

Working with vCalendar

I couldn’t sleep last night, so I used my time to work on putting Moon phase information for the new year into my Palm’s calendar. I already have a Python script that calculates the times of the phases of the Moon using the algorithms in Jean Meeus’ Astronomical Algorithms (2nd Ed., 1998), but I still needed an automated way to import this data into my Palm. Last year, I did it by hand. Yuck! So this time I used the Palm Desktop application and the vCalendar file format for exchanging calendaring and scheduling information. I had never used the vCalendar format before, but I found the specifications online and soon modified my Python code to output this format. Here’s an example of what I produced:

    BEGIN:VCALENDAR
    VERSION:1.0
    BEGIN:VEVENT
    DTSTART:20050103T174530Z
    SUMMARY:Last Quarter 01h45
    END:VEVENT
    BEGIN:VEVENT
    DTSTART:20050110T120238Z
    SUMMARY:New Moon 20h02
    END:VEVENT
    ...
    END:VCALENDAR

Note that the DTSTART times are given in UTC (the “Z” is for “zulu”), but I wanted the event description to show local time (8 hours ahead for China). Since China doesn’t observe daylight saving time, applying this +8 h correction in the Python code was trivial.
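
To give the flavour of it, here’s a stripped-down sketch of the output code (not my actual script; the phase time below is hard-coded just for illustration):

    # Write one Moon-phase event as a vCalendar VEVENT.  DTSTART is in UTC,
    # while the summary shows Beijing local time (UTC+8, no DST to worry about).
    import datetime

    def vevent(utc_time, phase_name):
        local = utc_time + datetime.timedelta(hours=8)
        return ("BEGIN:VEVENT\n"
                "DTSTART:%s\n"
                "SUMMARY:%s %s\n"
                "END:VEVENT") % (utc_time.strftime("%Y%m%dT%H%M%SZ"),
                                 phase_name,
                                 local.strftime("%Hh%M"))

    # Hard-coded example; the real script computes these with the Meeus algorithms.
    events = [vevent(datetime.datetime(2005, 1, 3, 17, 45, 30), "Last Quarter")]

    print("BEGIN:VCALENDAR\nVERSION:1.0\n" + "\n".join(events) + "\nEND:VCALENDAR")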

If you want to see the code that I used to do this, just email me.

Pigs and Cows and a Tiny Little Server Farm

There’s a weblog that I like to read on a regular basis. It’s called Pork Tornado, and the guy, Dusty, is hilarious most of the time. Very dry humour.

But I’ve had lots of trouble connecting to the site, with the domain name system (DNS) frequently unable to find it. I’ve tried from work, from home, and even from my computer in Canada. Sometimes I can load the site, sometimes I can’t. I honestly thought that their DNS entries were misconfigured and not refreshing themselves properly across the Internet. I even wrote a Python script to regularly look up the site’s DNS information and track the history of the intermittent problem, and I had it running on the three machines mentioned above.

But just two days ago, I figured out the real problem: it was me all along. All this time I’ve been typing

http://porktornado.dairyland.com/

instead of

http://porktornado.diaryland.com/

Can you see the difference? Try the links: the first one doesn’t work; the second one does. Here’s my explanation.

I honestly thought that the site was related to dairies (cows), and it never crossed my mind that the key word was “diary” (writing in a journal). How dumb. Besides, the website is a bit cartoonish, so I just assumed “cows” again, because cows are so often drawn as cartoon characters. Furthermore, I had been really careful to spell “porktornado” correctly, because it is a difficult word to type and I didn’t want to accidentally type “toronado”. There you have it.

Now that I’ve got it right, I really need to just make a bookmark of the dumb site.

Loopy Weather

If you’ve been using my weather page recently, then maybe you’ve been wondering what’s going on. Like, why are days repeating themselves? It’s producing some interesting graphs, like the following:

Well, my script actually broke a few days ago because the maintainer of the EAS weather station dropped a few variables from the station’s output. I noticed this and promptly fixed the script, but it seems that, for some reason, the timestamps on the data are no longer right. My script, which relies on their data, happily takes the timestamps and sends them to gnuplot. The nice thing about gnuplot is that it treats date and time information as a plotting variable and puts each data point in the right place, so when the timestamps are messed up you get interesting plots like the one above. I’m not sure the maintainer of the EAS weather station will notice the problem, since they only plot the last 24 hours on their web page, but it’s going to mess up their archives of the data.

I have also been working on a pure Python implementation of my weather scripts that relies on METAR reports from airports. It’s working well, and since I started using the PyMETAR module, I’ve been able to generalize my scripts to any airport in the world. So far, though, it’s still in the testing phase and hasn’t gone online yet.
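
To give an idea of how simple PyMETAR makes this, the core of it looks roughly like the following (I’m quoting the module’s interface from memory, so check its documentation for the exact method names; ZBAA is the ICAO code for Beijing Capital airport):

    import pymetar

    # Fetch and parse the latest METAR report for a given ICAO airport code.
    # (Interface written from memory; check PyMETAR's docs for the exact API.)
    fetcher = pymetar.ReportFetcher("ZBAA")
    report = pymetar.ReportParser().ParseReport(fetcher.FetchReport())

    print(report.getStationName())
    print(report.getTemperatureCelsius())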

Post Holiday Rest and Relaxation

Well, Lily and I had an excellent holiday. We visited family in NE China, wore lots of clothes to keep warm, ate really well, and enjoyed the fireworks and the stars. (In fact, back in Beijing, people are still setting off fireworks for the rest of the week.) It was a great experience for me to celebrate the Spring Festival in China. One fun moment was buying ice cream bars from a street vendor at -20°C! At that temperature, a cardboard box is all you need for refrigeration.

Since coming home to Beijing, I’ve rested a lot and worked from home all week, because the office is closed. (I need to make up for all the work I missed, since I took my holiday earlier than everyone else.) But I got it all done, and it’s Friday night. I’m looking forward to going out to watch the sunset, and then maybe having pizza.

Also this week, for some reason, I’ve gone crazy playing and hacking around in Python. Maybe I missed my computer a lot on my holiday. (I think so!) I wrote a command-line equivalent of the Unix cal(1) command, which prints calendars like this for any month or year:

       February 2003
    Su Mo Tu We Th Fr Sa
                       1
     2  3  4  5  6  7  8
     9 10 11 12 13 14 15
    16 17 18 19 20 21 22
    23 24 25 26 27 28

It’s not that big of a deal, though, since Python has this as a built-in module. But the module needed a command-line interface, so now my Windows box has a cal command. I’m also working with a full-text indexer to index all the text files (and email messages) on my computer, to help me search for stuff I archived long ago. And I’m planning to write a simple equivalent of the GNU locate command, just to help find the locations of files (not their contents) on my computer.
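
The cal script itself is little more than a thin wrapper around that built-in module; the idea is roughly this (a simplified sketch, not the exact script):

    #!/usr/bin/env python
    # cal.py -- bare-bones cal(1) look-alike built on Python's calendar module.
    # Usage: cal.py [month] [year]; with no arguments it prints the current month.
    import calendar
    import sys
    import time

    calendar.setfirstweekday(calendar.SUNDAY)   # cal(1) starts the week on Sunday

    now = time.localtime()
    month, year = now.tm_mon, now.tm_year
    if len(sys.argv) > 1:
        month = int(sys.argv[1])
    if len(sys.argv) > 2:
        year = int(sys.argv[2])

    print(calendar.month(year, month))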

Of course, when I come up with something useful, I need to stick it on my webpage to share with the world. Fun, fun, fun!