Datastore incremental backup script for App Engine

Sunday, 6 February 2011

I'm a developer using App Engine as the back end for my Android app, Chesspresso. I love App Engine: it has come a long way since it was incubated a few years back, and for Chesspresso it has been a lifesaver in terms of scalability. But one critical thing that's currently missing is datastore backup.

Here is a Python script that I run from a cron job on a remote server to back up my app's datastore. It creates incremental backups of the GAE datastore zipped up with logs, and it'll mail you if the shit hits the fan. It's worked nicely for me so far. You'll have to go through it with a fine-tooth comb, as there are a lot of variables to tweak for your own use.

BTW, you'll have to install the remote_api servlet in your app for this to work...

import os
import shutil
import subprocess
import sys
import time
import zipfile
import glob
import traceback
import string
import smtplib

# Folders
FOLDER_ROOT = "/keep-everything-in-this-folder"
FOLDER_TEMP = FOLDER_ROOT + "/tmp"
FOLDER_BACKUPS = FOLDER_ROOT + "/backup"
FOLDER_GAE = FOLDER_ROOT + "/gaep"
SERVER_SMTP = "localhost"

# Other
GAE_APPNAME = "your-gae-app-name"
GAE_URL_API = "/remote_api"
FOLDER_NAME_TEMP = "gaebak"
FILE_NAME_APPCFG = "appcfg.py"
FILE_NAME_INPUT = "input.txt"
FILE_NAME_ARCHIVE = time.strftime("%y%m%d-%H%M%S") + ".zip"
ERROR_EMAIL = "your-email"

# max number of archives before deletion
MAX_ARCHIVES = 50

# Arguments
ARG_URL = "--url=http://" + GAE_APPNAME + ".appspot.com" + GAE_URL_API
ARG_APP = "--application=" + GAE_APPNAME
ARG_LOG = "--log_file=log-"
ARG_PASSIN = "--passin"

# Full Paths
F_TEMP = os.path.join(FOLDER_TEMP, FOLDER_NAME_TEMP)
F_APPCFG = os.path.join(FOLDER_GAE, FILE_NAME_APPCFG)
F_BACKUP = os.path.join(FOLDER_BACKUPS, GAE_APPNAME)
F_ARCHIVE = os.path.join(F_BACKUP, FILE_NAME_ARCHIVE)
F_INPUT = os.path.join(FOLDER_ROOT, FILE_NAME_INPUT)

def removeTempFolder():
    # Start clean: drop anything left over from a previous run.
    if os.path.exists(F_TEMP):
        shutil.rmtree(F_TEMP)

def createTempFolder():
    # makedirs also creates missing parent folders, which mkdir would choke on.
    if not os.path.exists(F_TEMP):
        os.makedirs(F_TEMP)

def createBackupFolder():
    if not os.path.exists(F_BACKUP):
        os.makedirs(F_BACKUP)

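def call(option, action):
    # Run appcfg.py with the given action (e.g. download_data), feeding it the
    # credentials from input.txt on stdin. check_call raises CalledProcessError
    # on a non-zero exit code, so a failed download triggers the error mail.
    with open(F_INPUT) as credentials:
        subprocess.check_call([sys.executable, F_APPCFG, ARG_PASSIN, ARG_URL, ARG_APP, action, option], stdin=credentials)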

def init():
    removeTempFolder()
    createTempFolder()
    createBackupFolder()
    # appcfg.py writes its output relative to the working directory.
    os.chdir(F_TEMP)

def backup():
    call("--filename=db.dump", "download_data")

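def archiveBackup():
    # Zip everything in the temp folder: the dump plus any log files written there.
    archive = zipfile.ZipFile(F_ARCHIVE, "w", zipfile.ZIP_DEFLATED)
    try:
        for name in glob.glob1(F_TEMP, "*"):
            archive.write(os.path.join(F_TEMP, name), name)
    finally:
        # Close the archive even if adding a file fails.
        archive.close()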

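def pruneArchives():
    # Archive names are timestamps (yymmdd-HHMMSS), so a reverse sort puts the newest first.
    files = glob.glob1(F_BACKUP, "*.zip")
    files.sort(reverse=True)

    # Delete everything beyond the newest MAX_ARCHIVES.
    for name in files[MAX_ARCHIVES:]:
        os.remove(os.path.join(F_BACKUP, name))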

def sendMail(trace):
    emailFrom = ERROR_EMAIL
    emailTo = [ERROR_EMAIL]
    emailSubject = "[GAE Backup] ERROR occurred while running backup script: (" + F_ARCHIVE + ")"

    # emailTo is a list, so join it for the To: header rather than formatting the list itself.
    body = "\r\n".join(("From: %s" % emailFrom, "To: %s" % ", ".join(emailTo), "Subject: %s" % emailSubject, "", trace))
    server = smtplib.SMTP(SERVER_SMTP)
    server.sendmail(emailFrom, emailTo, body)
    server.quit()

# Execute
try:
    init()
    backup()
    archiveBackup()
    pruneArchives()
except Exception:
    sendMail(traceback.format_exc())
    traceback.print_exc(file=sys.stdout)
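
The pruning step leans on one detail that is easy to miss: because every archive is named with a fixed-width yymmdd-HHMMSS timestamp, sorting the names alphabetically is the same as sorting them by age. Here is a minimal standalone sketch of that idea, so you can try it on a scratch folder before pointing it at real backups. The function name prune_archives and its parameters are made up for this illustration, and it is written to run on Python 3 (the script above targets Python 2).

```python
import os
import shutil
import tempfile


def prune_archives(folder, max_archives):
    """Delete all but the newest max_archives .zip files; return the removed names."""
    # Fixed-width timestamp names sort chronologically, so reverse order is newest first.
    names = sorted((n for n in os.listdir(folder) if n.endswith(".zip")), reverse=True)
    removed = names[max_archives:]
    for name in removed:
        os.remove(os.path.join(folder, name))
    return removed


if __name__ == "__main__":
    # Try it on a throwaway folder holding three fake archives.
    demo = tempfile.mkdtemp()
    for stamp in ("110204-030000", "110205-030000", "110206-030000"):
        open(os.path.join(demo, stamp + ".zip"), "w").close()
    print(prune_archives(demo, 2))  # prints ['110204-030000.zip']
    shutil.rmtree(demo)
```

Note that this only holds while every name follows the same timestamp pattern; a hand-renamed archive in the folder would be sorted by its name, not its age.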

You'll also see 'input.txt' mentioned near the top of the script. That file stores your username and password for authorization. It's simple and looks like this:

mygmailaddresonthefirstline@gmail.com
mygmailpasswordonthe2ndline
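
If the file is malformed, the failure only shows up later when appcfg.py rejects the login, so it can be worth checking the file up front. The following is a rough sketch of such a check, assuming the two-line layout shown above. The helper read_credentials is hypothetical and not part of the script, and it is written for Python 3.

```python
def read_credentials(path):
    """Return (email, password) from a two-line credentials file, or raise ValueError."""
    with open(path) as handle:
        # Ignore blank lines, e.g. a stray empty line at the end of the file.
        lines = [line for line in handle.read().splitlines() if line.strip()]
    if len(lines) != 2:
        raise ValueError("expected 2 non-empty lines in %s, found %d" % (path, len(lines)))
    email = lines[0].strip()
    password = lines[1]  # kept as-is, since a password may contain spaces
    if "@" not in email:
        raise ValueError("first line of %s does not look like an email address" % path)
    return email, password
```

Calling read_credentials on the file before the backup starts would let a broken file raise straight away, and the existing except block would then mail you the traceback.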

Sorry all, this is probably not the most elegant of technical blog posts, but hopefully it'll come in handy for someone else. As far as I know it works on Linux; I haven't tried it on any other OS.

4 comments:

Pablo Antonio said...

Uhmm... How is it incremental?

Eurig Jones said...

The archiveBackup() function generates a new archive with the date/time as the filename.

pruneArchives() removes old backups if there are more than a certain number of them.

Kyle Baley said...

But is it incremental in the sense that it will download only data that's changed since the last backup? I think that's typically what incremental means with respect to database backups.

Eurig Jones said...

Yes, that's true, it's probably not the correct term here. But it does exactly what I need it to do, and that is to create regular backups of the datastore.