Wednesday, March 30, 2011

Creating large in-memory database on AppEngine

For high-performance AppEngine app development, it often comes in handy to store a read-only database in memory. Accessing in-memory data is not only orders of magnitude faster than making DataStore or memcached calls but also a lot cheaper, as it does not incur any API cost.

However, if you encode data as a Python dictionary or list with a million entries, your app will most likely crash on AppEngine with a distasteful exceptions.MemoryError: Exceeded soft process size limit with 299.98 MB. "But I'm only loading 10MB of data!", you proclaim. Unfortunately, Python may temporarily consume over a gigabyte of memory while parsing and constructing a multi-megabyte dictionary or list.

The first thing you should consider is simplifying the data structure. If possible, flatten your database into one-dimensional lists, which have a smaller memory footprint than dictionaries and multi-level nested lists.
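For instance, a list of per-record dictionaries can often be replaced by parallel flat lists indexed by position. This is a hypothetical sketch (the record layout and field names are illustrative, not from the original post):

```python
# Hypothetical record set: a list of dicts pays per-entry overhead for
# each dict's hash table.
records = [{'id': i, 'score': i * 2} for i in range(1000000)]

# Flattened form: two parallel one-dimensional lists.
ids = [r['id'] for r in records]
scores = [r['score'] for r in records]

# Record 42 is now simply position 42 in each list.
print(ids[42], scores[42])
```

The trade-off is that lookups become positional rather than keyed, which suits read-only data well.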

Next, try data serialization using the pickle library. Be sure to use protocol version 2 for maximum efficiency and compactness. For example:
# To serialize data (protocol 2 is binary, so open the file in binary mode)
pickle.dump(data, open('data.bin', 'wb'), pickle.HIGHEST_PROTOCOL)
# To deserialize data
data = pickle.load(open(os.path.join(os.path.dirname(__file__), 'data.bin'), 'rb'))

As AppEngine does not support the much faster cPickle module ("cPickle" is aliased to "pickle" on AppEngine), your app may time out if you try to unpickle millions of records. One effective solution is to store your data in homogeneous arrays to take advantage of array's highly efficient serialization implementation. If you have a list of a million signed integers, you can first convert the list into a typed array and save it to a binary file:
array.array('i', data).tofile(open('data.bin', 'wb'))
Deserializing the array takes just a few milliseconds on AppEngine:
data = array.array('i')
data.fromfile(open(os.path.join(os.path.dirname(__file__), 'data.bin'), 'rb'), 1000000)

One caveat: to load more than 10MB of data, you will have to split the database into multiple files to work around AppEngine's 10MB size limit on static files.
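The splitting can be done along these lines. This is a sketch under my own assumptions: the save_chunks/load_chunks helpers, the numbered-file naming scheme, and the chunk size are all mine, not part of any AppEngine API.

```python
import array
import os

def save_chunks(data, basename, chunk=2500000):
    # Write the integers out in fixed-size chunks; at 4 bytes per 'i'
    # entry, 2,500,000 items keeps each file at about 10MB.
    for n, start in enumerate(range(0, len(data), chunk)):
        with open('%s.%d' % (basename, n), 'wb') as f:
            array.array('i', data[start:start + chunk]).tofile(f)

def load_chunks(basename):
    # Read the numbered files back in order until one is missing.
    data = array.array('i')
    n = 0
    while os.path.exists('%s.%d' % (basename, n)):
        path = '%s.%d' % (basename, n)
        size = os.path.getsize(path)
        with open(path, 'rb') as f:
            data.fromfile(f, size // data.itemsize)
        n += 1
    return data
```

You would run save_chunks offline, deploy the numbered files with the app, and call load_chunks once at startup.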
