I’ve been doing some prototyping work to see how suitable MongoDB is for replacing a small (in number, not size) cluster of MySQL servers. The motivation for looking at MongoDB in this role is that we need a flexible and reliable document store that can handle sharding, a small but predictable write volume (1.5 – 2.0 million new documents daily), light indexing, and map/reduce operations for heavier batch queries. Queries to fetch individual documents aren’t that common; let’s say 100/sec in aggregate at peak times.
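For a sense of scale, that daily write volume translates into a fairly modest average write rate. A quick back-of-the-envelope calculation (averages only, not peaks):

```python
# Translate the stated daily volume (1.5-2.0 million new documents)
# into an average sustained write rate. These are averages; real
# traffic will be bursty.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

for daily_docs in (1_500_000, 2_000_000):
    rate = daily_docs / SECONDS_PER_DAY
    print(f"{daily_docs:,} docs/day ~ {rate:.1f} writes/sec average")
```

So on average we’re talking about roughly 17–23 writes per second, which is well within reach of a single reasonably provisioned node; the interesting problems are elsewhere (storage size, migration, and batch queries).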
What I’ve done so far is to create a set of Perl libraries that abstract away the data I need to store and provide a “backend” interface to which I can plug in a number of modules for talking to different data stores (including some “dummy” ones for testing and debugging). This has helped to clarify some assumptions and change a few of my early design ideas. As you may remember from I Want a New Data Store, I’m interested in trying a few options.
Then I created another set of modules that can read the existing data from the MySQL tables, joining as necessary, and normalizing some field names as well as removing “empty” fields. (I’ve decided that it’s silly to store “NULL” values in MongoDB, since we still have to store the key even when there’s no value.) That code has allowed me to do some performance testing and, more importantly, sizing tests. I now have a pretty reliable estimate of our storage needs, assuming the data I’ve pulled so far is representative of the future. It looks something like this:
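The “don’t store NULLs” idea is simple enough to show in a few lines. My real code is Perl, but here’s a minimal Python sketch of the same filtering step, run on the row dict before it’s handed to MongoDB:

```python
def strip_empty_fields(doc):
    """Drop keys whose values are None (SQL NULL) or empty strings.
    In a document store you pay to store the key itself, so fields
    with no value are pure overhead. (Sketch only; the production
    code is Perl and the field list is app-specific.)"""
    return {k: v for k, v in doc.items() if v is not None and v != ""}

row = {"id": 42, "title": "example", "subtitle": None, "notes": ""}
print(strip_empty_fields(row))  # → {'id': 42, 'title': 'example'}
```

One nice side effect: queries for a field’s existence (`$exists`) become meaningful, since absence of the key now reliably means “no value.”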
1GB = ~ 300,000 items (0.3M)
10GB = ~ 3,000,000 items (3M)
100GB = ~ 30,000,000 items (30M)
1TB = ~ 300,000,000 items (300M)
10TB = ~ 3,000,000,000 items (3B)
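Another way to read that table is as an average document size, which is handy for sanity-checking individual records as they come out of MySQL (assuming binary gigabytes here; with decimal GB the figure drops to about 3.3KB):

```python
# The sizing table implies a fairly consistent average document size.
# Assumes 1GB = 2**30 bytes; with 10**9 the answer is ~3.3KB instead.
ONE_GB = 1024 ** 3
items_per_gb = 300_000

avg_doc_bytes = ONE_GB / items_per_gb
print(f"~{avg_doc_bytes / 1024:.1f} KB per stored document")
```

So the average document lands around 3.5KB, which also explains why the 4MB object limit (discussed below) only bites a tiny fraction of outliers.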
This gives me a good way to judge how much space and how many machines we’ll end up needing.
Performance has been very good so far. The bottleneck is clearly getting the data out of MySQL. Because of the scattered nature of the content in the existing system, and the number of queries and joins required, it’s possible to pull at most 50-100 documents per second. Given that we have nearly a billion documents to migrate, it’ll take some time (and a few TB of space). I definitely plan to distribute that work and take advantage of there being multiple copies of the data around.
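“Some time” is an understatement, as a little arithmetic makes clear:

```python
# How long one extraction process would take at the observed
# 50-100 documents/sec, for roughly a billion documents.
TOTAL_DOCS = 1_000_000_000
SECONDS_PER_DAY = 86_400

for docs_per_sec in (50, 100):
    days = TOTAL_DOCS / docs_per_sec / SECONDS_PER_DAY
    print(f"at {docs_per_sec}/sec: ~{days:,.0f} days for a single worker")
```

That’s roughly 116–231 days for a single reader, which is exactly why distributing the extraction across the existing copies of the data (dividing the wall-clock time by the number of parallel workers) isn’t optional.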
There are two limitations (or missing features) that I’ve encountered so far in MongoDB. The first is a lack of compression. For this particular data set, I’m reasonably confident that both gzip and lzo could easily get somewhere from 3:1 to 5:1 compression. That could mean a substantial space savings (several TB of disk) at a very small CPU cost (and modern CPUs are very fast). The state of compressed file systems in Linux is sad, so the only real hope in the short or medium term is likely MongoDB object compression as requested in SERVER-164, which I have voted for and commented upon.
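It’s easy to see why 3:1 to 5:1 is plausible for this kind of data: document stores repeat field names in every record, and repeated keys and enumerated values compress extremely well. Here’s a toy illustration using Python’s standard zlib (the sample document is made up, and toy data like this compresses unrealistically well; real ratios depend entirely on the actual content):

```python
import json
import zlib

# Key-heavy, repetitive documents (the common case in a document
# store) are exactly what DEFLATE-style compression eats for lunch.
# This sample is fabricated purely for illustration.
doc = json.dumps([
    {"id": i, "status": "active", "category": "widget", "notes": ""}
    for i in range(200)
]).encode("utf-8")

compressed = zlib.compress(doc, 6)
ratio = len(doc) / len(compressed)
print(f"{len(doc)} bytes -> {len(compressed)} bytes ({ratio:.1f}:1)")
```

Even discounting the artificial repetitiveness of the sample, the repeated-keys effect alone makes per-object compression an attractive trade of cheap CPU for expensive disk.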
The second issue I’ve encountered is the 4MB object size limit. SERVER-431 discusses this a bit and I’d love to see some more discussion and voting there. My motivation for this is to allow for sequential reads of larger objects (think of them as something like covering indexes from the relational world) so that they don’t need to be chunked up (think GridFS) and force me to incur disk seeks when retrieving them. Amusingly, I was hoping to store lists (or arrays) in some documents using the $push operator when I ran into this. It didn’t take long to also find SERVER-654 and realize that I wasn’t the only one.
My ultimate wish for the 4MB limit would be a configurable limit so that the DBA or programmer can choose what’s right for their dataset. In my case, I went back and did some more analysis to see how many records would be affected. It turns out that it was a very small percentage of the total data. But the outliers do exist and since there’s a limit, I’d have to work around it anyway. That’s still code I need to write, test, and support. This caused me to think about the problem a little differently and consider other solutions too.
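Since the limit exists and the outliers do too, the workaround code has to exist as well. Here’s a rough sketch of the shape it takes; everything here (the constant names, the overflow scheme, using JSON length as a stand-in for BSON size) is my own invention for illustration, not anything MongoDB provides:

```python
import json

# Hypothetical guard for the 4MB object size limit: estimate the
# document's serialized size before appending to its array, and spill
# into a separate "overflow" document once it gets close to the cap.
# JSON length is only an approximation of BSON size, hence the margin.
MAX_DOC_BYTES = 4 * 1024 * 1024
SAFETY_MARGIN = 64 * 1024  # headroom for BSON overhead vs. JSON estimate

def choose_target(doc, new_item):
    """Return 'push' if the item safely fits in the main document,
    'overflow' if appending would get too close to the limit."""
    projected = len(json.dumps(doc)) + len(json.dumps(new_item))
    if projected < MAX_DOC_BYTES - SAFETY_MARGIN:
        return "push"
    return "overflow"

doc = {"_id": "list-1", "items": ["x"] * 1000}
print(choose_target(doc, "another item"))  # → 'push'
```

It’s not much code, but it’s exactly the kind of code I’d rather not have to write, test, and support — which is why a configurable limit would be so welcome.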
I need to do a bit more work on the storage side and some more testing. The next major piece of this system is an indexing piece that may not end up living inside of MongoDB. I’ll write more about that later, but my current thinking is to use Sphinx (which we already know and love). This may give me an excuse to play with two new Sphinx features I’ve been itching to try: string attributes and real-time indexing (via the MySQL protocol).
The next MongoDB piece I want to play with is sharding and replication. I’m hoping to get a feel for the strengths and weaknesses, as well as figuring out the right deployment strategy for us if we end up using MongoDB. Thankfully, our new development environment just got set up last week and there’s enough hardware that I can experiment with different topologies to see how it’ll work out.
I’ll write more about both the Sphinx and MongoDB experimentation as it progresses.