Jeremy Zawodny's blog

1,250,000,000 Key/Value Pairs in Redis 2.0.0-rc3 on a 32GB Machine

July 25, 2010 · 6 Comments

Following up on yesterday’s 200,000,000 Keys in Redis 2.0.0-rc3 post, which was a worst-case test scenario to see what the overhead for top-level keys in Redis is, I decided to push the boundaries in a different way. I wanted to use the new Hash data type to see if I could store over 1 billion values on a single 32GB box. To do that, I modified my previous script to create 25,000,000 top-level hashes, each of which had 50 key/value pairs in it.

The code for redisStressHash was this:

#!/usr/bin/perl -w
$|++;

use strict;
use lib 'perl-Redis/lib';
use Redis;

my $r = Redis->new(server => 'localhost:63790') or die "$!";

## 2.5B values

for my $key (1..25_000_000) {
	my @vals;

	for my $k (1..50) {
		my $v = int(rand($key));
		push @vals, $k, $v;
	}

	$r->hmset("$key", @vals) or die "$!";
}

exit;

__END__

Note that I added a use lib in there to use a modified Redis Perl library that speaks the multi-bulk protocol used all over in the Redis 2.0 series.

If you do the math, that yields 1.25 billion (1,250,000,000) key/value pairs stored. This time I remembered to time the execution as well:

real	160m17.479s
user	58m55.577s
sys	5m53.178s

So it took about 2 hours and 40 minutes to complete. The resulting dump file (.rdb file) was 13GB in size (compared to the previous 1.8GB) and the memory usage was roughly 17GB.

Here’s the INFO output again on the master:

redis_version:1.3.16
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
process_id:21426
uptime_in_seconds:12807
uptime_in_days:0
connected_clients:1
connected_slaves:1
blocked_clients:0
used_memory:18345759448
used_memory_human:17.09G
changes_since_last_save:774247
bgsave_in_progress:1
last_save_time:1280092860
bgrewriteaof_in_progress:0
total_connections_received:22
total_commands_processed:32937310
expired_keys:0
hash_max_zipmap_entries:64
hash_max_zipmap_value:512
pubsub_channels:0
pubsub_patterns:0
vm_enabled:0
role:master
db0:keys=25000000,expires=0

Not bad, really. This provides a slightly more reasonable usse case of storing many values in Redis. In most applications, I supsect people will have a number of “complex” values stored behind their top-level keys (unlike my previous simple test).

I’m kind of tempted to re-run this test using LISTS, then SETS, then SORTED SETS just to see how they all compare from a storage point of view.

In any case, a 10 machine cluster could handle 12 billion key/value pairs this way. Food for thought.

→ 6 CommentsCategories: nosql · programming · tech

200,000,000 Keys in Redis 2.0.0-rc3

July 24, 2010 · 8 Comments

I’ve been testing Redis 2.0.0-rc3 in the hopes of upgrading our clusters very soon. I really want to take advantage of hashes and various tweaks and enhancements that are in the 2.0 tree. I was also curious about the per-key memory overhead and wanted to get a sense of how many keys we’d be able to store in our ten machine cluster. I assumed (well, hoped) that we’d be able to handle 1 billion keys, so I decided to put it to the test.

I installed redis-2.0.0-rc3 (reported as the 1.3.16 development version) on two hosts: host1 (master) and host2 (slave).

Then I ran two instances of a simple Perl script on host1:

#!/usr/bin/perl -w
$|++;

use strict;
use Redis;

my $r = Redis->new(server => 'localhost:63790') or die "$!";

for my $key (1..100_000_000) {
	my $val = int(rand($key));
	$r->set("$$:$key", $val) or die "$!";
}

exit;

__END__

Basically that creates 100,000,000 keys with randomly chosen integer values. They keys are “$pid:$num” where $pid is the process id (so I could run multiple copies). In Perl the variable $$ is the process id. Before running the script, I created a “foo” key with the value “bar” to check that replication was working. Once everything looked good, I fired up two copies of the script and watched.

I didn’t time the execution, but I’m pretty sure I took a bit longer than 1 hour–definitely less than 2 hours. The final memory usage on both hosts was right about 24GB.

Here’s the output of INFO from both:

Master:

redis_version:1.3.16
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
process_id:10164
uptime_in_seconds:10701
uptime_in_days:0
connected_clients:1
connected_slaves:1
blocked_clients:0
used_memory:26063394000
used_memory_human:24.27G
changes_since_last_save:79080423
bgsave_in_progress:0
last_save_time:1279930909
bgrewriteaof_in_progress:0
total_connections_received:19
total_commands_processed:216343823
expired_keys:0
hash_max_zipmap_entries:64
hash_max_zipmap_value:512
pubsub_channels:0
pubsub_patterns:0
vm_enabled:0
role:master
db0:keys=200000001,expires=0

Slave:

redis_version:1.3.16
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
process_id:5983
uptime_in_seconds:7928
uptime_in_days:0
connected_clients:2
connected_slaves:0
blocked_clients:0
used_memory:26063393872
used_memory_human:24.27G
changes_since_last_save:78688774
bgsave_in_progress:0
last_save_time:1279930921
bgrewriteaof_in_progress:0
total_connections_received:11
total_commands_processed:214343823
expired_keys:0
hash_max_zipmap_entries:64
hash_max_zipmap_value:512
pubsub_channels:0
pubsub_patterns:0
vm_enabled:0
role:slave
master_host:host1
master_port:63790
master_link_status:up
master_last_io_seconds_ago:512
db0:keys=200000001,expires=0

This tells me that on a 32GB box, it’s not unreasonable to host 200,000,000 keys (if their values are sufficiently small). Since I was hoping for 100,000,000 with likely lager values, I think this looks very promising. With a 10 machine cluster, that easily gives us 1,000,000,000 keys.

In case you’re wondering, the redis.conf on both machines looked like this.

daemonize yes
pidfile /var/run/redis-0.pid
port 63790
timeout 300
save 900 10000
save 300 1000
dbfilename dump-0.rdb
dir /u/redis/data/
loglevel notice
logfile /u/redis/log/redis-0.log
databases 64
glueoutputbuf yes

The resulting dump file (dump-0.rdb) was 1.8GB in size.
I’m looking forward to the official 2.0.0 release. :-)

→ 8 CommentsCategories: programming · tech

Coding Outside My Comfort Zone: Front-End Hacking with jQuery and flot

July 15, 2010 · 4 Comments

To folks who’ve read my tech ramblings over the years, it’s probably no surprise that I generally avoid doing front-end development (HTML, CSS, JavaScript) like the plague. In fact, that’s probably one of the reasons I finally migrated my blog from a self-hacked and highly-tweaked MovableType install to WordPress. I spend the majority of my time dealing with back-end stuff: MySQL, Sphinx, Redis, and the occasional custom data store (for a feature we’re launching soon). I try to build and maintain fast, stable, and reliable services upon which people who actually have front-end talent can build usable and useful stuff.

But every now and then I have an itch to scratch.

For the last year and a half, I’ve wanted to “fix” a piece of our internal monitoring system at craigslist. We have a home-grown system for gathering metrics across all our systems every minute as well as storing, alerting, and reporting on that data. One piece of that is a plotting tool that has a web interface which lets you choose a metric (like CpuUser or LoadAverage), time frame, and hosts. When you click the magic button, it sends those selections to a server that pulls the data, feeds it to gnuplot, and then you get to see the chart. It’s basic but useful.

However, I wanted a tool that gave me more control, took advantage of the fact that I have a lot of CPU power and RAM right here on my computer, and make prettier charts. I wanted easier selection of hosts and metrics (with auto-complete as you type instead of really big drop-down lists), plotting of multiple metrics per chart, and a bunch of other stuff. So I went back to a few bookmarks I’d collected over the last year or two and set about building it.

I ended up using JSON::XS and building a mod_perl handler to serve as a JSON endpoint that could serve up lists of hosts and metrics (for the auto-completion) from MySQL as well as the time series data for the plots. That was the easy part. For the font-end I used jQuery (the wildly popular JavaScript toolkit) and flot (a simple but flexible and powerful charting library based on jquery). It took a lot of prototyping and messing around to get the JavaScript bits right. That is due largely to my lack of knowledge and experience. It’s frustrating to me when nothing happens and I wonder what good debugging tools might be. But instead of actually bothering to install and learn something like Firebug, I just charge ahead and try to reason out WTF is going on. Eventually I get somewhere.

As with most things, the initial learning curve was steep. But I eventually started to feel a little comfortable and productive. I had a few patterns for how to get things done and, most importantly, I understood how they worked. So I was able to piece together a first version with all the minimal functionality I thought would be good to have. Yesterday I made that first version available internally. There’s already a wishlist for future features, but I’m happy with what I have as a starting point.

It was fun to step out of the normal stuff I do. Getting out of my comfort zone to work on front-end stuff gave me a renewed appreciation for all this browser technology (it works on Firefox, Chrome, and Safari), the libraries I built upon, and all the collective work that must have produced them. Aside from scratching my own itch, I now feel a little less resistant to hack on JavaScript. So that means I’m just a bit more likely to dive into something similar in the future.

It feels like I stretched parts of my brain that don’t normally get much of a workout. I like that.

→ 4 CommentsCategories: craigslist · programming · tech

MySQL 5.5.4-m3 in Production

July 14, 2010 · 2 Comments

Back in April I wrote that MySQL 5.5.4 is Very Exciting and couldn’t wait to start running it in production. Now here we are several months later and are using 5.5.4-m3 on all the slaves in what is arguably our most visible (and one of the busiest) user-facing cluster. Along the way we deployed some new hardware (Fusion-IO) but not a complete replacement. Some boxes are Fusion-io, some local RAID, and some SAN.  We have too many eggs for any one basket.

We also converted table to the Barracuda format in InnoDB, dropped an index or two, converted some important columns to BIGINT UNSIGNED and enabled 2:1 compression for the table that has big chunks of text in it. Aside from a few false starts with the Barracuda conversion and compression, things went pretty well. Coming from 5.0 (skipping 5.1 entirely) we had some my.cnf work to do to take advantage of all the new InnoDB tuning (especially on the boxes with Fusion-IO cards and more memory). Hopefully switching the master goes smoothly too.

Needless to say, we have some very good and dedicated folks working behind the scenes to make it all happen (hi, Josh). Most of my involvement was initial testing, prodding, schema changes, and finding my.cnf options for the InnoDB tuning.

→ 2 CommentsCategories: craigslist · mysql · tech

Database Drama

July 10, 2010 · 13 Comments

There’s been a surprising amount of drama (in some circles, at least) about database technology recently.  I shouldn’t be surprised, given the volume of reactions to the I Want a New Datastore post that I wrote. (Hint: I still hear from folks pitching the newest data storage systems.)

The two things that caught my eye recently involve Cassandra and MongoDB (and, indirectly, MySQL). First was what I read as a poorly thought out and whiny critique of MongoDB’s durability model: MongoDB Performance & Durability. Just because something is the default doesn’t mean you have to use it that way. Thankfully there was reasoned discussion and reaction elsewhere, including the Hacker News thread about it.

Look. Building fast, feature-rich, scalable systems is Really Hard Work. You’re always making tradeoffs. You can have the ultimate in single-server durability (with all the fancy hardware that dictates) but you’re going to really sacrifice performance (or budget!). But at least you won’t have a lot of complexity. Or you can build something that scales out really well using many machines. But that adds a lot of complexity and different sacrifices.

Next comes the Twitter Engineering blog post Cassandra at Twitter Today in which we learn that Twitter loves Cassandra but they’re opting to use their sharded MySQL infrastructure for storing tweets. This surprised a lot of people and even became “news” at TechCrunch. This is hardly surprising. The long version of why I say that is captured in the Reddit comments on the story.

But if you’re not interested in reading the 80+ comments currently there, maybe I can simplify it a bit. Have you ever wondered why there are so damned many NoSQL systems out there?

Simple. Different circumstances dictate making different choices when presented with the list of tradeoffs. This includes durability, performance, data model, scalability, richness of query language, replication model, atomicity, indexing, transactions, administration and support, etc.

Each and every one of those NoSQL projects exist because someone needed them. And sometimes you need to start using a shiny new thing before really understanding its limitations and what those tradeoffs REALLY mean in your environment. And once you’ve done that you might realize that sticking with the tried and true is the best path forward. The same is true of programming languages (Ruby vs. Python vs. PHP vs. JavaScript vs. Go vs. whatever) and the frameworks that programmers decide to use. Lots of drama and fan-boy arguments that really boil down to different people having different needs and priorities.

I’m not saying any of this to promote MySQL and knock Cassandra or MongoDB. I lost at least a day of work last week due to some legacy MySQL issues that seem completely insane in the modern world. But years ago those issues were edge cases. Nowadays they’re very easy to hit.

I’ve actually spent some time recently playing with both Cassandra and MongoDB in the hopes of replacing  a big (in data size, not query volume) MySQL cluster. Both are impressive (and frustrating) in different ways. But ultimately, I do expect that one of them will work quite nicely in this role–and possibly others later on. Not having to contemplate another multi-week ALTER TABLE will be a welcome change!

Which one?  Stay tuned. :-)

Maybe what I should do in the meantime is spend more time reading stories about what works WELL for people, instead of how they’re unhappy with their choice of tool. All this drama is a real time sink.

→ 13 CommentsCategories: mysql · nosql · tech

Testing Redis 2.0.0 Release Canidate with Perl

July 2, 2010 · 1 Comment

I’m pretty excited about the upcoming 2.0.0 release of Redis. As you can see in the changelog, I made a few minor contributions to this release. I’m most excited about being able to perform unions and intersections with sorted sets (ZSETs) and the new HASH data structure, not to mention the memory footprint reduction. Once it tests out well, I’m hoping to upgrade our Redis clusters at Craigslist to the 2.0 code base.

To do that I needed to upgrade our Perl client. The current client on CPAN doesn’t know about the protocol changes to allow for binary-safe key names in all commands. But the multi-bulk branch on github has that code. So I merged it into my fork (creating a multi-bulk branch of my own) and adjusted the tests so that all tests pass when running make test. It’s just a few minor fixes but I feel better about not seeing failed tests.

Yay for Github and Open Source.

Now it’s time to build some new RPMs, install, and test our in our development environment.

→ 1 CommentCategories: craigslist · nosql · programming

Some Recipes I’d Like To Try This Summer

June 11, 2010 · 1 Comment

One of the great things about learning to cook is the plethora of good recipe web sites you can find. And while the recipes aren’t necessarily as good or in-depth as some cookbooks, they do have the benefits of user ratings, comments, and suggestions. Some of the better ones even have RSS feeds, so they’re easy to follow in Google Reader.

Here are a few I’m hoping to try out this Summer:

Added to that is the fact that Kathleen called me a few days ago with the names of four new cookbooks to add to the Amazon wishlist. I guess there will be no reason not to use all our kitchen toys this season!

If I can remember to, I’ll try to come back to this post and update it with a few other recipe links I’ve come across and would love to try.

Oh, that reminds me… I should post pictures of our “garden” soon. We have lots of peppers, tomato plants, herbs, and more.

→ 1 CommentCategories: cooking

The Dumb and The Smart of the Internet Age

June 5, 2010 · 1 Comment

In two competing but somewhat orthogonal essays in The Wall Street  Journal, Clay Shirky and Nicholas Carr ostensibly argue opposite sides of the Internet’s effects on society. In Does the Internet Make You Smarter? Shirky tries to put the Internet in the larger historical context of the commoditization of media creation and dissemination tools. Carr, in Does The Internet Make You Dumber? focuses on the effects of the ever-present on-line distractions on our ability to read, comprehend, focus, and think deeply about issues.

Both are definitely worth a read. I’ve long admired the writings of Clay and Nick, even though I don’t always agree with them. But in the end, I find that I agree with both of them. They’re both right. As time goes on, we’ll certainly look back over the last 5-10 years as a turning point in self-publishing and self-expression in the (on-line) public sphere. It’s the latest point along the ever growing arc of societal changes brought about by new technology. Shirky paints that picture well.

But Carr makes some very good points as well. We’re still learning how to deal with this new medium and the perils of mindlessly letting it control (or at least over-influence) our thinking and behavior. I still remember when people referred to television as the “boob tube” and talked of it rotting our brains. They, too, were right to be concerned. But ultimately we, the individuals, are still in control. It is our responsibility to recognize that and assert that control.

If you haven’t read their essays, do yourself a favor. Both are thought provoking. If nothing else, see if you agree with one or both of them… or neither!

→ 1 CommentCategories: tech

MongoDB Early Impressions

May 22, 2010 · 5 Comments

I’ve been doing some prototyping work to see how suitable MongoDB is for replacing a small (in number, not size) cluster of MySQL servers. The motivation for looking at MongoDB in this role is that we need a flexible and reliable document store that can handle sharding, a small but predictable write volume (1.5 – 2.0 million new documents daily), light indexing, and map/reduce operations for heavier batch queries. Queries to fetch individual documents aren’t that common–let’s say 100/sec in aggregate at peak times.

What I’ve done so far is to create a set of Perl libraries that abstract away the data I need to store and provide a “backend” interface to which I can plug in a number of modules for talking to different data stores (including some “dummy” ones for testing and debugging). This has helped to clarify some assumptions and change a few of my early design ideas. As you may remember from I Want a New Data Store, I’m interested in trying a few options.

Then I created another set of modules that can read the existing data from the MySQL tables, joining as necessary, and normalizing some field names as well as removing “empty” fields. (I’ve decided that it’s silly to store “NULL” values in MongoDB, since we still have to store the key even when there’s no value.) That code has allowed me to do some performance testing and, more importantly, sizing tests. I know have a pretty reliable estimate of our storage needs, assuming the data I’ve pulled so far is representative of the future. It looks something like this:

 1GB = ~       300,000 items  (0.3M)
 10GB = ~     3,000,000 items    (3M)
100GB = ~    30,000,000 items   (30M)
  1TB = ~   300,000,000 items  (300M)
 10TB = ~ 3,000,000,000 items    (3B)

This gives me a good way to judge how much space and how many machines we’ll end up needing.

Performance has be very good so far. The bottleneck is clearly getting the data out of MySQL. Because of the scattered nature of the content in the existing system, and the number of queries and joins required, it’s possible to pull at most 50-100 documents per second. Given that we have nearly a billion documents to migrate, it’ll take some time (and a few TB of space). I definitely plan to distribute that work and take advantage of there being multiple copies of the data around.

Programming against MongoDB is not hard at all. The MongoDB Perl modules are great. It’s really just a matter of wrapping your head around the API and getting used to the JavaScript console instead of the MySQL shell. The server’s http interface is handy for a quick idea of what’s going on, though we’ve hardly stressed this instance so far.

Limitations

There are two limitations (or missing feautres) that I’ve encountered so far in MongoDB. The first is a lack of compression. For this particular data set, I’m reasonably confident that both gzip and lzo could easily get somewhere from 3:1 or 5:1 compression. That could mean a substantial space savings (several TB of disk) at a very small CPU cost (and modern CPUs are very fast). The state of compressed file systems in Linux is sad, so the only real hope in the short or medium term is likely MongoDB object compression as requested in SERVER-164, which I have voted for and commented upon.

The second issue I’ve encountered is the 4MB object size limit. SERVER-431 discusses this a bit and I’d love to see some more discussion and voting there. My motivation for this is to allow for sequential reads of larger objects (think of them as something like covering indexes from the relational world) so that they don’t need to be chunked up (think GridFS) and force me to incur disk seeks when retrieving them. Amusingly, I was hoping to store lists (or arrays) in some documents using the $push operator when I ran into this. It didn’t take long to also find SERVER-654 and realize that I wasn’t the only one.

My ultimate wish for the 4MB limit would be a configurable limit so that the DBA or programmer can choose what’s right for their dataset. In my case, I went back and did some more analysis to see how many records would be affected. It turns out that it was a very small percentage of the total data. But the outliers do exist and since there’s a limit, I’d have to work around it anyway. That’s still code I need to write, test, and support. This caused me to think about the problem a little differently and consider other solutions too.

What’s Next

I need to do a bit more work on the storage side and some more testing. The next major piece of this system is an indexing piece that may not end up living inside of MongoDB. I’ll write more about that later, but my current thinking is to use Sphinx (which we already know and love).  This may give me an excuse to play with two new Sphinx features I’ve been itching to try: string attributes and real-time indexing (via the MySQL protocol).

The next MongoDB piece I want to play with is sharding and replication. I’m hoping to get a feel for the strengths and weaknesses, as well as figuring out the right deployment strategy for us if we end up using MongoDB. Thankfully, our new development environment just got setup last week and there’s enough hardware that I can experiment with different topologies to see how it’ll work out.

I’ll write more about both the Sphinx and MongoDB experimentation as it progresses.

→ 5 CommentsCategories: craigslist · mysql · nosql · programming · tech

Ubuntu 10.4 Impressions

May 21, 2010 · 6 Comments

I’ve upgraded two desktop computers to the 64bit release of Ubuntu 10.4 in the last week. One was a fresh install (my desktop at the office, a Dell Optiplex 755) and the other was an in-place upgrade (my desktop at home, a Dell Optiplex 760 with dual monitors). Aside from the new color scheme (yuck), I’m pretty happy with 10.4. So far nothing has broken, crashed, or regressed in any way that I’ve noticed. I was able to easily drop the latest versions of VirtualBox and Google Chrome (the only 3rd party packages I really use) without issue. The most noticeable change so far has to be the boot speed. Both machines go from hitting the Enter key at the Grub menu to a usable desktop in a remarkably short amount of time.

The irony is that since these are Linux desktops, I really don’t reboot them that often. But this means that when I have the confidence to upgrade my laptop, which I do reboot more often, things will definitely feel better. And that also applies to any 10.4 virtual machines I run too. The most interesting question is whether I can get away with all 64bit installs. This is 2010, after all, so I’d like to make use of all this memory. But the VPN client at work has been the limiting factor so far. Thankfully there appears to be a solution on the horizon for that.

I’ve been running the Ubuntu Netbook Remix on my Samsung NC10, so I’m curious to try the Ubuntu 10.4 Netbook Edition on it soon as well. Anyone tried it?

→ 6 CommentsCategories: tech