200,000,000 Keys in Redis 2.0.0-rc3

I’ve been testing Redis 2.0.0-rc3 in the hopes of upgrading our clusters very soon. I really want to take advantage of hashes and various tweaks and enhancements that are in the 2.0 tree. I was also curious about the per-key memory overhead and wanted to get a sense of how many keys we’d be able to store in our ten-machine cluster. I assumed (well, hoped) that we’d be able to handle 1 billion keys, so I decided to put it to the test.

I installed redis-2.0.0-rc3 (reported as the 1.3.16 development version) on two hosts: host1 (master) and host2 (slave).

Then I ran two instances of a simple Perl script on host1:

#!/usr/bin/perl -w
$|++; # unbuffer output so progress is visible immediately

use strict;
use Redis;

# connect to the local redis-2.0.0-rc3 instance on port 63790
my $r = Redis->new(server => 'localhost:63790') or die "$!";

# write 100,000,000 keys named "$pid:$num" with random integer values
for my $key (1..100_000_000) {
	my $val = int(rand($key));
	$r->set("$$:$key", $val) or die "$!";
}

exit;

__END__

Basically that creates 100,000,000 keys with randomly chosen integer values. The keys are “$pid:$num”, where $pid is the process id (so I could run multiple copies); in Perl the variable $$ is the process id. Before running the script, I created a “foo” key with the value “bar” to check that replication was working. Once everything looked good, I fired up two copies of the script and watched.
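Checking replication took nothing more than a one-off like this (a minimal sketch, pointing the same Redis Perl module at the slave, host2, on the same port):

#!/usr/bin/perl -w
use strict;
use Redis;

# connect to the slave and make sure the "foo" key made it over
my $r = Redis->new(server => 'host2:63790') or die "$!";
print $r->get('foo'), "\n";   # prints "bar" if replication is working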

I didn’t time the execution, but I’m pretty sure it took a bit longer than 1 hour and definitely less than 2 hours. The final memory usage on both hosts was right about 24GB.

Here’s the output of INFO from both:

Master:

redis_version:1.3.16
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
process_id:10164
uptime_in_seconds:10701
uptime_in_days:0
connected_clients:1
connected_slaves:1
blocked_clients:0
used_memory:26063394000
used_memory_human:24.27G
changes_since_last_save:79080423
bgsave_in_progress:0
last_save_time:1279930909
bgrewriteaof_in_progress:0
total_connections_received:19
total_commands_processed:216343823
expired_keys:0
hash_max_zipmap_entries:64
hash_max_zipmap_value:512
pubsub_channels:0
pubsub_patterns:0
vm_enabled:0
role:master
db0:keys=200000001,expires=0

Slave:

redis_version:1.3.16
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
process_id:5983
uptime_in_seconds:7928
uptime_in_days:0
connected_clients:2
connected_slaves:0
blocked_clients:0
used_memory:26063393872
used_memory_human:24.27G
changes_since_last_save:78688774
bgsave_in_progress:0
last_save_time:1279930921
bgrewriteaof_in_progress:0
total_connections_received:11
total_commands_processed:214343823
expired_keys:0
hash_max_zipmap_entries:64
hash_max_zipmap_value:512
pubsub_channels:0
pubsub_patterns:0
vm_enabled:0
role:slave
master_host:host1
master_port:63790
master_link_status:up
master_last_io_seconds_ago:512
db0:keys=200000001,expires=0

This tells me that on a 32GB box, it’s not unreasonable to host 200,000,000 keys (if their values are sufficiently small). Since I was hoping for 100,000,000 with likely larger values, I think this looks very promising. With a 10-machine cluster, that easily gives us 1,000,000,000 keys.
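For the curious, the back-of-the-envelope math from the INFO output above (used_memory divided by the reported key count):

26,063,394,000 bytes / 200,000,001 keys ≈ 130 bytes per key/value pair

At 200,000,000 small-value keys per box, ten boxes is 2,000,000,000 keys, so there’s roughly 2x headroom over the 1,000,000,000 target before larger values start eating into it.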

In case you’re wondering, the redis.conf on both machines looked like this:

daemonize yes
pidfile /var/run/redis-0.pid
port 63790
timeout 300
save 900 10000
save 300 1000
dbfilename dump-0.rdb
dir /u/redis/data/
loglevel notice
logfile /u/redis/log/redis-0.log
databases 64
glueoutputbuf yes

The resulting dump file (dump-0.rdb) was 1.8GB in size.
I’m looking forward to the official 2.0.0 release. :-)

About Jeremy Zawodny

I'm a software engineer and pilot. I work at craigslist by day, hacking on various bits of back-end software and data systems. As a pilot, I fly a Flight Design CTSW and high performance gliders in the northern California and Nevada area. I'm also the original author of "High Performance MySQL" published by O'Reilly Media. I still speak at conferences and user groups on occasion.

11 Responses to 200,000,000 Keys in Redis 2.0.0-rc3

  1. Sean Porter says:

    Thanks!

    Great post, wish you had recorded the run time :)

    Have you looked into using virtual memory?

    I’m curious to see the performance impact when values are pulled from disk back into memory. Having rarely-used key values stored on disk seems more cost effective, especially approaching the ONE BILLION DO… keys (with more realistic/larger values).

    Would you be up for another go?

    Thanks again!

  2. Sean,

    VM is really only appropriate for situations where you have fairly large values behind your keys. I suspect that I could make it “work” but I’d get maybe 500,000,000 on a single node at best (since the keys still need to be in RAM).

    I suppose it’d be fun to try on one of our Fusion-io equipped machines! :-)
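
    For the curious, turning VM on in 2.0 is just a few redis.conf directives. A rough sketch (the swap-file path and the sizes here are placeholder values, not anything I’ve tuned):

    vm-enabled yes
    vm-swap-file /u/redis/data/redis.swap
    # max RAM to use for values before swapping them out to disk (bytes)
    vm-max-memory 21474836480
    vm-page-size 32
    vm-pages 134217728
    vm-max-threads 4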

  3. Sandeep says:

    How large a value are you realistically looking to store?
    It will be very interesting to see how it works when you store, say, 10K of XML per key.

    You see, what you are doing now fits each value in one “machine word”; only with larger text will realistic performance become apparent.

  4. Pingback: 1,250,000,000 Key/Value Pairs in Redis 2.0.0-rc3 on a 32GB Machine « Jeremy Zawodny's blog

  5. Pingback: Top Posts — WordPress.com

  6. Pedro Lopes says:

    A little off topic – what is ‘$|++’ (2nd line of your script)? It’s hard to find an explanation on Google.

    • jenya says:

      If you set $| to true ($| = 1 or simply $|++), it makes output to the currently selected filehandle unbuffered (the buffer is flushed after every write).
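
      For example, a minimal illustration:

      $|++;                                 # autoflush the selected filehandle (STDOUT by default)
      print "processed $_ keys\n" for 1..3; # each line shows up immediately instead of waiting for the buffer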

  7. Ryan Detzel says:

    Hey, can I ask what you’re using this for? We’re researching doing something similar, and we probably need to store 300-500M keys, so we’re trying to decide if we should just use one nice machine or a few decent machines in a cluster setup, using some type of internal sharding to know where to fetch the data from.

  8. Salman says:

    What about read/write performance? Was it still linear once you had that many keys in Redis?

  9. Pingback: 用Redis存储大量数据 : NoSQLfan

  10. Nick says:

    You can use sharding with keys (PHP example):
    $servers = array("192.168.0.1", "192.168.0.2");
    $host = crc32(md5($key)) % count($servers);
    Then connect and store / read the key on the selected server.
    However, in this set-up you will not be able to do set operations between the servers
    (e.g. intersect / union sets that are located on different servers).

    I tried client-side intersects (1M+ sets on 2 different servers) and the wait time is huge: 3-4 seconds for long sets,
    since all the data needs to be sent to the client.
