Following up on yesterday’s 200,000,000 Keys in Redis 2.0.0-rc3 post, which was a worst-case test scenario to see what the overhead for top-level keys in Redis is, I decided to push the boundaries in a different way. I wanted to use the new Hash data type to see if I could store over 1 billion values on a single 32GB box. To do that, I modified my previous script to create 25,000,000 top-level hashes, each of which had 50 key/value pairs in it.
The code for redisStressHash was this:
#!/usr/bin/perl -w
$|++;
use strict;
use lib 'perl-Redis/lib';   # modified Redis.pm that speaks multi-bulk
use Redis;

my $r = Redis->new(server => 'localhost:63790') or die "$!";

## 25M hashes x 50 field/value pairs = 1.25B pairs
for my $key (1..25_000_000) {
    my @vals;
    for my $k (1..50) {
        my $v = int(rand($key));   # random integer below $key
        push @vals, $k, $v;
    }
    $r->hmset("$key", @vals) or die "$!";
}

exit;

__END__
Note that I added a use lib line to pick up a modified Redis Perl library that speaks the multi-bulk protocol used throughout the Redis 2.0 series.
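For anyone who hasn't looked at it, the multi-bulk request format sends every command as an argument count followed by length-prefixed arguments, which is what lets a single HMSET carry 100 field/value arguments. The sketch below is a minimal, standalone illustration of that encoding; encode_multibulk is a hypothetical helper written for this example, not a function in the Redis Perl library, and it assumes the arguments are plain byte strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Encode one Redis command in the multi-bulk request format:
#   *<number of arguments>\r\n
#   $<byte length of arg>\r\n<arg>\r\n   (repeated for every argument)
# Hypothetical helper; assumes byte strings (no wide characters).
sub encode_multibulk {
    my @args = @_;
    my $out = '*' . scalar(@args) . "\r\n";
    for my $arg (@args) {
        $out .= '$' . length($arg) . "\r\n" . $arg . "\r\n";
    }
    return $out;
}

# For key 1, int(rand(1)) is always 0, so the first hash the script
# writes has every value 0. Only the first three fields are shown here.
my $cmd = encode_multibulk('HMSET', '1', 1 => 0, 2 => 0, 3 => 0);
print $cmd;
```

The full HMSET for one hash in the test is the same shape with 102 arguments: the command name, the key, and 50 field/value pairs.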
If you do the math, that yields 1.25 billion (1,250,000,000) key/value pairs stored. This time I remembered to time the execution as well:
real    160m17.479s
user    58m55.577s
sys     5m53.178s
So it took about 2 hours and 40 minutes to complete. The resulting dump file (.rdb file) was 13GB in size (compared to the previous 1.8GB) and the memory usage was roughly 17GB.
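Converting the time output into rates gives a rough sense of load speed. Here real is wall-clock time for the whole run, while user and sys are assumed to be the Perl client's own CPU time (time was wrapped around the script), so the server's CPU is not included:

```latex
\begin{aligned}
t_{\text{real}} &= 160\,\text{min}\ 17.479\,\text{s} = 9617.479\,\text{s} \approx 2\,\text{h}\ 40\,\text{min} \\
\text{HMSET rate} &= \frac{25{,}000{,}000}{9617.479\,\text{s}} \approx 2{,}599\ \text{commands/s} \\
\text{pair rate} &= 50 \times 2{,}599.4\ \text{s}^{-1} \approx 1.30 \times 10^{5}\ \text{pairs/s} \\
\frac{t_{\text{user}}}{t_{\text{real}}} &= \frac{3535.577\,\text{s}}{9617.479\,\text{s}} \approx 0.37
\end{aligned}
```

The last ratio hints that the single Perl client was busy for roughly a third of the run, so a parallel loader might finish sooner; that is a guess, not something this test measured.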
Here’s the INFO output again on the master:
redis_version:1.3.16
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
process_id:21426
uptime_in_seconds:12807
uptime_in_days:0
connected_clients:1
connected_slaves:1
blocked_clients:0
used_memory:18345759448
used_memory_human:17.09G
changes_since_last_save:774247
bgsave_in_progress:1
last_save_time:1280092860
bgrewriteaof_in_progress:0
total_connections_received:22
total_commands_processed:32937310
expired_keys:0
hash_max_zipmap_entries:64
hash_max_zipmap_value:512
pubsub_channels:0
pubsub_patterns:0
vm_enabled:0
role:master
db0:keys=25000000,expires=0
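The INFO numbers make it easy to estimate the cost per stored value. This arithmetic uses used_memory exactly as reported; that figure is what Redis's allocator tracked, so the process size seen by the OS may be somewhat higher. Each hash has 50 fields (under hash_max_zipmap_entries:64) and each value is an integer below 25,000,000, so at most 8 bytes (far under hash_max_zipmap_value:512), which suggests all 25 million hashes fit the compact zipmap encoding:

```latex
\begin{aligned}
N_{\text{pairs}} &= 25{,}000{,}000 \times 50 = 1.25 \times 10^{9} \\
\text{used\_memory} &= 18{,}345{,}759{,}448\,\text{B} = \frac{18{,}345{,}759{,}448}{2^{30}}\,\text{GiB} \approx 17.09\,\text{GiB} \\
\text{bytes per hash} &= \frac{18{,}345{,}759{,}448\,\text{B}}{25{,}000{,}000} \approx 733.8\,\text{B} \\
\text{bytes per pair} &= \frac{18{,}345{,}759{,}448\,\text{B}}{1.25 \times 10^{9}} \approx 14.7\,\text{B}
\end{aligned}
```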
Not bad, really. This provides a slightly more reasonable use case for storing many values in Redis. In most applications, I suspect people will have a number of “complex” values stored behind their top-level keys (unlike my previous simple test).
I’m kind of tempted to re-run this test using LISTS, then SETS, then SORTED SETS just to see how they all compare from a storage point of view.
In any case, a 10-machine cluster could handle about 12.5 billion key/value pairs this way. Food for thought.
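Redis 2.0 has no built-in clustering, so spreading 10 × 1.25 billion = 12.5 billion pairs across ten machines would mean sharding on the client side. A minimal sketch, assuming simple hash-modulo routing; the redis01 through redis10 host names and the node_for helper are hypothetical, made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical list of ten Redis servers, one per machine.
my @nodes = map { sprintf('redis%02d:63790', $_) } 1 .. 10;

# Pick a server for a top-level key by hashing the key name, so every
# client using the same node list routes a given key to the same box.
sub node_for {
    my ($key) = @_;
    my $h = hex(substr(md5_hex($key), 0, 8));   # first 32 bits of MD5
    return $nodes[$h % @nodes];                 # @nodes here is 10
}

# Tally how 100,000 of the test's integer key names spread across nodes.
my %count;
$count{ node_for($_) }++ for 1 .. 100_000;
printf "%s %d\n", $_, $count{$_} for sort keys %count;
```

One design caveat: with plain modulo routing, changing the number of servers remaps most keys, which is why consistent hashing is a common alternative when the cluster size is expected to change.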
I wonder how much time it takes a slave to sync all that data to itself (assuming it starts when the master is already fully populated).
Oh, that wouldn’t be hard to find out at all. Lemme go try that. 🙂
Pingback: 1,250,000,000 Key/Value Pairs in Redis 2.0.0-rc3 on a 32GB Machine « Jeremy Zawodny’s blog « blackdog
What about access time after storing billions of values? Most of the time, access is concurrent. I am not sure whether Redis uses shared read locks or exclusive locks. In any case, there will be some locking overhead when accessing the keys. Just wondering…
There is no locking. Redis is a single process, single threaded, event-driven server.
Pingback: Storing Large Amounts of Data with Redis : NoSQLfan
In light of the above, I’d be interested to hear your take on this question:
http://serverfault.com/questions/168247/mysql-working-with-192-trillion-records-yes-192-trillion
You actually make it seem so easy with your presentation, but I find this topic to be something I think I would never understand. It seems too complicated and extremely broad for me. I am looking forward to your next post; I will try to get the hang of it!