Sharding is Hard

Reading the Foursqure/MongoDB post-mortem I’m struck by a few things. First, the folks at 10gen are doing a great job of being very open and upfront about what happened and how they hope to solve this problem so that it doesn’t happen again. As someone currently working on a MongoDB deployment myself, I can tell you that a team like that is worth its weight in gold.

Secondly, I’m reminded of the issues that Reddit had a while back with Cassandra. In both cases, servers were arguably under provisioned and running close to limits. And when it came time to add more servers to alleviate the load, the process actually made things worse (because of data migration happening) and didn’t offer immediate relief (because the original servers needed to be compacted).

I suspect that we’ll see this happen more often as time goes on. And there will be high-profile cases. They may or may not be with Cassandra or MongoDB, but the strategies they use are likely also being used by other sharding-friendly NoSQL database engines.

There is no single solution. This is partly a human issue related to planning and monitoring. You really have to stay on top of your growth rates. And it’s also a deficiency in software systems. There may be a need for more sophisticated or aggressive data migration and compaction routines.

All this goes to show that the folks who like to say “just shard your data and you’ll scale forever” are really over-simplifying the problem. The scary part is that I’m not sure many of them realize how much they’re glossing over.

Sharding is hard.

About Jeremy Zawodny

I'm a software engineer and pilot. I work at craigslist by day, hacking on various bits of back-end software and data systems. As a pilot, I fly Glastar N97BM, Just AirCraft SuperSTOL N119AM, Bonanza N200TE, and high performance gliders in the northern California and Nevada area. I'm also the original author of "High Performance MySQL" published by O'Reilly Media. I still speak at conferences and user groups on occasion.

View all posts by Jeremy Zawodny →

2 Responses to Sharding is Hard

Razi Sharir says:

October 11, 2010 at 7:36 am

I have to admit that I have not yet seen a “sharding-friendly” technique or methodology and will love to get pointers to one.
Touching the application layer each time a change is required might not qualified for the term “friendly”. It could be related to as tedious, not inclusive or even as risky, let alone effective or efficient…
Sharding is a great way to deal with inherent traditional SQL DB inability to scale out. Having said that, it will provide for scaling this particular mid tier and address the symptom, it will not address the bottom underlying problem.
Using NoSQL is a great solution provided you know what you’re getting in return and the limitations imposed.

Razi Sharir (blog.xeround.com)

Ahmedkhan says:

April 8, 2016 at 8:55 pm

You’re facing some problems to sharing the data using Data base. Here you wrote the whole problem to get the possible solution. I think Inherent Traditional SQL DB is very useful for that purpose. just need to know the use of SQL. Well! I want to buy techno viking sticker but as a student of SQL, such articles are helpful for me.

Sharding is Hard

About Jeremy Zawodny

2 Responses to Sharding is Hard

Leave a comment Cancel reply

recent tweets

blog archive

RSS

Sharding is Hard

Share this:

Related

About Jeremy Zawodny

2 Responses to Sharding is Hard

Leave a comment Cancel reply

recent tweets

blog archive