Database Drama

There’s been a surprising amount of drama (in some circles, at least) about database technology recently.  I shouldn’t be surprised, given the volume of reactions to the I Want a New Datastore post that I wrote. (Hint: I still hear from folks pitching the newest data storage systems.)

The two things that caught my eye recently involve Cassandra and MongoDB (and, indirectly, MySQL). First was what I read as a poorly thought out and whiny critique of MongoDB’s durability model: MongoDB Performance & Durability. Just because something is the default doesn’t mean you have to use it that way. Thankfully there was reasoned discussion and reaction elsewhere, including the Hacker News thread about it.

Look. Building fast, feature-rich, scalable systems is Really Hard Work. You’re always making tradeoffs. You can have the ultimate in single-server durability (with all the fancy hardware that dictates) but you’re going to really sacrifice performance (or budget!). But at least you won’t have a lot of complexity. Or you can build something that scales out really well using many machines. But that adds a lot of complexity and different sacrifices.
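To make that tradeoff concrete, here is a minimal sketch of the same append-only write loop run two ways: once letting the operating system buffer writes, and once forcing every record to disk with fsync. It is not how MongoDB or any other datastore implements durability; the function names, file names, and record sizes are all made up for illustration.

```python
import os
import tempfile
import time


def write_records(path, records, sync_every_write):
    """Write records to path, optionally forcing each one to disk."""
    with open(path, "wb") as f:
        for record in records:
            f.write(record)
            if sync_every_write:
                f.flush()             # push Python's buffer to the OS
                os.fsync(f.fileno())  # ask the OS to push it to the device


def timed(path, records, sync_every_write):
    """Return the seconds taken to write all records."""
    start = time.perf_counter()
    write_records(path, records, sync_every_write)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical workload: 2000 small records.
    records = [b"x" * 100 + b"\n"] * 2000
    with tempfile.TemporaryDirectory() as tmp:
        fast = timed(os.path.join(tmp, "fast.log"), records, False)
        safe = timed(os.path.join(tmp, "safe.log"), records, True)
    print(f"buffered: {fast:.4f}s  fsync per write: {safe:.4f}s")
```

Both runs produce identical files. The difference is how much data you could lose if the machine died partway through, and the fsync version typically pays for that safety in elapsed time. How big the gap is depends entirely on your hardware, which is rather the point.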

Next comes the Twitter Engineering blog post Cassandra at Twitter Today in which we learn that Twitter loves Cassandra but they’re opting to use their sharded MySQL infrastructure for storing tweets. That surprised a lot of people and even became “news” at TechCrunch. It really shouldn’t have been surprising. The long version of why I say that is captured in the Reddit comments on the story.

But if you’re not interested in reading the 80+ comments currently there, maybe I can simplify it a bit. Have you ever wondered why there are so damned many NoSQL systems out there?

Simple. Different circumstances dictate making different choices when presented with the list of tradeoffs. This includes durability, performance, data model, scalability, richness of query language, replication model, atomicity, indexing, transactions, administration and support, etc.

Each and every one of those NoSQL projects exist because someone needed them. And sometimes you need to start using a shiny new thing before really understanding its limitations and what those tradeoffs REALLY mean in your environment. And once you’ve done that you might realize that sticking with the tried and true is the best path forward. The same is true of programming languages (Ruby vs. Python vs. PHP vs. JavaScript vs. Go vs. whatever) and the frameworks that programmers decide to use. Lots of drama and fan-boy arguments that really boil down to different people having different needs and priorities.

I’m not saying any of this to promote MySQL and knock Cassandra or MongoDB. I lost at least a day of work last week due to some legacy MySQL issues that seem completely insane in the modern world. But years ago those issues were edge cases. Nowadays they’re very easy to hit.

I’ve actually spent some time recently playing with both Cassandra and MongoDB in the hopes of replacing a big (in data size, not query volume) MySQL cluster. Both are impressive (and frustrating) in different ways. But ultimately, I do expect that one of them will work quite nicely in this role, and possibly others later on. Not having to contemplate another multi-week ALTER TABLE will be a welcome change!

Which one?  Stay tuned. :-)

Maybe what I should do in the meantime is spend more time reading stories about what works WELL for people, instead of how they’re unhappy with their choice of tool. All this drama is a real time sink.

About Jeremy Zawodny

I'm a software engineer and pilot. I work at craigslist by day, hacking on various bits of back-end software and data systems. As a pilot, I fly Glastar N97BM and high performance gliders in the northern California and Nevada area. I'm also the original author of "High Performance MySQL" published by O'Reilly Media. I still speak at conferences and user groups on occasion.
This entry was posted in mysql, nosql, tech.

15 Responses to Database Drama

  1. Alex Popescu says:

    Nice post and I mostly agree with all your points. A couple of comments though:

    - the post about MongoDB durability seems to have come out of PR frustrations: “MongoDB is so much faster than CouchDB”. I’ve covered the MongoDB durability vs speed tradeoff months ago and while getting less attention (probably the tone?) it generated a healthy discussion and (hopefully) awareness of these sort of tradeoffs

    - re: Cassandra at Twitter what may come out as a surprise is the amount of time and work they put in to make it work for tweets. While I’ve been the first to write about it months ago, according to Ryan, Twitter was testing Cassandra for quite some time. Fortunately for them (and I’m saying this more in terms of effort and time spent), they found ways to use it

    :- alex

    ps: as you mention reading about stories where things are working, I would recommend reading the myNoSQL blog (http://nosql.mypopescu.com) which focuses on usecases and story cases (without throwing away looking into things that might not work though).

  2. Nice to see some sanity and reasoned thought in NoSQL land.

    I actually find it rather ironic:

    On one hand, NoSQL advocates will claim “one size does not fit all” re: RDBMS technology. The argument is that some data is better stored elsewhere, because some classes of data and application-data interaction patterns are better served by systems that make different tradeoffs.

    But frequently it doesn’t seem to apply to their particular pet project.

    One size does not fit all, for all the reasons you outline.

    As much as I would *like* to be back to wearing a size 32 pant, it ain’t happening any time soon.

  3. nzkoz says:

    “Each and every one of those NoSQL projects exist because someone needed them. ”

    That’s not entirely true. Some of those NoSQL projects exist because a company raised some VC financing to build a NoSQL data store. In those cases things like over-hype or misleading benchmarks are even more annoying than they’d normally be.

    As a rails guy I can hardly complain about accentuating the positive and playing to your strengths, but “loses data in the default configuration in order to win performance benchmarks” is on the other side of the line.

  4. mrg says:

    agreed on every point, probably same as many others. With that, are you ready to create/curate numbers/stats/benchmarks? Maybe your army of readers can help…

  5. Sammy says:

    I keep hearing about the “MongoDB evil marketing” but I have never seen a single benchmark or speed claim made by MongoDB or 10gen.

    Can anyone post a single link to one or is this just total fud spread by competitors.

  6. RDRush says:

    Controversy is often rooted in a lack of foresight regarding anything of public disclosure and consumption.

    Specialization + laziness + ego = argument where the argument boils down to “just because” style statements. “Just because” statements play a role as placeholders in the abstract where facts should arise; this is where truth declines into opinion.

    Generalization + optimization = foundation is a three step principle that will enable any system with proven framework and workflow guidelines.

    Small “highly specialized” databases are intended for the purpose of a specialized and optimized application “in context” and really can’t be compared to a general workflow.

    Something like Twitter or Facebook switching to other databases/dbms is obviously going to be persuaded by deep support. DB support will have to entertain images, text, linking and numerical operations for date-time-group management; etc.

    The debate is in fact strongly founded on need at a given point in time and this same issue has been evident in industrial manufacturing forever. The “bigger, better deal” and the “yours sucks worse than mine” have been around for a long time.

    Specialized databases are always good until you outgrow them and it will happen.

    There are two concerns that are always overlooked that will save you an incredible amount of time and eliminate migraines. The first is how extensible your database is and how much it already covers for you, and the second is what is available for migrating your data from one dbms to another and whether it can import a dump. These two major points will in fact save your bacon if you opt to lend them any kind of serious thought.

    Decent article — stirred my mind a bit. Didn’t realize I was getting rusty.

  7. Alex Popescu says:

    @Sammy: You might have missed Kristina Chodorow’s posts (10gen) or their presentation on 8mil transactions ;-). While I’m not gonna say they are doing some evil marketing, I do think they encourage a constant flow of such posts.

  8. Pingback: NoSQL Drama Revisited. › PHP App Engine

  9. creek23 says:

    MariaDB all the way!!! — see http://mariadb.org/

  10. @Alex Popescu & @Sammy: I guess I’m the evil 10gen PR machine. In my defense:

    I did post a benchmark on my blog over a year ago. It’s pretty old, silly, and meaningless, which I freely admit and have commented.

    On my MongoSF review (the 8 million thing): I was expecting maybe 100 pageviews from my regular readers (who are mostly fellow MongoDB enthusiasts). I didn’t write it for a general audience: my blog gets ~10 views/day. Scaling problem :(

    I don’t want to piss people off or have them think I’m trying to manipulate them, I just think this stuff is cool and interesting and I like writing about it.

  11. Alex Popescu says:

    @Kristina: I think we’ve covered this before already :-).

    As long as some information is public, it is up for debate. The fact that it’s not accompanied by either more details (if you do remember I have even pointed out what info is missing) or at least a disclaimer makes it even more disputable.

    Last, but not least, I’ve never called you PR. It is up to you (and your company) to decide what information goes out from you, what you back or not back, and how transparent you are.

    As you, I am passionate about this field and I like to have access to the most unbiased data. And sometimes it is OK to agree to disagree :-)

  12. Sammy says:

    @alex so – if you follow your logic, mikeal’s post should count as coming from couch.io? their behavior (including damien katz’s) is pretty damn awful. See damien’s twitter stream for example.

    If you claim that simple benchmarks on kristina’s personal blog are 10gen, then you have to say that mikeal’s views are from couch.io

    Needless to say I don’t consider kristina’s post or mikeal’s posts as coming from their companies. But if you’re going to – then you have to apply the criteria to both.

    Also – having gone to look at the sharding post, that really doesn’t seem like a bench mark as just a cool demo. If doing cool demos is going to get you into trouble in this new world, count me out.

    BTW – was just going to see if mikeal responded to my last comment on his blog post only to find all the comments were down… that’s a great way to make an impression… if you don’t like the comments, take them down.

    if it’s a tech thing – then at least say you took them down. or maybe couchdb isn’t as durable as they claim… hard to know since they just removed my comments without a single word of warning…

  13. Alex Popescu says:

    @Sammy: I totally agree with you (I don’t think I’ve said or written anywhere that I think Mikeal’s opinion does or does not reflect couch.io perspective).

    For me the rule is quite simple: if you are an employee of company X and there are no clear disclaimers, then there are good chances that your opinions will reflect (even if partially) those of your employer.

  14. GauravCS says:

    Jeremy – have you already considered and eliminated HBase before narrowing the candidates down to Cassandra and MongoDB? Maybe, I missed an earlier post about it but I am curious to know why HBase is not on your list.

  15. domain says:

    I like what you guys tend to be up too. This type of clever work and reporting!
    Keep up the excellent works guys I’ve incorporated you guys to my personal blogroll.
