NoSQL is What?

I found myself reading NoSQL is a Premature Optimization a few minutes ago and threw up in my mouth a little. That article is so far off base that I’m not even sure where to start, so I guess I’ll go in order.

In fact, I would argue that starting with NoSQL because you think you might someday have enough traffic and scale to warrant it is a premature optimization, and as such, should be avoided by smaller and even medium sized organizations.  You will have plenty of time to switch to NoSQL as and if it becomes helpful.  Until that time, NoSQL is an expensive distraction you don’t need.

Uhm… WHAT?!

I’ve spent more than a few years using MySQL and have been using some NoSQL systems for the last year or so in a fairly busy environment. And scaling is only one of the considerations that factor into those decisions. Features matter too, you know. I really like MongoDB’s built-in sharding and replica sets. They kick ass. And Redis is an awesome in-memory data store that goes beyond what something like memcached offers. And being schema-less makes a whole hell of a lot of sense in some applications–probably A LOT of applications.

NoSQL exists for a reason–because they ARE useful to a lot of people. This isn’t some stupid bubble.

And to make switching data stores sound like something that “you will have plenty of time for” is outright nuts. There’s a lot of work involved. More than you probably expect. (Ask me how I know…)

Companies embarking on NoSQL are dealing with less mature tools, less available talent that is familiar with the tools, and in general fewer available patterns and know-how with which to apply the new technology.  This creates a greater tax on being able to adopt the technology.  That sounds a lot like what we expect to see in premature optimizations to me.

Gee, let me get this straight. If you’re using newer technology, you’re dealing with less mature tools?

No shit. But that’s how progress works. You make a choice to use something that is inferior today because it gives you more leverage in the future. That’s the path that Clayton Christensen laid out in The Innovator’s Dilemma.

There is no particular advantage to NoSQL until you reach scales that require it.

Bullshit. Have you even tried modeling an application that felt shoehorned into MySQL in a NoSQL tool? Is “saving a lot of development time” not a particular advantage? What about time-consuming schema changes?

Again, I think we need to talk about the best tool for the job, not the best tool for every job. Relational databases are not the best tool for every data storage job.

If you are fortunate enough to need the scaling, you will have the time to migrate to NoSQL and it isn’t that expensive or painful to do so when the time comes.

Seriously? I guess that has a lot to do with how you value your time. The term that comes to mind here is opportunity cost.

You can go a long long way with SQL-based approaches, they’re more proven, they’re cheaper, and they’re easier.

They are more proven, but cheaper and easier have a lot to do with your application and your real needs. This strikes me as an over-reaching generalization that doesn’t match reality.

About Jeremy Zawodny

I'm a software engineer and pilot. I work at craigslist by day, hacking on various bits of back-end software and data systems. As a pilot, I fly Glastar N97BM, Just AirCraft SuperSTOL N119AM, Bonanza N200TE, and high performance gliders in the northern California and Nevada area. I'm also the original author of "High Performance MySQL" published by O'Reilly Media. I still speak at conferences and user groups on occasion.
This entry was posted in mongodb, mysql, nosql, programming, redis.

70 Responses to NoSQL is What?

  1. davidyu_ftw says:

    Although I agree with your posts, you still need to calm down though 🙂

  2. AngerManagement says:

    X: NoSQL does not give any advantage
    You: Bullshit. NoSQL has great advantages
    X: NoSQL is a considerable investment
    You: Haha, have you tried NoSQL in your life ever? I have spent ages working in this shit, I know
    X: NoSQL is useless
    You: NoSQL exists for a reason–because they ARE useful to a lot of people.

    Don’t respond with anger. Say something useful.

    • Sorry. I thought I made the point that NoSQL solutions could save development time because they’re better suited to some problems. I also mentioned specific aspects of MongoDB and Redis that I particularly like.

      What you see as anger was intended to be astonishment.

  3. Jay says:

    Yawn, don’t cry that NoSQL isn’t getting the constant adulation it used to. A lot of (most) applications do NOT need (and shouldn’t have) that flexibility (and the inconsistency and duplication that brings). And NoSQL does exist for a reason: MySQL being such a horrible botch of a database system that it ruined people’s impressions of relational database systems, and application developers’ inability to understand relational systems as opposed to object systems or that really ‘difficult’ SQL language.

    • Adulation? Hardly.

      I’m arguing that the gross generalizations don’t actually apply and you seem to be missing that point entirely and responding using the same language I used to make your point.

      I’ll grant you that some people aren’t smart enough to figure out MySQL, but I really don’t think that’s the motivation behind many of the NoSQL systems, do you?

  4. gigi19gi says:

    The First Rule of Program Optimization: Don’t do it. The Second Rule of Program Optimization (for experts only!): Don’t do it yet.

    I guess you are not an expert because you do it from the start, need to grow up 🙂

    • Errr says:

      gigi19gi: Your comment does not make sense. NoSQL isn’t just something that you replace your SQL database with when it reaches a certain scale. If that was the only case where NoSQL was useful then it would be just an optimization, but in a lot of cases it’s easier, cheaper and more efficient to implement it as a NoSQL database right away.

      You’re hardly an expert if you’re going to devote more time, effort and money to fit something into an SQL database when all that can be achieved much easier using a NoSQL db..

    • Do it from the start?

      I’m guessing you’ve not read much of what I’ve written in the past, nor have you seen any of the presentations I’ve given. A lot of my experience has come from fixing less than optimal setups, not the fact that I “do it from the start.”

      But you used a smiley face, so I guess it’s all good, right? 🙂

      • Narco boy says:

        anger problems, much, jeremy? You come across like a douche bag. Wouldn’t hire you if you were the only developer on the planet.

  5. Kevin Marks says:

    Snotty commenters: you do realize that Jeremy wrote the book on optimizing MySQL? http://oreilly.com/catalog/9780596101718/ Listen to him.

  6. Don Park says:

    I wrote pretty much the same except all I could manage was three short sentences on HN. I definitely need to get out of the 140-character state of mind and get back my blogging mojo.

  7. Ivan says:

    Today I started my test site, a Twitter “clone” using Redis http://www.ivansuchy.com/redis/ Sure, it is subjective, but during development I had a strong feeling that the way NoSQL works with data is much more similar to the way we store and think about data in our brains; it’s more natural and intuitive than the relational model.

    • 1111 says:

      That could possibly be a personal bias. It fits you but no one else. It might fit laypeople who haven’t done serious RDBMS work.

      • SQL says:

        The point is that NoSQL is more intuitive to how people normally think.

        To respond to this saying “well maybe to someone who hasn’t been versed in RDBMS…” you do realise that you’re agreeing, right?

      • Ivan says:

        I’ve been working with MySQL for more than 15 years.

  8. I agree with most of the points. And I am talking from the POV of managing a system using BOTH MySQL AND MongoDB (as well as Sphinx and a bunch of other technologies). That we use this mix is not because we are in transition from one to another, no, it’s because we try to pick an appropriate tool for the job at hand. MySQL, or any SQL-based or relational system, is great for the flexibility in managing data. You can query for just about anything and performance is reasonable in most cases. But in some cases you need even more performance, or you need to scale even more (sharding with MySQL is a pain, sharding with Mongo is dead easy).

    The flexibility of MySQL / SQL comes at a price though: it is more difficult to scale while sustaining performance, and there is less potential for great performance in the specific case where you need it. MongoDB comes with A LOT less flexibility and less performance in the general cases, but smashing performance in a few cases, the cases where you really need it.

    Which is not to say that it’s not true that MongoDB is A LOT less mature. Try backing up a sharded MongoDB setup with replica sets and you’ll see what I mean. And then try it using a script and without interruptions of any kind… And without using mongodump (which is just too slow when you have lots of data).

    In my mind, the #1 point that Mark makes is that you pick the right tool for the right job. And there isn’t such a thing as 1 tool that will handle all your tasks faster, and with more flexibility, than any other tool. Unless you are one of those who believe that all washing detergents get better results than any other washing detergent.

    /Karlsson

  9. Casey says:

    Personally, I liked the original article, though I as well disagreed with the general conclusion, but perhaps I just took something else out of it. A lot of what I’ve seen in the past practically screamed that if you didn’t use NoSQL right out of the gate, you were doomed to failure regardless of the site you were creating, which is not true.
    The use of Relational vs. NoSQL still comes down to what you want to do, who’s doing it, and what you’re building it in. A small RoR site for your personal projects will be easy to do using a relational database, and you probably won’t outgrow it unless you made the coolest friggin thing on the planet. However, someone with some technical savvy may want to use CouchDB for everything because it’s cool or they want to learn it. Now, if you’re building something that has an obvious need for large amounts of data, such as “Yet Another Social Network” (YASN), then you’ll want to look into a scalable solution sooner rather than later, as you have a reasonable expectation that you will grow based on your competitor’s growth.
    I only mention RoR as I have used it and I’ve seen Django used. In both cases, relational databases are supported very well and for a beginner to use a tutorial to build a simple site is quite easy. These are also useful when developer time is of high importance, such as if you’re building a quick prototype site that may be used by 100 people. Doing it quickly can be much more important than making it scalable.

  10. mtkd says:

    This rant is justified.

    The name ‘NoSQL’ does absolutely no justice to the technology.

    MongoDB/Cassandra etc. are entirely new ways of architecting storage and the benefits of being able to use them schemaless and storing objects almost natively can be enormous for some applications.

    The fact the name includes a ref to SQL encourages comparison. These are new tools for new challenges.

    SQL is fantastic for heavy relational work but that overhead is present all the time, most webapps don’t need a lot of relational data at runtime.

    NoSQL allows you to move the complexity in to occasional configuration activities (writes) and rewards you with blisteringly quick reads the rest of the time.

  11. JohnnyL says:

    Too bad, you don’t have any reasoning why ‘NoSQL’ is any better than MySQL. I wonder who is pulling the levers behind this marketing. Perhaps some people who want to get rich quick? Afaik NoSQL and SQL aren’t that much different from each other

    “NoSQL allows you to move the complexity in to occasional configuration activities (writes) and rewards you with blisteringly quick reads the rest of the time.” -YOU

    “blisteringly quick?” Compared to what? This Bullshit is just like … come in 90% off everything!
    HAhaha, what a joke.

    • No reasoning?

      Granted, I didn’t provide a big list, but I did mention specific advantages of MongoDB and Redis that I like. There are some pretty basic differences between NoSQL and SQL systems–the lack of joins being most notable.

  12. JohnnyL says:

    “Now with 100% more molecules!”

  13. JohnnyL says:

    “Brawndo: It’s got what plants crave… It’s got electrolytes!”

  14. I saw the title of that article and had assumed it was written either a) sarcastically, or b) as a fine bit of trolling. Now actually having read it, I realize he was serious. And I am boggled.

  15. patricks says:

    I have to disagree with Jeremy’s posts here. But first, please change your attitude and show some respect — you seem to have such a closed mind, absolutely convinced that you are in the right. What if you’re wrong? How can you be so sure as to even belittle others? This sort of behavior shames software developers. I couldn’t believe how immature your writing was. That said, I enjoyed your book on MySql optimizations.

    To my point, there is absolutely nothing that NoSql can do that MySql cannot. MySql alone is enough to build a prototype that scales to at least 1,000,000 users and prove that a market exists for a web startup. This alone is a huge and common challenge – demonstrating traction. Once we can do that, then yes we can hire 10 more engineers to rewrite our duct-tape code into something scalable and beautiful. All web developers begin with MySql, and so using NoSql is a foreign technology that is both a risk and time suck.

    • My attitude is that you should choose the right tool for the job. I’m not sure that I really want to change that.

      It’s interesting that you argue that I have a closed mind. Yet I’ve worked with Oracle, MySQL, Access, SQL Server, MongoDB, Redis, Berkeley DB, and a few others over the years. My opinions are shaped by my experiences and hearing those of others–I don’t just make this stuff up.

      To me, it’s closed-minded to argue that everyone should just start with SQL because it’s easy and worry about the real issues later. That’s exactly what I got from reading the original article.

      I’m glad you enjoyed my book.

      • Artem says:

        Can you be more specific about what MySQL can’t do for you if you are dealing with a user base of less than 1M people? The advantages you point out, such as sharding, only matter when the scale is bigger. Want schema-less data? No problem, just put some JSON objects or pickled Python objects into a column.

        I think the point is that NoSQL is so overhyped that there are waaaay too many startups out there starting with it because it’s the hot new thing. And this means they spend a lot of time writing extra code or dealing with less mature technologies. Since we do know that engineers like to deal with the hot sexy stuff more than old boring stuff, I think the idea of the original article is to make them think twice before going the NoSQL way. Do they *REALLY* need it?

    • isao says:

      @patricks — sure, you could build a site using text files and bash scripts that services millions of people. it’s ubiquitous and well-known. not sure why you’d start there though.

      mysql used to be my hammer and all things using dynamic data the same nail (with memcached and apache/proxies for the scaling post-db optimizations). but we have more hammers these days.. and they’re free, easy, solid… and fun.

      examples…

      fast-changing leaderboard, federating webservices, ratelimiting, fraud detection –> redis!

      documents whose schema is evolving… do you really want to keep adding/removing columns, or try to speed up those tortured EAV SQL queries? just look at –> mongodb
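A rough sketch of the EAV pain mentioned in the mongodb example above. Everything here is an illustrative assumption: SQLite stands in for MySQL, the `doc_attrs` table and the sample listings are made up, and plain Python dicts stand in for a document store (this does not use MongoDB itself). The point is only that each extra attribute in an EAV filter costs another self-join, while a document-shaped record is filtered field by field.

```python
import sqlite3

# Entity-attribute-value layout: one row per (document, attribute) pair.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE doc_attrs ("
    " doc_id INTEGER, attr TEXT, value TEXT,"
    " PRIMARY KEY (doc_id, attr))"
)
rows = [
    (1, "type", "listing"), (1, "city", "sf"), (1, "price", "1200"),
    (2, "type", "listing"), (2, "city", "nyc"),
    (3, "type", "listing"), (3, "city", "sf"), (3, "pets", "yes"),
]
conn.executemany("INSERT INTO doc_attrs VALUES (?, ?, ?)", rows)

# Filtering on two attributes already needs a self-join; a third
# attribute would need another join, and so on.
EAV_QUERY = """
SELECT t.doc_id
FROM doc_attrs AS t
JOIN doc_attrs AS c ON c.doc_id = t.doc_id
WHERE t.attr = 'type' AND t.value = 'listing'
  AND c.attr = 'city' AND c.value = 'sf'
ORDER BY t.doc_id
"""
eav_ids = [row[0] for row in conn.execute(EAV_QUERY)]

# The same records as documents (dicts used here as a stand-in).
documents = {
    1: {"type": "listing", "city": "sf", "price": 1200},
    2: {"type": "listing", "city": "nyc"},
    3: {"type": "listing", "city": "sf", "pets": "yes"},
}
doc_ids = sorted(
    doc_id
    for doc_id, doc in documents.items()
    if doc.get("type") == "listing" and doc.get("city") == "sf"
)

print(eav_ids, doc_ids)
```

Both approaches return the same ids; the difference is how the query grows as attributes are added or removed.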

      anyway, needless to say I had the same reaction as Jeremy to “NoSQL is a Premature Optimization”. My initial reaction was less charitable. You know, what do you say to folks who insist the moon-landing was faked? 🙂

  16. ivanhoe says:

    Well, I see this trend for some time: at first everyone was sharing how great a new technology was, to the point that it got really boring and unoriginal and no one was really reading those blogposts anymore. Now suddenly everyone is very critical, new is no good, old stuff is better. And people read and share it again. I think it has more to do with psychology and marketing, than engineering or facts.

  17. galtenberg says:

    This was about as ineffective a rejoinder as I’ve seen in a long time.

    I would love to see an effective riposte, because I’m still uncertain about locking into MySQL.

  18. mike says:

    I felt both articles lacked examples or hard evidence. Thus, this could have been a great article detailing counterpoints to an outlandish original article from a person who has not built a diverse base of applications. I don’t compare relational sql with no sql. they serve different, sometimes overlapping, problems. not to mention the architectural decisions in designing for each data store.

    thanks for taking the time to respond to that original article.

  19. Derek says:

    And this whole time I thought the best part of NoSQL was the minimization of impedance and its ease of use.

  20. Pingback: Interesting NoSql argument | Gokul's Blog

  21. Michael S. Fischer says:

    I agree, this came off pretty ranty, Jeremy. 🙂

    I think there’s a fair point to be made that DHTs scale better than relational DBs for very large data sets, so they’re a great fit for non-relational data that doesn’t involve joins. Nevertheless, the original author makes some (implicit) observations that (1) most data is still relational and the pain of trying to shoehorn that into a DHT is nontrivial; and (2) most users aren’t and never will be large enough to require the scale of a DHT. Where those two factors are true, I think the author’s conclusion that migrating to a DHT is a “premature optimization” holds.

  22. ashwinraghav says:

    Like you said, it is important that discussions are better limited if there is an understanding of the right tool for the right job philosophy.
    Since we are particularly dealing with databases that are read and write intensive in Facebook applications, I will stick to that context.

    Also, a major point you have made is the development time that is saved as a result of not having to write schema(and sometimes data) migrations.

    In my experience, NOSQL will save/waste your time depending on the programming language and the frameworks being used.

    Typeless languages like Ruby and Python are clearly to benefit out of having a type system at the data-base in the form of a schema. On the other hand, strongly typed languages like Java can clearly escape from having a type system in the data-base. Redundant in many a way.

    That said, typeless languages can mitigate risks through unit tests and the likes. Thus for what I have just said, the opposite conclusions can be drawn in favor of consistent typelessness over wanting to mitigate it by handling exceptions thrown by databases.

    Overall, I can see what you mean though.

  23. Why can’t everyone chill and just place NoSQL, MySQL, MSSQL into the pool called “tools of the trade”? If you’re truly experienced in development of any sort, you’ll realize that partisanship doesn’t get you anywhere.

    The key isn’t to try and jam yourself into a technology and cross your fingers that you made the right choice – a part of the game is identifying the appropriate time to do a technology switch if needed.

    Listening to the craigslist presentation is a case in point.

    The NoSQL solutions and tools are here to stay, and maturing at impressive speeds.

    I view the language as impassioned 😉

  24. Kevin Burton says:

    I love how he says “NoSQL as and if it becomes helpful”

    .. as if it’s not helpful to a LARGE number of people now 🙂

  25. Every business has a collection of read mostly data (the largest portion by far), read write data (the next largest), and “the money” (the smallest). For the read mostly data, replication and slicing is important. For the read/write data, propagating changes is important, and for stuff dealing with actual $, transactions are important.

    R/M = ##############################################
    R/W = ######
    ACID = #

    Using everyone’s favorite example: Amazon has a table of products they sell with associated data that’s definitely read-mostly. Then they have your shopping cart, which needs to be updated with “so and so wants to purchase this item”, definitely read/write at a much smaller volume. Finally, they take your order and submit it, which definitely needs some kind of transaction mechanism since they want to be sure and bill you once, and only once.
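To make the “bill you once, and only once” requirement concrete, here is a minimal sketch using Python’s built-in `sqlite3` module. The `orders` and `charges` tables, the `submit_order` helper, and the order ids are all hypothetical; a real checkout would involve a payment provider and much more. The sketch only shows the two properties a transaction mechanism gives you here: the order and its charge are written together or not at all, and a repeated submission cannot create a second charge.

```python
import sqlite3


def submit_order(conn, order_id, amount_cents):
    """Record an order and its charge atomically.

    Returns True if the order was billed, False if it was rejected
    (for example because it had already been submitted).
    """
    try:
        # The connection context manager commits on success and rolls
        # back everything inside the block if any statement fails.
        with conn:
            conn.execute(
                "INSERT INTO orders (order_id, amount_cents) VALUES (?, ?)",
                (order_id, amount_cents),
            )
            conn.execute(
                "INSERT INTO charges (order_id, amount_cents) VALUES (?, ?)",
                (order_id, amount_cents),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " order_id TEXT PRIMARY KEY,"
    " amount_cents INTEGER NOT NULL)"
)
conn.execute(
    "CREATE TABLE charges ("
    " order_id TEXT PRIMARY KEY,"
    " amount_cents INTEGER NOT NULL CHECK (amount_cents > 0))"
)

first = submit_order(conn, "order-1", 1999)   # billed
second = submit_order(conn, "order-1", 1999)  # duplicate, rejected
bad = submit_order(conn, "order-2", 0)        # charge fails, order rolled back

charge_count = conn.execute("SELECT COUNT(*) FROM charges").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(first, second, bad, charge_count, order_count)
```

The third call is the interesting one: the order row is inserted, the charge row violates its constraint, and the rollback removes the order row as well, so the two tables never disagree.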

    All data handling solutions are going to be better at one type of data over another, but relational databases are pretty attractive because in the ’70s, when the guy at IBM whose name I forget came up with SQL, he got the math right. If you use a relational database, you’re going to be forced to think about your data’s structure. I became a better object-programmer after I learned about schema normalization, and I still regularly catch problems in other people’s designs by asking them why they have a 1-1 relationship.

    So you may well be better off proving your concept with a relational database before you go to some other kind of solution. Relational databases are definitely a “one size fits all” kind of solution, but because they got the math right back in the ’70s, it’s not such a terrible one.

    That doesn’t mean everything should go in the database either; storing images in a database makes little sense when you can store a file path instead.

    So I think I agree with the point of view of NoSQL being a premature optimization. NoSQL is a philosophy not a specific technology. There are different types of NoSQL solutions: Document stores, key-value stores, column-stores and so on. So the minute you go out of the relational world, you have to start thinking, and thinking hard about what category your data is in, and how its accessed. It is doubtful that you will end up with just one NoSQL solution.

    A startup has to be Nimble. Facebook may have set out to build Facebook, but Twitter didn’t set out to build Twitter nor did Flickr set out to build Flickr. Getting the data right and being able to change the schema easily and have things mostly work is key for a startup. Once you’re sure what your actual business is going to be, that’s when it makes more sense to start making the data flows better.

    But really, this is all about engineering. Choosing too early to use a NoSQL solution can clearly be a premature optimization, but choosing too late to use one can also be bad.

    On the other hand, as an Enterprise Geek, I’m not a big fan of MySQL as a “relational database”. I don’t think it really qualifies. But that’s another post.

  26. spottiness says:

    Thank you for posting this. When I read the original by “Bob Warfield” I felt compelled to reply, but I found it so wrong that I couldn’t take it seriously, in spite of the exposure that it had by being on the front page of Hacker News. I just couldn’t believe my eyes and didn’t find the energy to respond.

  27. Pingback: Today’s Shared Links for July 24, 2011 – Chuqui 3.0

  28. Pingback: Around the web | alexking.org

  29. Pingback: NoSQL is What? | MySQL | Syngu

  30. Paul Nendick says:

    Thanks for this article Jeremy, your point about opportunity cost at expense of developer time is most salient.

    I’d like to point out that what bears comparison is not just some NoSQL solution vs MySQL. Most of us considering using MySQL today are actually considering: RDBMS + ORM + SQL. So let’s actually pit, say, MongoDB vs MySQL+SQLAlchemy.

    Horses for courses I say.

  31. Pingback: Windows Azure and Cloud Computing Posts for 7/25/2011+ - Windows Azure Blog

  32. Pingback: Is NoSQL A Premature Optimization That’s Worse Than Death? Or The Lady Gaga Of The Database World? | TouchHappy

  33. fwiw… i always thought NoSQL meant “Not Only SQL.” peeps are reacting like NoSQL is a demonic plague which will devour the crops, kill your first-born and invalidate the accelerated vesting on your ISOs.

    if you’ve got a problem that yields gracefully to the relational model, use the relational model. if you’ve got a problem that yields to an adjacency list model, use it instead.

  34. Andy Jeffries says:

    @Artem:
    “Want schema-less data? No problem, just put some JSON objects or pickled Python objects into a column.”

    Comments like that make me think that you haven’t ever used a NoSQL storage system. It’s not about dumping data in without a schema (people have been serialising native objects for ages and storing them in DBs). It’s about being able to store arbitrary data in a system and query it.

    For example, let’s assume you store the following JSON string (as you suggest) in a MySQL column:

    {"name":"John", "date_of_birth":"1985-01-14", "favourite_colour":"red"}

    How do you find the names of the users that are over 21?
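A minimal sketch of the situation the question above describes, under several assumptions: SQLite stands in for MySQL, the `users` table and `profile` column are made-up names, the second user is invented for contrast, and the reference date is fixed to 2011-07-25 so the result is reproducible. When the JSON lives in a plain text column, the database only sees an opaque string, so the age filter ends up in application code after reading every row.

```python
import json
import sqlite3
from datetime import date


def age_on(birth, today):
    """Whole years between birth and today."""
    years = today.year - birth.year
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1
    return years


def names_over(conn, min_age, today):
    """Return names of users older than min_age.

    The database cannot filter on date_of_birth because it is buried
    inside a serialized string, so every row is fetched and decoded.
    """
    names = []
    for (blob,) in conn.execute("SELECT profile FROM users ORDER BY id"):
        doc = json.loads(blob)
        born = date.fromisoformat(doc["date_of_birth"])
        if age_on(born, today) > min_age:
            names.append(doc["name"])
    return names


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, profile TEXT)")
profiles = [
    {"name": "John", "date_of_birth": "1985-01-14", "favourite_colour": "red"},
    {"name": "Ann", "date_of_birth": "1995-06-30", "favourite_colour": "blue"},
]
conn.executemany(
    "INSERT INTO users (profile) VALUES (?)",
    [(json.dumps(p),) for p in profiles],
)

result = names_over(conn, 21, date(2011, 7, 25))
print(result)
```

This works, but it is a full table scan plus a decode per row on every query, with no index to help, which is the gap between storing schema-less data and being able to query it.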

  35. Pedro says:

    hey i liked the article ^^

  36. While not being able to argue technical points, I will say spreading Fear, Uncertainty and Doubt over a competing technology is a page right out of the Microsoft Rule Book. I think Andrew Z. is doing a fine job of providing a rejoinder to the FUD despite the over-hyped breathless boosterism NoSQL has enjoyed on websites like O’Reilly Radar and ReadWriteWeb. I think there needs to be a lot more anecdotal evidence from the design/implementation side of things where MySQL and NoSQL win and fail. Nobody expects to need the scale at the earliest stages, but you will lose a lot of member loyalty posting the Fail Whale to your home page on a regular basis. Opportunity Cost and Good Will in accounting terms is very difficult to measure at the outset of any website/webapp project.

  37. Pingback: NoSQL is What? (via Jeremy Zawodny's blog) « Carpet Bomberz Inc.

  38. Pingback: NoSQL is What? (via Jeremy Zawodny’s blog) « Carpet Bomberz Inc.

  39. Pingback: Standing Up for NoSQL | DATAVERSITY

  40. Pingback: Links for July 25th through July 29th — Vinny Carpenter's blog

  41. Pingback: ITbende » Post Topic » ITbende podcast nr. 113: Hadden we nog maar DOS

  42. Peter Shen says:

    I’m researching on NoSQL. So far I am getting two passionate but competing claims on whether to move to NoSQL or not.

  43. Pingback: SQL or NoSQL? How About Both? — jungleG

  44. Nicholas says:

    This argument is much like many others in technology; what it really boils down to is:
    What tool is best for the task/job/project?

    From reading multiple comments it’s apparent that many DO NOT have managerial experience nor are accustomed to business financial decision making – enterprise wide.

    This is not a slight; it’s an observation.

    Business owners/decision makers DO NOT care what technology is used as long as it’s profitable and can be supported easily, readily and has uniformity across their business space.

    Folks – that’s why Microsoft wins on many levels.

    As technology professionals we might have our preferences, however the bottom line is what’s the best fit for the enterprise and NOT our personal preferences.

  45. Jan says:

    I’m an RDBMS freak, because I learned from people who had grey beards and were not hipsters like some people today (and I say SOME, not ALL), but my brother, who studied NoSQL systems for a long time, wrote about them and did a comparison: http://www.kammerath.co.uk/nosql-on-the-spot.html – for me it’s still like “Böhmische Dörfer”, as we say in German, but maybe it helps you.

  46. Pingback: Big Data annd No SQL links | Fresh Water Perl

  47. Audry says:

    always i used to read smaller content that also clear
    their motive, and that is also happening with this piece of writing which I am reading here.

  48. Pingback: 10 Free Books for Learning NoSQL - CodeCondo

  49. When you “save development time” for yourself or your team, you are really just moving that time to some other part of the application lifecycle. For instance, if you save time by not designing schemas or relationships that help aggregate multi-dimensional reports later, you just moved the time you saved into someone else’s lap. Think of it as a unique form of technical debt; however, instead of poor design you just skipped a facet in favor of speed, performance and general awesomeness.

    So to answer the question, is “saving a lot of development time” not a particular advantage? Not really. The developer value is very high, the business value is very low. You shouldn’t be making these kinds of decisions on developer value.

    True business value comes from knowing that the product will be stable, supported, provide the necessary business intelligence and be supported under contracts. Especially when the implementer needs to “reach back” and get professionals to solve a zero day vulnerability or storage crisis.

    I appreciate your passion and excitement, but take it from a business professional, It’s still rather immature and being on the bleeding edge is not always where you want to be when it’s your money you are risking.
