I’m in the process of rebuilding full-text indexes for a good-sized document collection that lives in a sharded MongoDB cluster. The funny thing about this is that I don’t really use MongoDB that much. I mean, we put data into it day after day, but I don’t personally have to interact with it that often. For this particular use case it “just works” the vast majority of the time, so I don’t have to think about it.
I like that.
But this particular task involves slurping ALL the data out of that cluster and onto a cluster of sharded Sphinx servers so I can re-index the roughly 3 billion documents. That’s all well and good, but since our MongoDB cluster isn’t terribly performance-sensitive, it’s built on old-fashioned (am I allowed to use that phrase?) spinning disks. And you know what that means, right?
Yeah, seek time matters. A lot.
If this were hitting our production MySQL clusters, I wouldn’t care nearly as much. Those all use one flavor or another of flash storage. In fact, we’ve been using SSDs long enough and in enough places that I’m spoiled at this point. I sort of cringe every time I have to deal with disk seeks. That’s so five years ago.
Anyway, I knew this would be an issue, so I tried to be clever. I dumped all the document IDs from Mongo in advance, doing so in a way that gave them to me in “disk order” so that when I later had to fetch them for indexing, I’d be able to minimize the seeking and hopefully maximize the throughput.
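In case it helps anyone else, here’s a minimal sketch of what that ID dump could look like in Python with pymongo. The host, database, and collection names are all placeholders, and the $natural sort is my shorthand for asking Mongo for storage order; on the older MMAP-style storage engines that roughly matches the order documents sit on disk:

```python
# Sketch: dump every _id in $natural (storage) order so later fetches
# can walk the disk mostly sequentially. Host/db/collection names are
# made up for illustration.
from pymongo import MongoClient

client = MongoClient("mongodb://mongo-host:27017")  # hypothetical host
coll = client["mydb"]["docs"]

with open("ids.txt", "w") as out:
    # Project only _id to keep the cursor cheap, and sort by $natural
    # so IDs come back in storage order instead of index order.
    for doc in coll.find({}, {"_id": 1}).sort("$natural", 1):
        out.write(str(doc["_id"]) + "\n")
```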
Well, that plan only half worked. You see, I had assumed that “disk order” on one member of a replica set would be the same as “disk order” on another member of the set. That appears not to be the case. So I had to work around this by telling the indexer processes not to use the mongos routing server, and instead talk directly to the mongod on the specific server(s) I fetched the IDs from originally.
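Again as a rough sketch (assuming a reasonably recent pymongo; the host name is made up), the workaround amounts to pointing the client at one specific mongod rather than the mongos router, so reads land on the same member whose “disk order” produced the ID list:

```python
# Sketch of the workaround: connect straight to one mongod, bypassing
# mongos, so reads hit the member whose $natural order produced the
# ID list. Host name is a placeholder.
from pymongo import MongoClient, ReadPreference

client = MongoClient(
    "shard1-member2.example.com",
    27017,
    directConnection=True,  # talk to this mongod only, no topology discovery
    read_preference=ReadPreference.SECONDARY_PREFERRED,  # member may be a secondary
)
coll = client["mydb"]["docs"]

def fetch_batch(id_batch):
    # Fetch one batch of documents. Because id_batch was dumped in this
    # member's storage order, consuming batches in sequence keeps the
    # drive's seeks short and mostly forward.
    return list(coll.find({"_id": {"$in": id_batch}}))
```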
I look forward to a few years from now, when we really do view spinning disks as “the new tape” and use them mainly for archival tasks.