I’m sure this will attract far more suggestions, in both number and complexity, than I can reasonably implement, but I figured I’d ask anyway…
I’m working on a project (discussed at the recent MongoSV conference) that will migrate the entire Craigslist posting archive from MySQL to MongoDB. While I’m testing some of the migration code, I see a lot of posting titles scroll by on the screen. Millions of them.
That got me wondering what the popular words in the titles might be. I could easily build that into the migration job without appreciably slowing it down. And that got me thinking about the other things I might be able to compute and summarize along the way.
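To make the word-count idea concrete, here is a rough sketch of what such a tally could look like in Python. It is not the actual migration code: the `count_title_words` function, the tokenizing regex, and the sample titles are all made up for illustration, and a real job would feed in titles as they stream past during the migration.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of lowercase letters, digits, or apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def count_title_words(titles, counts=None):
    """Add the words from each title to a running Counter and return it."""
    if counts is None:
        counts = Counter()
    for title in titles:
        counts.update(WORD_RE.findall(title.lower()))
    return counts


# Hypothetical titles standing in for rows read during the migration.
sample_titles = [
    "2 bedroom apartment for rent",
    "Apartment for rent near downtown",
    "Free couch",
]

counts = count_title_words(sample_titles)
print(counts.most_common(3))
```

Because the function accepts an existing `Counter`, it can be called once per batch of rows and keep accumulating, so the tally rides along with the migration rather than needing its own pass over the data.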
And that made me wonder what smart readers like you would do if you were going to run through all the data a few times on reasonably fast hardware.
Drop a comment and let me know. Hopefully I’ll be able to implement a few of them.