I’ve been awash in debugging a new code deployment most of the week. The gist of it, if you’re interested, is that we’re deploying a group of machines running several Redis instances (one per CPU core). Collectively they’ll serve as an alternative to an existing MySQL-based system for tracking a variety of short and medium term data points.
A bit of data (just an identifier) will be fed into the system by some of our front-end web servers and then picked up by another process that will fetch more data, crunch it lightly, and then store summary information back in various Redis buckets. (Sorted Sets are cool, BTW.)
The trouble came, as it often does, shortly after the deployment when I started seeing errors and slow responses from some of the web servers using the new code. To troubleshoot, we rolled back the code on all but one server and I spent the next several days working out what was wrong.
I’ve found a bug in the high-level module I wrote, a few bugs in the lower-level module I’ve been using (but did not write), and a bug in the wrapper module that I wrote to sit between the high-level and lower-level module. Some of the bugs were findable by a simple reading of the code and thinking through the symptoms I’ve been seeing. Others required a lot more “debugging with print statements” and making guesses. Those generally take a lot longer.
The common thread in all of the harder to find bugs, both in this situation and in others I’ve faced, is that my assumptions often get in the way. The more I can step back and challenge the “obvious” things I believe about what should be happening, the faster I can figure out what the heck went wrong.
It gets easier and faster with time and practice, but it’s still frustrating. I think it’s human nature not to question your assumptions about things that should be “obvious”, but this is one of the situations in which that is essential. Because what you think is happening and what is really happening are often just different enough to make life interesting.