GNU Parallel and Block Size(s)

I’ve been a fan of GNU Parallel for a while but until recently have only used it occasionally. That’s a shame, because it’s often the simplest solution for quickly solving embarrassingly parallel problems.

My recent usage of it has centered around database export/import operations where I have a file that contains a list of primary keys and need to fetch the matching rows from some number of tables and do something with the data. The database servers are sufficiently powerful that I can run N copies of my script to get the job done far faster (where N is a value like 10 or 20).

A typical usage might look like this:

cat ids.txt | parallel -j24 --max-lines=1000 --pipe "bin/munge-data.pl --db live >> {#}.out"

However, I recently found myself scratching my head because parallel was only running 3 jobs rather than the 24 I had specified. After trying various experiments I finally went back and re-read the very complete manual page.

And, finally, I put the pieces together when I came across the notion of “blocks” and then saw this in the section about piping:

Spread input to jobs on stdin (standard input). Read a block of data from stdin (standard input) and give one block of data as input to one job. The block size is determined by --block.

The default block size is 1MB. How big was my input file? Even though it contained hundreds of thousands of primary keys, it was only about 2.5MB in size.

Ah ha! That explained why parallel only bothered to fire up 3 sub processes for me. So a bit of tweaking was in order and I ended up with this:

cat ids.txt | parallel -j24 --block-size=32K --max-lines=1000 --pipe "bin/munge-data.pl --db live >> {#}.out"

That, as we like to say at work, runs good. My 15 minute task now completes in less than 2 minutes.
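The arithmetic behind both runs can be sketched roughly as follows. This is only an approximation: it assumes the input is split purely by block size (ignoring record boundaries and the --max-lines setting), treats 1M and 32K as 1024-based units, and uses a hypothetical helper named blocks_for that is not part of GNU Parallel.

```shell
#!/bin/sh
# Hypothetical helper (not part of GNU Parallel): ceiling division of the
# input size by the block size, i.e. roughly how many blocks --pipe can
# hand out, and therefore the most jobs that can ever be started.
blocks_for() {
  input_bytes=$1
  block_bytes=$2
  echo $(( (input_bytes + block_bytes - 1) / block_bytes ))
}

# Assumed input size: about 2.5MB, as in the post.
input=$(( 5 * 1024 * 1024 / 2 ))

blocks_for "$input" $(( 1024 * 1024 ))  # default 1M block size
blocks_for "$input" $(( 32 * 1024 ))    # --block-size=32K
```

Under those assumptions the default block size yields only 3 blocks, so no more than 3 jobs can start no matter what -j says, while 32K blocks yield about 80, which is plenty to keep 24 job slots busy.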

While parallel is a useful tool, it also has A LOT of options. This is about the 4th or 5th time I’ve had to read the manual page and I find that I’m still learning things each time I do. Hopefully this will save someone else a bit of head scratching when they can’t figure out why GNU Parallel isn’t running the number of jobs they asked for.

Posted in craigslist, mysql, programming, tech | 8 Comments

I’m blogging again…

All the post Trump election crap in my Facebook “news” feed finally motivated me to spend less time there (just long enough to post links to my blog and look in the 2-3 groups I care about) and start writing again. Trying to filter it all out is a lost cause and nobody has built a “facebook without politics”, so here we are.

Yeah, it never really dawned on me but facebook is really designed for reading and not writing. So I do what most people do there and spend most of my time reading and scrolling and watching cat videos.

But since I seem to see political crap everywhere I look (apparently people have forgotten how to be people), I’m just going to go back to blogging where I can write on my platform and not have to spend time reading past all the stuff I don’t care to see anyway.

Sounds almost like freedom, doesn’t it?

Now I just need to figure out what to write about… Flying? Space computers? Food? Technology? Cats? Who knows.

Posted in misc, other | 8 Comments

Today’s lame sales pitch…

I’m going to paste the full text of an email message that recently landed in my inbox and unfortunately wasn’t tagged as spam. But I won’t paste it in one blob, I’ll do it piece by piece so it is easier to dissect.

Subject: RE: Jeremy Z

That’s a weird subject line. There are no other messages in my entire mailbox that contain a subject like that, but I’m supposed to think this person is replying to a message about me? Surely it wasn’t crafted that way to trick me…

Hey Jeremy ,

Hm, a space after my name and before the comma. That feels like a mail merge bug. Two strikes so far.

I don’t want to waste any of your time.

OH THANK GOD. ANYONE WHO SAYS THAT COULDN’T POSSIBLY BE LYING!!!

(In other words, that’s almost a third strike.)

I’m keen on having a quick chat with you on how FooPlus can add value to Craigslist.org

Aw crap. Here we go. The phrase “add value” is one of the most meaningless in the English language. You’re clearly trying to get me interested while simultaneously telling me nothing about what you actually offer.

Does that EVER work? And do you really think I’m the kind of person it does work on?

And why a quick chat? Can’t you just put some bullet points in the email before I have to think about whether or not I want to cough up my Burner (err, I mean “phone number”) for you?

Foo is a web accelerator significantly enhancing web performance all whilst saving companies a bunch on server cost. I would like to introduce you to FooPlus, which has contributed to the following: ESPN, Amazon, BBC, CNN, Vimeo, Disney, The New York Times, Tumblr, Nikon, Home Depot and many more.

Well, you’ve just wasted an entire paragraph NOT telling me how it would “add value” (whatever that means) or even telling me what it does. Honestly “enhancing web performance” could mean dozens of different things.

At worst, you’ll gain some knowledge on how to keep your current suppliers on their toes. At best, you’ll get some ideas on how to save money on server cost.

Suppliers… of what? Are you referring to our hardware vendors? I’m confused now. I don’t even know why I’m still reading…

Jeremy as I said before, I don’t want to waste any of your time, just keen on having a quick chat.

It’s pretty evident that you DO want to waste my time. You’ve contacted me, out of the blue, with a misleading subject line, and pitched me on a vague promise of “adding value” without really telling me anything tangible at all. And now you want to get me on the phone?

Can’t you see how I might be just A LITTLE skeptical of that “quick chat” not also wasting my time?

Are you available to sometime this week ?

I am not.

And what’s with the space before the question mark, anyway?

Posted in craigslist, misc, Uncategorized, wtf | 1 Comment

Screw HotWire.com and the Palms Casino Resort in Las Vegas

Let’s say you go online to Hotwire to make a last minute reservation for a room in Las Vegas so that your wife can meet up with her cousin. And let’s say that you didn’t realize that the Nazis at the Palms Casino Resort in Las Vegas wouldn’t let her check in because her name wasn’t on the reservation.

You can call up Hotwire customer support and speak with someone (and I use the term “speak” loosely, since it’s barely English you’ll hear on the other end) about the matter. It turns out that they can’t do anything because the third party they work with is closed already.

Seriously. Closed!

It’s not a 24 hour operation and at 7:30pm on a Sunday they’re apparently just not doing business.

So you call the hotel and try to get her name added with the reservations department. They won’t do it. They need it to be done with that mysterious third party. After explaining the situation, they refuse to budge. So you ask to speak with the manager. And she gives you the same story and claims it’s an issue of being able to identify you.

Fine.

I offer to fax or email a copy of my driver’s license, passport, and a recent utility bill. No change. So then I say “you mean my ONLY option is to cancel the charge with my credit card company and leave negative reviews online for both you and Hotwire?”

“Yes. I’m sorry sir.”

No, you’re not sorry. You’re pissing off a customer who’s willing to bend over backward to provide proof of identity. You’re just lying about being sorry.

And all for a $50 room.

Christ.

If you’re looking for travel to Las Vegas, you’d do well to avoid the Palms Casino Resort and Hotwire as well. You’ll waste more time than the money you’ll save, believe me.

Screw both of them. They don’t deserve your business. And they lost mine.

Posted in misc, travel | 6 Comments

How to write a good used car ad…

For the last few weeks I’ve been shopping on-line for a new (to us) used car. And in the process of doing so, I’ve looked at hundreds of ads for cars within a 2-3 hour driving distance of our home in Groveland, California.

The bulk of my shopping has been via craigslist (of course) and I’ve found that a surprising number of sellers are either remarkably lazy or don’t know what basic information to include to answer the obvious questions that most buyers are likely to have. This is especially true in the realm of cars with a salvage title (which can be a good deal).

So, in no particular order, I present a list of tips for used car sellers. This is not at all specific to craigslist, but it certainly wouldn’t hurt if more sellers there paid attention to these.

When selling a car on-line, it’s helpful if you:

  1. Include photos of the interior and exterior. If you can’t be bothered to snap a few pics, I’m assuming the car has issues with appearance and won’t give it a second thought.
  2. Make sure the photos are in focus and up-right. It only takes a few seconds to get right.
  3. Note any un-repaired damage. This stuff will come up anyway when the buyer comes to see it and realizes why the price is lower than expected.
  4. Say whether or not you’re the original owner. If you’re the only owner and have taken good care of it, say so in your ad.
  5. Specify the “trim package” so one can tell the difference between “Honda Accord DX” and “Honda Accord EX.” There’s a difference and it’s worth some real money.
  6. Include the VIN so that a potential buyer can do their homework before pestering you, busy seller. You may not know this, but sites like AutoCheck and CarFax are really handy.
  7. In the case of a salvage car, offer to provide pictures of the damage and the name of the body shop that did the repair work (seriously, put yourself in the buyer’s shoes for a minute).
  8. Actually respond to emails when buyers contact you with questions. You do want to sell it, right?
  9. Describe what routine maintenance has or has not been done (tires, brakes, timing belt, water pump, oil/fluids, etc.)

#8 is particularly amazing. I, as a “cash in hand” buyer looking to replace an aging car, got NO RESPONSE from 75% of the sellers I contacted via email. And I wasn’t asking any difficult questions.

The bottom line is that the more detail you can include in the ad, the more buyers are likely to pay attention, trust you, and want to deal with you. Don’t make the buyer have to ask a dozen questions.

This will come as no surprise, but the car we ended up buying was from a seller who posted an ad with lots of detail, good pictures, and was very prompt to reply via email (including taking a few additional pictures we asked for).

Thanks car sellers!

Posted in craigslist | 2 Comments

My First Helicopter Lesson

Back on November 16th (last Sunday), we headed up to Auburn airport so that I could use the gift certificate for a 1 hour helicopter lesson that Kathleen got me for my birthday earlier this year. The plan was to fly in the Robinson R22 at Sierra Air Helicopters to get a feel for what it’d be like. Ever since our trip to Africa earlier this year, when we got to ride in a helicopter over Victoria Falls and then down into the river canyon (video), I thought it’d be fun to try it out as a pilot.

The Robinson R22 Beta II at Auburn Airport

After a bit of a pre-flight briefing in the office of John (the instructor), we headed out to do the actual flying. Now, I have to say, I knew the most basic theory about how to fly a helicopter, but I’d never done any reading or practice before. I expected this flight to be a lot of him demonstrating things that I’d try and mess up.

The R22 Instrument Panel

Since he had already done the pre-flight, we got in and he gave me a quick introduction to the gauges and equipment on the panel. Before long he had the engine started and was letting it warm up. When it was time, he asked me to raise the throttle so the governor would kick in and he could complete final checks.

Leaving the Departure End of Runway 25

After he confirmed the chopper was ready to fly, he asked me to pull up on the collective (that’s the lever on my left that controls the blade pitch) and we were in the air! We flew down the taxiway parallel to Runway 25, began a climb to 500 feet AGL, and departed the area on a left crosswind leg.

Departing the Area

He took us down into a nearby river canyon for a bit of fun, flying maybe 15 feet off the water. That reminded me a lot of our Africa flight, of course. 🙂

After about 10 minutes of flying around in the canyon, he asked me to pull up on the collective a bit so that we could climb up above the canyon walls and I could try my hand at flying it. Much to my surprise, the R22 wasn’t as hard to fly as I had imagined (and had been warned). I found that if I kept my movements very small and gentle, the chopper responded in fairly predictable ways.

Trying a coordinated turn...

So I flew mostly straight and level for a few minutes and then experimented with gentle turns. I continued doing that and working to refine my footwork (I kept wanting to say rudder, but it’s really the tail rotor I was controlling). As time went on, I tried more and more dramatic control movements and turns (but nothing too crazy).

For quite a while, I was doing all the flying. John had relaxed to the point that his hands were nowhere near the controls anymore and he was telling me that people usually don’t fly this well on their first lesson. But since I was, he wanted to demonstrate a few more advanced things to me. So he asked me to fly toward Folsom Lake where he could take us down lower and demonstrate a few maneuvers.

He demonstrated a few autorotations and other emergency procedures (as well as common mistakes) and then asked me to head back toward the airport so I could experience the joy of attempting to hover in place (one of the hardest things to master).

Base to Final for 25 at Auburn

I got us to the downwind entry and he took over to fly the pattern and get us back over the helipad. Once we were stable there, he briefed me on what I was trying to do and then let me try keeping the helicopter over the pad. While I have no pictures of that portion of the lesson, Kathleen managed to shoot a video of my hover.

In fact, I did pretty well with the hovering, so he pushed me a bit and had me perform a couple of takeoffs as well. And after I’d done a few rounds of takeoff to hover, he got us lined up over the taxiway and had me practice following the centerline while maintaining a relatively constant altitude.

That worked out pretty well until I had to make a fairly sharp turn. I was able to do it, but I got a bit uncoordinated and also let it climb a bit too much. So he took the controls and put us down back in the parking area for the R22. Our hour was up.

All in all, I had a hell of a good time. John is a very good instructor and the R22 seems like a fun flying machine. It’s tempting to consider an add-on rating so that I could fly helicopters now and then. But first I have to work on my instrument rating. Once that’s out of the way, who knows… 🙂

Posted in Uncategorized | 2 Comments

Cast Iron Cooking

Last night I made our favorite black bean & green chili chicken soup for the first time in about a year. It’s the one recipe that always makes me pull out the cast iron pot instead of a cheap lightweight non-stick variety.

Cast Iron Pot

Doing so reminded me of why I really enjoy cooking with cast iron (and should probably do more of it). Aside from the pot or pan being so heavy that it simply won’t move around on the stovetop, it retains heat amazingly well. Once it has heated up, it just stays hot. You can add 32 ounces of room temperature chicken broth and it’s up to a boil in just a few minutes.

Similarly, if you’re just starting a recipe by browning up some garlic and onion, you’ll be amazed at how quickly and evenly things cook up without having to raise the heat.

The only real downside is cleanup. You can’t just put the cast iron in a dishwasher; you’ve got to clean it by hand. But I think the trade-off is totally worth it. Food just tastes better when prepared in cast iron, and you know the pot/pan/skillet is going to outlive its non-stick counterparts by at least 40 or 50 years.

Posted in Uncategorized | 2 Comments

Heading to Africa: Victoria Falls, Zambia, Namibia

As I mentioned earlier, we’re getting ready for a trip. Six years ago Kathleen and I went to Kenya and Tanzania for a 2 week African Safari and ended the trip with a 3 day stay on the island of Zanzibar where we then got married.

Wedding in Zanzibar

We’ve decided to go back! We head out later this week on a long overdue vacation. This time we’ll be starting in Victoria Falls before heading out across much of Zambia and Namibia to see some of Africa’s amazing wildlife and landscapes.

It didn’t start to feel real until a week or so ago when we went to get our travel vaccinations (ow!). But we’re now at the point where we need to finish up the remaining vegetables in the refrigerator, do packing, and finish final preparation so we can hop on a series of planes at the end of the week. (That leg from JFK to Johannesburg is going to be loooong.)

I’m not sure how much Internet access we’ll have during the trip (likely more than six years ago), but we’ll try to post a few pictures here and/or on my Facebook page as we can.

Posted in travel | 5 Comments

Installing Ubuntu (via crubuntu) on a Samsung 300 Series Chromebook

We’re getting ready for a trip in which we expect to take a lot of pictures. So I’d like to take a small, inexpensive computer along to handle the task of copying pictures from the cameras and memory cards to a portable USB hard drive (for backups and to make sure we have enough card space). For a while I considered using our old Samsung NC10 Netbook, but it’s rather slow and a little thicker and heavier than I’d prefer. So I looked at my Samsung Chromebook instead (the 300 series ARM-based 10″ model).

That seemed ideal, since it’s light, thin, and has great battery life as well as a built in SDCard reader. However, the operating system (ChromeOS) is so heavily bent toward “cloud” computing that it doesn’t make interacting with local storage devices easy. So I decided to take the plunge and install a full-blown Linux distribution: Xubuntu.

The preferred way to get various flavors of ARM-based Ubuntu on the Chromebooks is the CruBuntu script. You simply put the device in developer mode, open a shell, curl a file, and run it. From there it takes care of partitioning and downloading all the needed packages to give you a full-blown Linux “desktop” distribution. The only weird thing I’ve encountered so far is the strangeness of the default trackpad settings. But this guy has a fix for that. I may or may not apply that, since I’ve already paired a Bluetooth travel mouse with the laptop.

It’s funny, I always thought of the Chromebook as a little toy that’d be handy on trips when I don’t need much time on-line. But now it’s suddenly become about 500% more useful since I can get access to all the Linux tools I could possibly want. Sure it’s not a powerful machine, but for moving photos around, handling email, and maybe posting a few things on-line, it’s more than up to the task.

This process wasn’t without a little “adventure” of course. It took three tries to get right. But since it’s mostly unattended, that wasn’t a big deal. Had I chosen Xubuntu the first time instead of taking the default Ubuntu (with Unity) I’d have been a bit better off. In any case, I now have a nice little Linux “netbook” (to re-apply an old label).

Posted in tech, travel | 1 Comment

Speaking at OpenWest 2014 Conference: real-time search infrastructure architecture at craigslist

Just a quick heads-up that I’ll be giving a 50 minute talk at OpenWest in Utah on Thursday, May 8th. The talk is titled real-time search infrastructure architecture at craigslist and is completely new. We’ve recently completed the 3rd major revision to our search infrastructure at craigslist and I’ll be talking about what it looks like now, why, how we did it, and where we may go from here.

If you haven’t seen my talks on this topic before, I’ll be talking a lot about how we use the Sphinx Search Engine.

I’ve never been to OpenWest but I’ve heard a lot of good things about it. And since I’ve been looking to broaden my conference horizons, it seemed like a good one to attend. Thanks to the folks at OpenWest for picking my talk and allowing craigslist to sponsor the event as well.

The full schedule looks like there will be a lot of interesting talks. I’m really looking forward to it.

On a related note: craigslist is hiring for frontend, backend, and systems administrators. Send me your resume at z@craigslist.org if you’re interested or just want to know more about working at craigslist. 🙂

Posted in craigslist, sphinx | 1 Comment