This is a short list of technologies and tools that I’ve been looking at off and on over the last few months and would like to try out on real problems (not just toy projects).
- RabbitMQ, because queuing makes a lot more sense than polling in a lot of applications.
- 0MQ, because I like the higher level network connection abstractions.
- Hadoop (probably from Cloudera), because we have a lot of machines and a lot of data and are always Doing It Wrong. MapReduce and HDFS could simplify A LOT and make things way faster.
- Redis 2.2, because having the expire semantics that people expect will make some things a lot easier. 2.0 is working great but 2.2 will fix one of its biggest warts, IMO.
- Apache Mahout, because I’m curious what we could learn if we feed it some of our data.
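To make the queuing-versus-polling point concrete, here is a minimal sketch. It uses Python's in-process `queue.Queue` as a stand-in for a real broker, so it shows only the general pattern and not RabbitMQ's API. The job names are made up for illustration.

```python
import queue
import threading


def consumer(jobs, results):
    """Handle jobs as they arrive, without a sleep-and-check polling loop."""
    while True:
        job = jobs.get()      # blocks until a producer puts something on the queue
        if job is None:       # sentinel value: no more work is coming
            break
        results.append(job.upper())  # placeholder for real work


jobs = queue.Queue()
results = []
worker = threading.Thread(target=consumer, args=(jobs, results))
worker.start()

# Hypothetical job names; a real producer would likely be another process or machine.
for name in ["resize-image", "send-email", "rebuild-index"]:
    jobs.put(name)
jobs.put(None)  # tell the consumer to stop
worker.join()

print(results)
```

The consumer sits idle until work shows up, instead of waking up on a timer to ask whether anything has changed. That is the property that makes a queue attractive over polling.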
MongoDB would have been on this list a few months ago, but I’m in the middle of a project using it now. Come to MongoSV and you can hear about that (of course, I’ll talk about it some here as well).
Since I work in Perl a lot but haven’t kept up on some of what the community has built in the last few years, there are some modules/frameworks that I feel like I should be paying more attention to and trying out:
- Moose, because it seems to make OO not suck.
- POE, because I like event-driven stuff in some cases.
- Coro, because it seems over the top and crazy, but also quite useful.
- Plack, because I’m starting to think we’d be better off ditching Apache/mod_perl since we’re really not using much of Apache (and we’re still on 1.3).
- AnyEvent, because I’ve played with it some but would really like to do more.
Now, does anyone have some spare time I can borrow?
Nice list. I have been considering those top two MQ solutions since AMQP seemed to die under its own complexity. I was also thinking about the new Amazon EC2 PHP 5.2 toolkit, as well as exercising HTML 5.0 and friends.
I asked the same questions last year, ended up off the deep end of AnyEvent, and have been very rewarded. Forget POE, and do plenty of benchmarks on your Moose (or Mouse) for code at scale.
Thanks for the Cloudera mention.
If you have a lot of log data that you need to collect, then you should also kick the tires of Flume. It is a scalable data collection system integrated with Hadoop (though it can be used without it too). You can define agents to collect data from any source(s) and dump it into any sink(s). It comes with a number of predefined sources and sinks.
I would suggest AnyEvent over POE since it integrates nicely with Coro, taking the pain of writing event-driven applications away (i.e. the callback style which POE obscures somewhat). Also, AnyEvent can be run under any event-loop (even POE).