Be a Big Pig

So, you got a lot of data?

I'm not talking about that shoebox under your desk with all the receipts you forgot to include in your taxes last year. I'm talking great big honking gobs of data. Like that extra copy of the human genome you've been wanting to index, or that project where you actually test the whole infinite monkey thing, that sort of deal. One where you can really put those extra machines and a few terabytes of data to good use.

From the fine folks in Berkeley Research comes Project Pig.

OK, kidding aside, this is actually kinda cool. It spreads the load of searching and indexing data across multiple machines, and lets you use a fairly simple, slightly familiar syntax to get at the data. What's more, while it's suited to large data sets, it works for smaller ones too. The code is written in Java, which means good cross-platform capabilities, and the syntax, called "Pig Latin", looks more like SQL with variables than anything else. The language can be embedded in existing programs (the same way any JDBC module can) and uses lazy evaluation, meaning you only pay the price of execution when you actually act on the results. (Don't worry. Language nerds understand that.)
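For the non-language-nerds, here's a rough sketch of what "lazy evaluation" means. To be clear, this is plain Python generators, not Pig Latin and not Pig's API; the `load` and `keep_big` helpers, the sample rows, and the `calls` log are all made up for illustration. The point is just that describing the pipeline costs nothing, and the work only happens once you ask for results.

```python
# Illustration of lazy evaluation only: plain Python, NOT Pig Latin or the Pig API.
# All names here (load, keep_big, calls, data) are hypothetical.

calls = []  # records each unit of work, so we can see *when* it happens


def load(rows):
    """Yield rows one at a time, logging each row that is actually read."""
    for row in rows:
        calls.append("load")
        yield row


def keep_big(rows, threshold):
    """Yield only rows whose size exceeds the threshold, logging each check."""
    for row in rows:
        calls.append("filter")
        if row["size"] > threshold:
            yield row


data = [
    {"name": "a", "size": 5},
    {"name": "b", "size": 50},
    {"name": "c", "size": 500},
]

# Describing the pipeline does no work: generators run only when consumed.
pipeline = keep_big(load(data), threshold=10)
work_before = len(calls)

# Asking for the results is the "action" that triggers execution.
result = [row["name"] for row in pipeline]
work_after = len(calls)
```

Checking `work_before` right after the pipeline is built shows that nothing has been read or filtered yet; `calls` only fills up once `result` is computed.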

It's a fun toy to play with, and it fits in nicely with the other similar toys that have been showing up on the net lately.
