Cluster_chef is a powerful tool for maintaining and describing the software configurations that let a machine provide its services.
HbaseBulkloader is a bulkloader for HBase that explores various bulk-loading strategies. It includes Apache Pig load and store functions.
IMW, the Infinite Monkeywrench, is a Ruby framework that simplifies the tasks of acquiring, extracting, transforming, loading, and packaging data.
Wonderdog is a bulkloader for Elastic Search. It includes a simple StoreFunc for Apache Pig.
ChimpMARK-2010 is a collection of massive real-world datasets, interesting real-world problems, and simple example code that solves them.