PigTutorial - Pig Wiki

Added By Infochimps

Apache Pig is a platform for analyzing large data sets. Pig’s language, Pig Latin, lets you specify a sequence of data transformations such as merging data sets, filtering them, and applying functions to records or groups of records. Pig comes with many built-in functions but you can also create your own user-defined functions to do special-purpose processing.
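As a rough illustration of such a sequence of transformations, the following Pig Latin sketch loads a tab-separated log, filters out empty records, groups the rest by user, and applies the built-in COUNT function to each group. The input path, output path, and field names are hypothetical and chosen only for this example; they are not taken from the tutorial files.

```pig
-- Hypothetical input: a tab-separated log with one search query per line.
-- The path and the field names (user, time, query) are assumptions.
raw = LOAD 'queries.log' USING PigStorage('\t')
      AS (user:chararray, time:chararray, query:chararray);

-- Filtering: keep only records that actually contain a query.
clean = FILTER raw BY query IS NOT NULL;

-- Grouping: collect each user's records into one bag per user.
grouped = GROUP clean BY user;

-- Applying a function to groups of records: count the queries per user.
counts = FOREACH grouped GENERATE group AS user, COUNT(clean) AS num_queries;

-- Sort by activity and write the result (the output path is also an assumption).
ordered = ORDER counts BY num_queries DESC;
STORE ordered INTO 'query_counts';
```

Each statement names an intermediate relation that the next statement consumes, which is how a Pig Latin program expresses a data flow. Pig's `-x local` option runs a script like this in local mode rather than on a cluster.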

Pig Latin programs run in a distributed fashion on a cluster: they are compiled into Map/Reduce jobs and executed using Hadoop. For quick prototyping, Pig Latin programs can also run in “local mode” without a cluster, in which case all processing takes place in a single local JVM.

The Pig tutorial file includes the Pig JAR file (pig.jar) and the tutorial files (tutorial.jar, Pig scripts, log files). These files work with Hadoop 0.18 and provide everything you need to run the Pig scripts. To get started, follow these basic steps:

1. Install Java.
2. Download the Pig tutorial file and install Pig.
3. Run the Pig scripts, either in local mode or on a Hadoop cluster.