Download the latest version of PIG from the svn repository.
svn checkout http://svn.apache.org/repos/asf/hadoop/pig/trunk
A new folder named trunk is created in your current directory (when no target path is given, svn names the checkout after the last component of the URL).
Build PIG from the source using Ant.
cd trunk
ant
A new file pig.jar is created.
We also need to build the tutorial files.
cd tutorial
ant
After you have set up your folder, you can run a PIG example script in local mode with
java -cp "pig.jar" org.apache.pig.Main -x local script1-local.pig
After running this command, a new file script1-local-results.txt is created.
Although PIG can be run directly in local mode (as the example above shows), it is typically run on top of Hadoop. You will need to do this to answer the questions in this problem assignment. We will run PIG on the IBM/Google cluster. Follow the instructions here to copy your pigtmp directory to the cluster.
You may run into an error when running script2-hadoop.pig on the cluster. If you get the message
"ERROR org.apache.pig.tools.grunt.Grunt - java.io.IOException: Invalid alias: hour_frequency2::hour00::group::ngram in same ..."
then change the line same1 = ... to:
same1 = FOREACH same GENERATE hour00::group::group::ngram as ngram, $2 as count00, $5 as count12;
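If you prefer to make this change from the command line instead of a text editor, the following is a minimal sketch. It assumes the statement to replace fits on a single line that begins with same1 = (optionally indented); if it spans several lines, edit the file by hand instead. The function name fix_same1 is hypothetical, and sed keeps a copy of the unmodified script with a .bak suffix.

```shell
# Hypothetical helper: replace the single-line "same1 = ..." statement in a Pig script.
# Assumption: the statement starts a line (leading spaces allowed) and ends on that line.
# sed -i.bak edits the file in place and saves the original as "<file>.bak"
# (this form is accepted by both GNU and BSD sed).
fix_same1() {
  script="${1:-script2-hadoop.pig}"   # default to the assignment's script name
  # Single quotes keep the shell from expanding $2 and $5; in a sed replacement
  # string "$" has no special meaning, so they are written to the file literally.
  sed -i.bak 's/^[[:space:]]*same1[[:space:]]*=.*/same1 = FOREACH same GENERATE hour00::group::group::ngram as ngram, $2 as count00, $5 as count12;/' "$script"
}
```

Run it as fix_same1 script2-hadoop.pig from the directory that holds the script, then confirm the new line with grep -n 'same1' script2-hadoop.pig before resubmitting the job.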