Wednesday, July 1, 2015

Cloudera Quickstart VM 5.3 Apache Pig configuration


Cloudera provides a pseudo-distributed, single-node environment for working with Apache Hadoop, called the Cloudera Quickstart VM. Most tools in the Hadoop ecosystem, such as Apache Sqoop and Apache Hive, work right out of the box, but Apache Pig needs some additional configuration before it runs smoothly. This blog post walks through those additional configuration steps.


1. Open a new Terminal.
2. su - root (Enter cloudera as the password)
3. cd /etc/pig/conf

a. mv (Let us make a copy of the default file)
b. cp -p


a. mv (Let us make a copy of the default file)
b. cp -p

6. Edit as below

a.Replace, A with the below, A
b. Then add a new line, A

7. Edit as below

a. If the line log4jconf=./conf/ is commented out, uncomment it (remove the #) and make sure the line starts with no leading blank spaces.

b. Replace the line starting with #clustername with quickstart.cloudera:50010

quickstart.cloudera:50010 is the Hadoop cluster name in the Quickstart VM. You can confirm this value by running the hdfs dfsadmin -report command.

8. chmod -R o+w /etc/pig/conf.dist

9. cp -p /usr/lib/hadoop/lib/slf4j-api-1.7.5.jar /usr/lib/hive/lib
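Steps 8 and 9 can also be wrapped in a small script with a few guards, so that running them twice does no harm. The following is only a sketch, not part of the original procedure: the function names and the PIG_SETUP_RUN switch are hypothetical, and the paths are the ones used in the steps above.

```shell
#!/bin/sh
# Sketch of steps 8 and 9 with guards. Function names are illustrative only.

# Step 9: copy a jar into a directory unless a file of that name is already there.
copy_jar_if_missing() {
    src=$1
    dest_dir=$2
    if [ ! -f "$src" ]; then
        echo "source jar not found: $src" >&2
        return 1
    fi
    if [ -f "$dest_dir/$(basename "$src")" ]; then
        echo "already present: $dest_dir/$(basename "$src")"
        return 0
    fi
    # -p preserves mode, ownership and timestamps, as in the original step.
    cp -p "$src" "$dest_dir/" && echo "copied: $src -> $dest_dir"
}

# Step 8: give "others" write permission on a directory tree.
make_world_writable() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    chmod -R o+w "$dir"
}

# PIG_SETUP_RUN is a hypothetical switch: the real paths are only touched
# when it is set to 1 (run as root on the Quickstart VM).
if [ "${PIG_SETUP_RUN:-0}" = "1" ]; then
    make_world_writable /etc/pig/conf.dist
    copy_jar_if_missing /usr/lib/hadoop/lib/slf4j-api-1.7.5.jar /usr/lib/hive/lib
fi
```

Saved under a name of your choice and run as root with PIG_SETUP_RUN=1, it performs the same two commands as steps 8 and 9, but reports a missing source jar instead of failing silently.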


The above steps help avoid the following errors when Pig is run in interactive mode using the Grunt shell:

ls: cannot access /usr/lib/hive/lib/slf4j-api-*.jar: No such file or directory

WARN pig.Main: Cannot write to log file: /etc/pig/conf.dist/pig_1435724561990.log

ERROR - ERROR: org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1435707575650_0004' doesn't exist in RM.
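Before starting Grunt, the conditions behind the first two messages can be checked by hand. The sketch below is one way to do that, under assumptions: the function names are made up for illustration, and the default paths are taken from the error messages above. It does not cover the YARN ApplicationNotFoundException, which depends on the configuration edits rather than on the file system.

```shell
#!/bin/sh
# Sketch: pre-flight checks for the first two errors listed above.
# Function names are illustrative only.

# True if the directory holds at least one slf4j-api-*.jar.
has_slf4j_jar() {
    for f in "$1"/slf4j-api-*.jar; do
        [ -e "$f" ] && return 0
    done
    return 1
}

# True if the directory exists and the current user may write to it.
can_write_logs() {
    [ -d "$1" ] && [ -w "$1" ]
}

# Report both conditions; returns non-zero if either check fails.
check_pig_setup() {
    hive_lib=${1:-/usr/lib/hive/lib}
    log_dir=${2:-/etc/pig/conf.dist}
    status=0
    if has_slf4j_jar "$hive_lib"; then
        echo "ok: slf4j-api jar found in $hive_lib"
    else
        echo "missing: no slf4j-api-*.jar in $hive_lib"
        status=1
    fi
    if can_write_logs "$log_dir"; then
        echo "ok: $log_dir is writable"
    else
        echo "missing: cannot write to $log_dir"
        status=1
    fi
    return $status
}
```

After sourcing the file, calling check_pig_setup with no arguments as the cloudera user checks the two default paths; a "missing" line points back to step 8 or step 9.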
