Installation of Hadoop for handling Big Data has always been a tricky task. The process involves editing multiple configurations and setup files. So, it’s not unusual that developers get stuck in the middle of the process time and again.
However, there are a few common pitfalls that developers run into and have little or no information about how to solve them, such as installing on a 64-bit operating system.
Space-O has been providing Big Data as a Service (BDaaS) to clients across the globe. Lately, our developers have been working extensively with Hadoop. They have identified a few pitfalls in the framework and worked out effective solutions. Here’s how they did it:
When Space-O senses that something is wrong, instead of letting everybody know about it, we work out exactly what it is. On day 1, to begin with: Hadoop needs Java to be installed, so our developers started by installing OpenJDK 7 on Ubuntu. That did not work with Hadoop 2.4.1, which showed the error below:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
The binary distribution of Hadoop 2.4.1 is compiled for 32-bit, and they were working on a 64-bit box. This was the first challenge they had to resolve. The reason for building from source is that on a 64-bit OS, the Hadoop tarball does not include native libraries for your system, so you get odd warnings such as the one above.
The solution: build Hadoop from source using Maven and Protobuf.
Before you begin, install all the required build tools:

apt-get install -y gcc g++ make maven cmake zlib1g zlib1g-dev libcurl4-openssl-dev
Then, they downloaded the Hadoop source and started the build with the following commands:
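The exact commands aren’t shown above; a typical sequence for Hadoop 2.4.1 looks like the following. The mirror URL is an assumption (any Apache archive mirror carrying this version will do), and `protoc` 2.5.0 is assumed to be on the PATH, as Hadoop 2.x requires it:

```
# Fetch and unpack the Hadoop 2.4.1 source release (URL is an assumption --
# substitute any Apache mirror that carries this version).
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.4.1/hadoop-2.4.1-src.tar.gz
tar xzf hadoop-2.4.1-src.tar.gz
cd hadoop-2.4.1-src

# Build the distribution with native 64-bit libraries; skip tests to save time.
mvn package -Pdist,native -DskipTests -Dtar

# The finished tarball lands under hadoop-dist/target/.
```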
However, after this they encountered another error:
[INFO] BUILD FAILURE
[INFO] Total time: 1.730s
[INFO] Finished at: Wed Apr 17 07:06:39 UTC 2013
[INFO] Final Memory: 8M/360M
[ERROR] Failed to execute goal on project hdfs-nfs-proxy: Could not resolve dependencies for project com.cloudera:hdfs-nfs-proxy:jar:0.8.1: Could not find artifact jdk.tools:jdk.tools:jar:1.6 at specified path /usr/lib/jvm/java-7-openjdk-amd64/jre/../lib/tools.jar -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal on project hdfs-nfs-proxy: Could not resolve dependencies for project com.cloudera:hdfs-nfs-proxy:jar:0.8.1: Could not find artifact jdk.tools:jdk.tools:jar:1.6 at specified path /usr/lib/jvm/java-7-openjdk-amd64/jre/../lib/tools.jar
The fix: on a Linux/Ubuntu box, install the Sun (Oracle) JDK and point JAVA_HOME at the newly installed JDK.
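Setting JAVA_HOME is a two-line job; a minimal sketch follows. The path below is an assumption — adjust it to wherever your JDK actually lives (check `ls /usr/lib/jvm/`):

```shell
# Point JAVA_HOME at the installed Oracle JDK.
# NOTE: the path is an assumption -- adjust to your actual install location.
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export PATH="$JAVA_HOME/bin:$PATH"
echo "JAVA_HOME=$JAVA_HOME"
```

Adding these lines to `~/.bashrc` makes the setting survive new shell sessions.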
They installed Oracle Java 8 and restarted the Hadoop build, but it failed again with the following error:
[INFO] Apache Hadoop Annotations ……………………. FAILURE [4.086s]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:2.8.1:jar (module-javadocs) on project hadoop-annotations: MavenReportException: Error while creating archive:
[ERROR] Exit code: 1 – C:\hadoop-src\hadoop-common-project\hadoop-annotations\sr
unexpected end tag:
This is an error reported by Javadoc. The Javadoc version in Java 8 is considerably stricter than the one in earlier versions. It now signals an error if it detects what it considers to be invalid markup, including the presence of an end tag where one isn’t expected.
To turn off this checking, add the -Xdoclint:none flag to the Javadoc command line. When building with Maven, add it to the maven-javadoc-plugin configuration:
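A minimal sketch of that configuration in the project’s pom.xml, assuming the 2.x plugin generation’s `additionalparam` parameter for passing extra Javadoc flags (the version matches the one in the error above):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-javadoc-plugin</artifactId>
  <version>2.8.1</version>
  <configuration>
    <!-- Relax Java 8's strict doclint so invalid markup no longer fails the build -->
    <additionalparam>-Xdoclint:none</additionalparam>
  </configuration>
</plugin>
```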
They then ran the build again, and this time it succeeded with this message:
[INFO] Apache Hadoop Distribution …………………… SUCCESS [1.613s]
[INFO] Apache Hadoop Client ………………………… SUCCESS [0.925s]
[INFO] Apache Hadoop Mini-Cluster …………………… SUCCESS [0.623s]
[INFO] BUILD SUCCESS
[INFO] Total time: 10:33.592s
[INFO] Finished at: Wed July 09 07:04:11 PKT 2014
[INFO] Final Memory: 77M/237M
All Hadoop services were now up and running.
They were running Hadoop as a single node on Ubuntu machines. On day 2, they tried to create files and folders for jobs on the Hadoop file system, following this sequence of commands:
First, create some files and a folder on the local disk.
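Any small folder will do; a sketch with arbitrary file names and contents:

```shell
# Create a small test folder with a couple of files on the local disk.
mkdir -p /tmp/test
echo "hello hadoop" > /tmp/test/file1.txt
echo "big data" > /tmp/test/file2.txt
ls /tmp/test
```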
Now, create a folder in HDFS and upload the local folder there:

hdfs dfs -mkdir -p /user/hadoop
hdfs dfs -copyFromLocal /tmp/test /user/hadoop/
hdfs dfs -ls /user/hadoop/test
But as soon as they ran the “hdfs dfs -copyFromLocal /tmp/test /user/hadoop/” command, it showed this error:
No DataNodes are started
It was a bit surprising: on day 1 the DataNode was functioning, but now it had not started. So, they checked the jps output again:
It clearly showed that there was no DataNode.
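For reference, on a healthy single-node Hadoop 2.x install, running `jps` lists every daemon; a missing DataNode line is the tell-tale sign:

```shell
jps
# On a healthy single-node cluster, the list includes (pids will differ):
#   NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager
# If DataNode is absent from the list, the daemon failed to start.
```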
- Stop the cluster
- Delete the data directory on the problematic DataNode: the directory is specified by dfs.data.dir in conf/hdfs-site.xml; if you followed this tutorial, the relevant directory is /app/hadoop/tmp/dfs/data
- Reformat the NameNode (NOTE: all HDFS data is lost during this process!)
- Restart the cluster
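Put together as commands, the recovery looks roughly like this. This is a sketch for a Hadoop 2.x single-node setup using the tutorial’s /app/hadoop/tmp data directory; paths and script names may differ in your installation, and reformatting destroys all HDFS data:

```shell
# 1. Stop all Hadoop daemons.
stop-all.sh

# 2. Delete the DataNode's data directory. The path comes from dfs.data.dir in
#    conf/hdfs-site.xml; /app/hadoop/tmp/dfs/data is the tutorial's value --
#    adjust if yours differs.
rm -rf /app/hadoop/tmp/dfs/data

# 3. Reformat the NameNode. WARNING: this wipes all HDFS data.
hdfs namenode -format

# 4. Restart the cluster.
start-all.sh
```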
The main reason is that bin/hadoop namenode -format does not remove the old DataNode data, so they had to delete it manually.
They had created a folder on the Hadoop file system to check whether data was actually being written. After copying files from the local file system to the Hadoop file system, it reported that the DataNode was not running, and the jps output confirmed there was indeed no DataNode.

They stopped all Hadoop services and started them again, but the DataNode still did not start. They then formatted the entire Hadoop file system to get the DataNode running, but in vain.

When you have a single-node cluster with old data left in the DataNode, the DataNode will not run until you delete that data manually. Therefore, all Hadoop services were stopped and all data was removed from the DataNode directory. They formatted the Hadoop file system once again and restarted all services. Finally, they found success!
Space-O developers don’t just fix problems; they turn them into opportunities. We have the expertise needed to turn your data into a strategic advantage.