There are many questions on the internet about how to install Hadoop on Ubuntu, Linux, Windows 10/8.1/8/7 and macOS. This guide shows how to install the Hadoop big data framework on Ubuntu 16.04 in a few simple and easy steps.
According to Wikipedia, big data is data sets that are so voluminous and complex that traditional data processing software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating and information privacy. There are three dimensions to big data, known as Volume, Variety and Velocity.
According to hadoop.apache.org, the Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
The main goal of this tutorial is to simplify the installation of Hadoop on Ubuntu with correct and accurate commands, so that you can learn more about Hadoop.
Note: The following tutorial can be used to install the latest Hadoop release.
This tutorial has been tested on:
Ubuntu 16.04
Hadoop latest version [hadoop-2.9.0.tar.gz, ~350 MB]
JAVA JDK
A Java JDK is required for Hadoop to work. Hadoop 2.x requires Java 7 or 8 (Java 9 is not supported), so I recommend installing OpenJDK 8, which is the default on Ubuntu 16.04.
The following are the commands for installing OpenJDK 8 on Ubuntu:
# Open terminal & give following commands
sudo apt-get update
sudo apt-get install openjdk-8-jre-headless
sudo apt-get install openjdk-8-jdk
# To check which Java version is installed on your system
readlink -f /usr/bin/javac
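You can also confirm the active Java runtime directly; the exact version string will vary with the JDK you installed:
# check the active Java runtime version
java -version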
Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine if you want to run Hadoop on it.
# Hadoop requires SSH access to manage its nodes
sudo apt-get install ssh
sudo apt-get install rsync
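Before moving on, you can verify that the SSH server is installed and running (on Ubuntu 16.04 the service is called ssh):
# confirm the SSH server is running
sudo service ssh status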
In this tutorial I am installing hadoop-2.9.0.tar.gz (~350 MB), which is the stable version of Hadoop from http://www.eu.apache.org
# Download hadoop from : http://www.eu.apache.org/dist/hadoop/common/stable/
# copy and extract hadoop-2.9.0.tar.gz in home folder
# rename the name of the extracted folder from hadoop-2.9.0 to hadoop
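If you prefer to do these three steps from the terminal, the following is a minimal sketch. It assumes the stable release on the mirror is still hadoop-2.9.0; adjust the file name if a newer version is offered:
# download hadoop into the home folder
cd ~
wget http://www.eu.apache.org/dist/hadoop/common/stable/hadoop-2.9.0.tar.gz
# extract the archive
tar -xzf hadoop-2.9.0.tar.gz
# rename the extracted folder from hadoop-2.9.0 to hadoop
mv hadoop-2.9.0 hadoop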
# find whether ubuntu is 32 bit (i686) or 64 bit (x86_64)
uname -i
The command below opens the file hadoop-env.sh in gedit:
gedit ~/hadoop/etc/hadoop/hadoop-env.sh
Add the line below at the end of the file [ hadoop-env.sh ]
FOR 32 bit:
# add following line in the file at the end
# for 32 bit ubuntu
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-i386
# save and exit the file
FOR 64 bit:
# add following line in the file at the end
# for 64 bit ubuntu
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# save and exit the file
Save and exit the file.
# to display the usage documentation for hadoop
~/hadoop/bin/hadoop
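Optionally, you can add Hadoop's bin and sbin folders to your PATH so you can type hadoop instead of the full path. This is just a convenience, and the rest of this tutorial keeps the full paths:
# optional: add hadoop's bin and sbin folders to PATH for the current session
export PATH="$PATH:$HOME/hadoop/bin:$HOME/hadoop/sbin"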
1. Standalone Mode
# 1. standalone mode
mkdir input
cp ~/hadoop/etc/hadoop/*.xml input
# the following is a single command (shown on one line)
# make sure the jar file name matches your Hadoop version, 2.9.0 in my case
~/hadoop/bin/hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar grep input output 'us[a-z.]+'
cat output/*
# Our task is done, so remove input and output folders
rm -r input output
2. Pseudo-Distributed mode
Find out your user name using the following command and remember it, as we are going to use it in the next step:
whoami
Open the core-site.xml file using the following command:
gedit ~/hadoop/etc/hadoop/core-site.xml
Replace the <configuration>...</configuration> block with the code below. The port in fs.defaultFS is arbitrary as long as it is free; the official Apache docs use 9000, this tutorial uses 1234:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:1234</value>
  </property>
</configuration>
Save the file and exit.
Open the hdfs-site.xml file using the following command:
gedit ~/hadoop/etc/hadoop/hdfs-site.xml
Replace the <configuration>...</configuration> block with the code below, substituting your own user name in the paths (this uses the Hadoop 2.x property names dfs.namenode.name.dir and dfs.datanode.data.dir; the older dfs.name.dir and dfs.data.dir still work but are deprecated):
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/your_user_name/hadoop/name_dir</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/your_user_name/hadoop/data_dir</value>
  </property>
</configuration>
Save the file and exit.
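Optionally, you can pre-create the two storage directories you just configured. Hadoop will also create them when you format the NameNode and start the DataNode, but creating them yourself makes permission problems easier to spot:
# optional: create the NameNode and DataNode storage folders
mkdir -p ~/hadoop/name_dir ~/hadoop/data_dir
Now set up passphraseless/passwordless SSH: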
# use an RSA key; DSA keys are disabled by default in the OpenSSH shipped with Ubuntu 16.04
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
export HADOOP_PREFIX=/home/your_user_name/hadoop
ssh localhost
# type exit in the terminal to close the ssh connection (very important)
exit
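The export above only lasts for the current terminal session. If you want HADOOP_PREFIX (and JAVA_HOME) set automatically in every new terminal, you can append them to ~/.bashrc. A minimal sketch, assuming the 64-bit OpenJDK 8 path used earlier:
# persist the environment variables across terminal sessions
echo 'export HADOOP_PREFIX=/home/your_user_name/hadoop' >> ~/.bashrc
echo 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' >> ~/.bashrc
# reload ~/.bashrc in the current terminal
source ~/.bashrc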
The following instructions are to run a MapReduce job locally.
# The following instructions are to run a MapReduce job locally.
# Format the filesystem (do it only once):
~/hadoop/bin/hdfs namenode -format
#Start NameNode daemon and DataNode daemon:
~/hadoop/sbin/start-dfs.sh
# check which daemons are running
jps
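If everything started correctly, jps should list a NameNode, a DataNode and a SecondaryNameNode in addition to Jps itself. The process IDs will differ on your machine; the output looks something like:
# illustrative jps output (PIDs will vary)
# 12081 NameNode
# 12226 DataNode
# 12426 SecondaryNameNode
# 12531 Jps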
DONE!!! For the final step, open your browser and enter the following URL (Hadoop's NameNode web interface uses port 50070).
#Browse the web interface for the NameNode; by default it is available at:
http://localhost:50070/
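If you are working on a machine without a browser, you can hit the same address from the terminal. Any HTML response means the NameNode web interface is up (wget ships with Ubuntu, so no extra install is needed):
# fetch the NameNode status page from the terminal
wget -qO- http://localhost:50070/ | head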
ADDITIONAL COMMANDS
#Make the HDFS directories required to execute MapReduce jobs:
~/hadoop/bin/hdfs dfs -mkdir /user
~/hadoop/bin/hdfs dfs -mkdir /user/your_user_name
# Copy the sample files (from ~/hadoop/etc/hadoop) into the distributed filesystem folder (input)
~/hadoop/bin/hdfs dfs -put ~/hadoop/etc/hadoop input
#Run the example map-reduce job
~/hadoop/bin/hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar grep input output 'us[a-z.]+'
#View the output files on the distributed filesystem
~/hadoop/bin/hdfs dfs -cat output/*
#Copy the output files from the distributed filesystem to the local filesystem and examine them:
~/hadoop/bin/hdfs dfs -get output output
#ignore warnings (if any)
cat output/*
# remove local output folder
rm -r output
# remove distributed folders (input & output)
~/hadoop/bin/hdfs dfs -rm -r input output
#When you’re done, stop the daemons with
~/hadoop/sbin/stop-dfs.sh
jps
THANKS FOR READING THE TUTORIAL. I HOPE HADOOP IS NOW INSTALLED ON YOUR SYSTEM.
Check out: HOW TO INSTALL GOOGLE CHROME IN UBUNTU 16.04 LTS