Install Apache Hadoop on Ubuntu 15.10 Wily Linux
Hi! This Tutorial shows you Step-by-Step How to Install and Get Started with vanilla Apache Hadoop/MapReduce in Pseudo-Distributed mode on Ubuntu 15.10 Wily Werewolf GNU/Linux Desktop.
Hadoop is a distributed master-slave system that consists of the Hadoop Distributed File System (HDFS) for storage and MapReduce for computational capabilities.
The Guide Describes a System-Wide Setup with Root Privileges, but you Can Easily Convert the Procedure to a Local One.
Apache Hadoop on Ubuntu 15.10 Wily Requires an Oracle JDK 8+ Installation on the System.
-
Download Latest Apache Hadoop Stable Release:
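If you prefer working from the Shell, the tarball can also be fetched directly from the Apache archive; the release number below is only an example, so pick the current stable one from the download page:
wget -P ~/Downloads https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz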
-
Double-Click on Archive and Extract Into /tmp Directory
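As a Shell alternative to double-clicking, the archive can be unpacked into /tmp with tar; adjust the file name to the release you actually downloaded:
tar -xzf ~/Downloads/hadoop-2.7.1.tar.gz -C /tmp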
-
Open a Shell Terminal emulator window
Ctrl+Alt+t
(Press “Enter” to Execute Commands.) In case of First Use, see: Terminal QuickStart Guide.
-
Relocate Apache Hadoop Directory
sudo su
If you Get “User is Not in Sudoers file” then see: How to Enable sudo
mv /tmp/hadoop* /usr/local/
ln -s /usr/local/hadoop* /usr/local/hadoop
mkdir /usr/local/hadoop/tmp
sudo chown -R root:root /usr/local/hadoop*
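A quick check that the symlink now points at the extracted release:
ls -l /usr/local/hadoop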
-
How to Install Required Java JDK 8+ on Ubuntu
-
Set JAVA_HOME in Hadoop Env File.
sudo su
If you Get “User is Not in Sudoers file” then see: How to Enable sudo
mkdir -p /usr/local/hadoop/conf
nano /usr/local/hadoop/conf/hadoop-env.sh
Append:
export JAVA_HOME=/usr/lib/jvm/<oracleJdkVersion>
Ctrl+x to Save & Exit from nano Editor :)
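If you are unsure which directory name to use in place of <oracleJdkVersion>, listing the installed JVMs usually reveals it:
ls /usr/lib/jvm/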
-
Hadoop Configuration for Pseudo-Distributed mode
nano /usr/local/hadoop/conf/core-site.xml
Append:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
Next:
nano /usr/local/hadoop/conf/hdfs-site.xml
Append:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <!-- specify this so that running 'hdfs namenode -format' formats the right dir -->
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/cache/hadoop/dfs/name</value>
  </property>
</configuration>
Last:
nano /usr/local/hadoop/conf/mapred-site.xml
Append:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
-
Set Up Local Path & Environment.
exit
cd
nano .bashrc
Insert:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME=/usr/lib/jvm/<oracleJdkVersion>
Then Load the New Setup:
source $HOME/.bashrc
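To verify that the Shell now finds the Hadoop binaries, this should print the installed release:
hadoop version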
-
Set Up the Needed Local SSH Connection
sudo systemctl start ssh
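If the ssh service is not found, the OpenSSH server is most likely not installed yet; it can be added with:
sudo apt-get install openssh-server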
Generate SSH Keys to Access:
ssh-keygen -b 2048 -t rsa
echo "$(cat ~/.ssh/id_rsa.pub)" > ~/.ssh/authorized_keys
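If the passwordless login gets refused later on, it is usually just a matter of tightening the permissions on the new file:
chmod 600 ~/.ssh/authorized_keys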
Testing Connection:
ssh 127.0.0.1
-
Formatting HDFS
hdfs namenode -format
-
Starting Up Hadoop
start-all.sh
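To confirm that the daemons actually came up, the jps tool shipped with the JDK should list Java processes such as NameNode and DataNode:
jps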
-
Apache Hadoop Quick Start Guide
Hadoop MapReduce Quick Start