How-to Install Apache Hadoop on Debian Step-by-Step Easy Guide

December 4, 2013 | By thelinuxevangelist | Filed in: Tutorial.

Hadoop 2.X Quick-Start on Debian Linux

Hello Debian User! This Tutorial Shows You Step-by-Step How to Install and Get Started with Vanilla Apache Hadoop/MapReduce in Pseudo-Distributed Mode on Debian Squeeze 6/Wheezy 7/Jessie 8/Stretch/Testing/Unstable GNU/Linux 32/64-bit Desktop/Server.

Hadoop is a distributed master/slave system that consists of the Hadoop Distributed File System (HDFS) for storage and MapReduce for computational capabilities.

The Hadoop Distributed File System (HDFS) is a distributed file system that spreads data blocks across the storage defined for the Hadoop cluster.

The foundation of Hadoop is its two core frameworks, YARN and HDFS, which deal with Processing and Storage respectively.
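
To Make the Storage Side Concrete, Here is a Minimal HDFS Session You Can Try Once the Pseudo-Distributed Setup Below is Running (the /user/myuser Path is Just an Example):

    hdfs dfs -mkdir -p /user/myuser            # create a home directory inside HDFS (storage side)
    hdfs dfs -put /etc/hosts /user/myuser/     # the file is split into blocks and stored by HDFS
    hdfs dfs -ls /user/myuser                  # list the stored files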

The Guide Describes a System-Wide Installation with Root Privileges, but You Can Easily Convert the Procedure to a Local One (a Possible Variant is Sketched Right Below).
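
For Instance, a Minimal Local (Non-Root) Sketch Could Look Like this (the ~/hadoop-install Location is Just an Assumption):

    mkdir -p ~/hadoop-install
    tar xvzf hadoop-*.tar.gz -C ~/hadoop-install   # extract under $HOME instead of /tmp + /usr/local
    ln -s ~/hadoop-install/hadoop-* ~/hadoop       # then use ~/hadoop in place of /usr/local/hadoop throughout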

The Contents and Details of this How-to Install Hadoop on Debian Linux Guide are Expressly Minimal, to Keep the Focus Only on the Essential Instructions and Commands.


  1. Download the Latest Apache Hadoop Stable Release:

    Apache Hadoop Binary tar.gz
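
    Optionally, You May Verify the Download with GPG (the File Names are Just Examples; Adjust to the Release You Actually Fetched):

    gpg --import KEYS                                          # KEYS file from the Apache Hadoop download page
    gpg --verify hadoop-2.2.0.tar.gz.asc hadoop-2.2.0.tar.gz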

  2. Double-Click/Right-Click on the Archive & Extract it into /tmp

    Or from CLI:

    tar xvzf hadoop-*.tar.gz -C /tmp   # run from the directory containing the downloaded archive
  3. Open Terminal Window
    (Press “Enter” to Execute Commands)

    Search for “term” in the Apps Menu

    Or Log in to Your Server.

  4. Relocate Apache Hadoop Directory
    Get SuperUser Privileges:

    su -

    If You Get a “User is Not in Sudoers file” Error then See: Solution
    Then Move the Extracted Directory with:

    mv /tmp/hadoop* /usr/local/

    Make a hadoop Symlink:

    ln -s /usr/local/hadoop* /usr/local/hadoop
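
    To Double-Check the Relocation and Symlink (the Version Number Will Vary):

    ls -ld /usr/local/hadoop /usr/local/hadoop-*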
  5. Make the Needed Hadoop Directories:

    First Make the Logs Dir:

    mkdir /usr/local/hadoop/logs

    Give Write Permissions:

    chmod 777 /usr/local/hadoop/logs

    Next Make the Cache Dir:

    mkdir /usr/local/hadoop/cache

    Same Write Permissions as for the Logs:

    chmod 777 /usr/local/hadoop/cache

    And then also the Temporary Dir:

    mkdir /usr/local/hadoop/tmp

    Set root as the Owner:

    chown -R root:root /usr/local/hadoop*
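
    A Possible Quick Check of Modes and Ownership (GNU stat Assumed):

    stat -c '%A %U:%G %n' /usr/local/hadoop/logs /usr/local/hadoop/cache /usr/local/hadoop/tmp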
  6. How-to Install Required Java JDK on Debian:

    Install Oracle JDK for Debian
  7. Set JAVA_HOME in Hadoop Env File
    nano /usr/local/hadoop/conf/hadoop-env.sh

    Insert:

    export JAVA_HOME=/usr/lib/jvm/[oracleJdkVersion]

    Replace [oracleJdkVersion] with the Currently Installed Version.
    Ctrl+x to Save & Exit :)
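
    If Unsure About the Installed JDK Path, These Usually Reveal it (the jdk1.7.0_45 Output is Just an Example):

    ls /usr/lib/jvm
    readlink -f "$(which java)"   # e.g. /usr/lib/jvm/jdk1.7.0_45/bin/java => JAVA_HOME=/usr/lib/jvm/jdk1.7.0_45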

  8. Eclipse Hadoop 2.X Integration with Free Plugin:

    Hadoop 2.X Eclipse Plugin SetUp
  9. Configuration for Pseudo-Distributed mode
    nano /usr/local/hadoop/conf/core-site.xml

    The Content Should Look Like:

      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <configuration>
        <property>
          <name>hadoop.tmp.dir</name>
          <value>/usr/local/hadoop/tmp</value>
        </property>
        <property>
          <name>fs.default.name</name>
          <value>hdfs://localhost:8020</value>
        </property>
      </configuration>
    

    Next:

    nano /usr/local/hadoop/conf/hdfs-site.xml

    The Content Should Look Like:

      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <configuration>
        <property>
          <name>dfs.replication</name>
          <value>1</value>
        </property>
        <property>
          <!-- specify this so that running 'hdfs namenode -format'
               formats the right dir -->
          <name>dfs.name.dir</name>
          <value>/usr/local/hadoop/cache/hadoop/dfs/name</value>
        </property>
      </configuration>
    

    Last:

    nano /usr/local/hadoop/conf/mapred-site.xml

    The Content Should Look Like:

      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <configuration>
        <property>
          <name>mapred.job.tracker</name>
          <value>localhost:8021</value>
        </property>
      </configuration>
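
    As an Optional Sanity Check, You Can Validate the Three Files (xmllint Ships with the libxml2-utils Package):

    apt-get install libxml2-utils
    xmllint --noout /usr/local/hadoop/conf/core-site.xml /usr/local/hadoop/conf/hdfs-site.xml /usr/local/hadoop/conf/mapred-site.xml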
    
  10. SetUp Path & Environment
    Leave the SuperUser Session and Return to the Normal User:

    su <myuser>

    Access the Home Directory:

    cd  

    Edit the Bash Config file:

    nano .bashrc

    Insert:

     export HADOOP_HOME=/usr/local/hadoop
     export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

    The JAVA_HOME is Set Following the Oracle Java JDK6+ Installation Guide…

    Then Load New Setup:

    source $HOME/.bashrc
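
    To Confirm the New PATH is Working, this Should Print the Installed Release:

    hadoop version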
  11. SetUp the Needed Local SSH Connection
    Install the OpenSSH Server:

    su -c "apt-get install openssh-server"

    Generate SSH Keys to Access:

    ssh-keygen -b 2048 -t rsa   # hit "Enter" to accept the defaults and an empty passphrase
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

    Testing Connection:

    ssh 127.0.0.1
  12. Formatting HDFS
    hdfs namenode -format

    On Success the Output Includes a Line Like: “Storage directory /usr/local/hadoop/cache/hadoop/dfs/name has been successfully formatted.”

  13. Starting Up Hadoop
    start-all.sh
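
    To Verify the Daemons are Up, the JDK's jps Tool Should List Something Like this (PIDs Will Differ; the Exact Set Depends on Whether the MRv1 or YARN Daemons are Configured):

    jps
    # e.g. NameNode, DataNode, SecondaryNameNode, JobTracker/ResourceManager, TaskTracker/NodeManager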
  14. Apache Hadoop MapReduce Quick-Start Guide:

    Hadoop MapReduce Quick-Start
