How-to Quick-Start with Apache Hadoop on CentOS 7.x Linux

February 16th, 2017 by thelinuxevangelist

Hadoop 2.X Quick-Start on CentOS 7.x Linux




Hello CentOS User! This Tutorial Shows You Step-by-Step How to Install and Get Started with vanilla Apache Hadoop/Map-Reduce in Pseudo-Distributed mode on a CentOS 7.x Linux 32/64-bit Desktop/Server.

Hadoop is a distributed master-slave architecture that consists of the Hadoop Distributed File System (HDFS) for storage and Map-Reduce for computational capabilities.

The Hadoop Distributed File System (HDFS) is a distributed file system that spreads data blocks across the storage defined for the Hadoop cluster.

The foundation of Hadoop is the two core frameworks YARN and HDFS. These two frameworks deal with Processing and Storage, respectively.

The Guide Describes a System-Wide Installation with Root Privileges, but You Can Easily Convert the Procedure to a Local One.

The Content and Details of How-to Install Hadoop on CentOS 7 Linux are Expressly Reduced to Give Focus Only to the Essential Instructions and Commands.

  1. Download Latest Apache Hadoop Stable Release:

    Apache Hadoop Binary tar.gz

  2. Double-Click on the Archive and Extract It Into the /tmp Directory


    Or from CLI:

    tar xvzf *hadoop*tar.gz -C /tmp
  3. Open a Shell Terminal emulator window
    (Press “Enter” to Execute Commands)


    Or Login into the Server Shell.

  4. Relocate Apache Hadoop Directory
    Get SuperUser Privileges:

    sudo su

    If You Get “User is Not in Sudoers file” then Look: Solution
    Then Move the Extracted Directory with:

    mv /tmp/hadoop* /usr/local/

    Make a hadoop symlink to the directory:

    ln -s /usr/local/hadoop* /usr/local/hadoop
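
The glob in the ln -s command above expands to every match, so it does what is intended only when exactly one hadoop directory sits under /usr/local. Below is a minimal sketch of a guarded variant. The helper name `link_hadoop` is hypothetical, and the sketch assumes the extracted directory is named `hadoop-<version>`, as the Apache binary archives usually are.

```shell
# Hypothetical helper (not part of Hadoop): link the single versioned
# hadoop-* directory under a prefix to <prefix>/hadoop.
link_hadoop() {
    local prefix="${1:-/usr/local}"
    # Expand the glob into the positional parameters; the trailing slash
    # restricts matches to directories such as hadoop-<version>/.
    set -- "$prefix"/hadoop-*/
    if [ "$#" -ne 1 ] || [ ! -d "$1" ]; then
        echo "expected exactly one $prefix/hadoop-* directory" >&2
        return 1
    fi
    # -n replaces an existing hadoop symlink instead of linking inside it.
    ln -sfn "${1%/}" "$prefix/hadoop"
}
```

As root, `link_hadoop /usr/local` would then create or refresh the symlink, and it refuses to guess when several versions are present.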
  5. Make Hadoop Temporary Directory:

    mkdir /usr/local/hadoop/tmp

    Set root as the Owner:

    chown -R root:root /usr/local/hadoop*
  6. How-to Install Required Java JDK 7+ on CentOS Linux:

    Install CentOS JDK 7+ for CentOS Linux
  7. Set JAVA_HOME in Hadoop Env File
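    Make the Conf directory (Note: Hadoop 2.x Looks in /usr/local/hadoop/etc/hadoop by Default unless HADOOP_CONF_DIR Points Elsewhere):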

    mkdir /usr/local/hadoop/conf

    Make an Env file:

    nano /usr/local/hadoop/conf/hadoop-env.sh

    Append:

    export JAVA_HOME=/usr/lib/jvm/[oracleJdkVersion]

    Replace [oracleJdkVersion] with the Name of the Installed JDK Directory.
    Ctrl+x to Save & Exit from nano Editor :)
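
If the exact JDK directory name is not known, it can usually be derived from the java executable itself. The following is a minimal sketch, assuming GNU `readlink -f` is available (it is on CentOS); the helper name `java_home_of` is hypothetical.

```shell
# Hypothetical helper: print the installation directory that owns a java
# executable by resolving symlinks and dropping the trailing /bin/java.
java_home_of() {
    local java_bin="${1:-$(command -v java)}"
    [ -n "$java_bin" ] || { echo "java not found" >&2; return 1; }
    local real
    real=$(readlink -f "$java_bin") || return 1
    printf '%s\n' "${real%/bin/java}"
}
```

Calling `java_home_of` with no argument resolves the java found on the PATH. For some JDK packages the result ends in /jre, in which case the parent directory is probably the better JAVA_HOME.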

  8. Eclipse Hadoop 2.X Integration with Free Plugin:

    Hadoop 2.X Eclipse Plugin SetUp
  9. Hadoop Configuration for Pseudo-Distributed mode
    nano /usr/local/hadoop/conf/core-site.xml

    Append:

     <?xml version="1.0"?>
     <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
     <configuration>
       <property>
         <name>hadoop.tmp.dir</name>
         <value>/usr/local/hadoop/tmp</value>
       </property>
       <property>
         <name>fs.default.name</name>
         <value>hdfs://localhost:8020</value>
       </property>
     </configuration>
    

    Next:

    nano /usr/local/hadoop/conf/hdfs-site.xml

    Append:

     <?xml version="1.0"?>
     <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
     <configuration>
       <property>
         <name>dfs.replication</name>
         <value>1</value>
       </property>
       <property>
         <!-- specify this so that running 'hadoop namenode -format'
              formats the right dir -->
         <name>dfs.name.dir</name>
         <value>/usr/local/hadoop/cache/hadoop/dfs/name</value>
       </property>
     </configuration>
    

    Last:

    nano /usr/local/hadoop/conf/mapred-site.xml

    Append:

     <?xml version="1.0"?>
     <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
     <configuration>
       <property>
         <name>mapred.job.tracker</name>
         <value>localhost:8021</value>
       </property>
     </configuration>
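
A typo in a property name is easy to miss in these files, so it may help to read the values back after saving. Here is a minimal sketch, assuming each `<name>` and `<value>` element sits on its own line with the name first, as in the files above. The helper name `conf_value` is hypothetical and this is plain text matching, not a real XML parser.

```shell
# Hypothetical helper: print the <value> that follows a given <name>
# in a Hadoop *-site.xml file laid out one element per line.
conf_value() {
    # $1 = path to the XML file, $2 = property name to look up
    awk -v key="$2" '
        /<name>/ {
            name = $0
            gsub(/.*<name>|<\/name>.*/, "", name)   # keep only the name text
        }
        /<value>/ && name == key {
            val = $0
            gsub(/.*<value>|<\/value>.*/, "", val)  # keep only the value text
            print val
            exit
        }
    ' "$1"
}
```

For example, `conf_value /usr/local/hadoop/conf/core-site.xml fs.default.name` should print the HDFS address entered above.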
    
  10. SetUp Local Path & Environment
    Exit from SuperUser to the normal User:

    exit

    Change to the Home directory:

    cd $HOME

    Edit the bash Config file:

    nano .bashrc

    Insert:

    export HADOOP_HOME=/usr/local/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    export JAVA_HOME=/usr/lib/jvm/[oracleJdkVersion]

    Then Load the New Setup:

    source $HOME/.bashrc
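
When the setup is scripted instead of typed into nano, repeating the step would add the same lines to .bashrc again. One way to avoid that is sketched below; the helper name `append_once` is hypothetical.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present, so repeated runs leave a single copy.
append_once() {
    # $1 = file to edit, $2 = line to add
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}
```

A call such as `append_once "$HOME/.bashrc" 'export HADOOP_HOME=/usr/local/hadoop'` can then be repeated safely.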
  11. SetUp Needed Local SSH Connection
    Install the OpenSSH Server:

    sudo su -c "yum install openssh-server"

    Generate SSH Keys to Access:

    ssh-keygen -b 2048 -t rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

    Testing Connection:

    ssh 127.0.0.1
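
If the test login still asks for a password, the permissions on the key files are a common cause, since sshd by default ignores an authorized_keys file that other users can write to. A minimal sketch of a key-authorizing step that also tightens permissions follows; the helper name `authorize_key` is hypothetical.

```shell
# Hypothetical helper: add a public key to an authorized_keys file once
# and restrict the permissions of the file and its directory.
authorize_key() {
    # $1 = public key file, $2 = authorized_keys file
    local pub="$1" auth="$2"
    mkdir -p "$(dirname "$auth")"
    touch "$auth"
    # Append the key only when it is not already listed.
    grep -qxF -- "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
    chmod 700 "$(dirname "$auth")"
    chmod 600 "$auth"
}
```

For the keys generated above, the call would be `authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys`.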
  12. Formatting HDFS
    hadoop namenode -format
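
Formatting erases existing HDFS metadata, so it is worth checking whether the NameNode directory has already been formatted before repeating this step. The sketch below relies on the fact that a formatted NameNode directory contains a current/VERSION file. The helper name `namenode_formatted` is hypothetical, and its default path is the dfs.name.dir value from hdfs-site.xml above.

```shell
# Hypothetical helper: succeed if the given NameNode directory already
# holds a current/VERSION file, i.e. it has been formatted before.
namenode_formatted() {
    # $1 = NameNode metadata directory (dfs.name.dir)
    [ -f "${1:-/usr/local/hadoop/cache/hadoop/dfs/name}/current/VERSION" ]
}
```

A guarded format could then be written as `namenode_formatted || hadoop namenode -format`.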


  13. Starting Up Hadoop
    start-all.sh
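
To confirm that the daemons came up, the `jps` tool shipped with the JDK lists running Java processes by name. The sketch below reads jps output on standard input and prints the expected daemons that are absent. The helper name `missing_daemons` is hypothetical, and the list assumes a Hadoop 2.x pseudo-distributed setup where start-all.sh launches both the HDFS and the YARN daemons.

```shell
# Hypothetical helper: read `jps` output on stdin and print every
# expected Hadoop 2.x daemon that does not appear in it.
missing_daemons() {
    local out d
    out=$(cat)
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>"; match the whole name so that
        # SecondaryNameNode is not counted as NameNode.
        printf '%s\n' "$out" | grep -qE "^[0-9]+ $d\$" || echo "$d"
    done
}
```

Running `jps | missing_daemons` prints nothing when all five daemons are running.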
  14. Apache Hadoop Quick-Start Guide:

    Hadoop MapReduce Quick-Start