Step-by-step – Hadoop Installation in Fedora 36

April 19, 2022 | By the+gnu+linux+evangelist.

Configuring

  1. 8. Configuring Hadoop

    Now

    Edit Hadoop Configutation File
    To Setup Hadoop in Pseudo-Distributed mode with nano Editor:

    nano /usr/local/hadoop/conf/core-site.xml

    And Append:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
    <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
    </property>
    <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
    </property>
    </configuration>
    

    Next again:

    nano /usr/local/hadoop/conf/hdfs-site.xml

    And Append:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
    <property>
    <name>dfs.replication</name>
    <value>1</value>
    </property>
    <property>
    <!-- specify this so that running 'hdfs namenode -format'
    formats the right dir -->
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/cache/hadoop/dfs/name</value>
    </property>
    </configuration>
    

    Next Make the Log4j Configuration:

    nano /usr/local/hadoop/conf/log4j.properties

    Simply Click and Drag Down to Select & Right-Click Copy, finally Append into:

    # Licensed to the Apache Software Foundation (ASF) under one
    # or more contributor license agreements.  See the NOTICE file
    # distributed with this work for additional information
    # regarding copyright ownership.  The ASF licenses this file
    # to you under the Apache License, Version 2.0 (the
    # "License"); you may not use this file except in compliance
    # with the License.  you may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    # Define some default values that can be overridden by system properties
    hadoop.root.logger=INFO,console
    hadoop.log.dir=.
    hadoop.log.file=hadoop.log
    
    # Define the root logger to the system property "hadoop.root.logger".
    log4j.rootLogger=${hadoop.root.logger}, EventCounter
    
    # Logging Threshold
    log4j.threshold=ALL
    
    # Null Appender
    log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
    
    #
    # Rolling File Appender - cap space usage at 5gb.
    #
    hadoop.log.maxfilesize=256MB
    hadoop.log.maxbackupindex=20
    log4j.appender.RFA=org.apache.log4j.RollingFileAppender
    log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
    
    log4j.appender.RFA.MaxFileSize=${hadoop.log.maxfilesize}
    log4j.appender.RFA.MaxBackupIndex=${hadoop.log.maxbackupindex}
    
    log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
    
    # Pattern format: Date LogLevel LoggerName LogMessage
    log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    # Debugging Pattern format
    #log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
    
    
    #
    # Daily Rolling File Appender
    #
    
    log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
    
    # Rollover at midnight
    log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
    
    log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
    
    # Pattern format: Date LogLevel LoggerName LogMessage
    log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    # Debugging Pattern format
    #log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
    
    
    #
    # console
    # Add "console" to rootlogger above if you want to use this 
    #
    
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
    
    #
    # TaskLog Appender
    #
    
    #Default values
    hadoop.tasklog.taskid=null
    hadoop.tasklog.iscleanup=false
    hadoop.tasklog.noKeepSplits=4
    hadoop.tasklog.totalLogFileSize=100
    hadoop.tasklog.purgeLogSplits=true
    hadoop.tasklog.logsRetainHours=12
    
    log4j.appender.TLA=org.apache.hadoop.mapred.TaskLogAppender
    log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
    log4j.appender.TLA.isCleanup=${hadoop.tasklog.iscleanup}
    log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}
    
    log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
    log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    
    #
    # HDFS block state change log from block manager
    #
    # Uncomment the following to suppress normal block state change
    # messages from BlockManager in NameNode.
    #log4j.logger.BlockStateChange=WARN
    
    #
    #Security appender
    #
    hadoop.security.logger=INFO,NullAppender
    hadoop.security.log.maxfilesize=256MB
    hadoop.security.log.maxbackupindex=20
    log4j.category.SecurityLogger=${hadoop.security.logger}
    hadoop.security.log.file=SecurityAuth-${user.name}.audit
    log4j.appender.RFAS=org.apache.log4j.RollingFileAppender 
    log4j.appender.RFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
    log4j.appender.RFAS.layout=org.apache.log4j.PatternLayout
    log4j.appender.RFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    log4j.appender.RFAS.MaxFileSize=${hadoop.security.log.maxfilesize}
    log4j.appender.RFAS.MaxBackupIndex=${hadoop.security.log.maxbackupindex}
    
    #
    # Daily Rolling Security appender
    #
    log4j.appender.DRFAS=org.apache.log4j.DailyRollingFileAppender 
    log4j.appender.DRFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
    log4j.appender.DRFAS.layout=org.apache.log4j.PatternLayout
    log4j.appender.DRFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    log4j.appender.DRFAS.DatePattern=.yyyy-MM-dd
    
    #
    # hadoop configuration logging
    #
    
    # Uncomment the following line to turn off configuration deprecation warnings.
    # log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=WARN
    
    #
    # hdfs audit logging
    #
    hdfs.audit.logger=INFO,NullAppender
    hdfs.audit.log.maxfilesize=256MB
    hdfs.audit.log.maxbackupindex=20
    log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=${hdfs.audit.logger}
    log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false
    log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
    log4j.appender.RFAAUDIT.File=${hadoop.log.dir}/hdfs-audit.log
    log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
    log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    log4j.appender.RFAAUDIT.MaxFileSize=${hdfs.audit.log.maxfilesize}
    log4j.appender.RFAAUDIT.MaxBackupIndex=${hdfs.audit.log.maxbackupindex}
    
    #
    # mapred audit logging
    #
    mapred.audit.logger=INFO,NullAppender
    mapred.audit.log.maxfilesize=256MB
    mapred.audit.log.maxbackupindex=20
    log4j.logger.org.apache.hadoop.mapred.AuditLogger=${mapred.audit.logger}
    log4j.additivity.org.apache.hadoop.mapred.AuditLogger=false
    log4j.appender.MRAUDIT=org.apache.log4j.RollingFileAppender
    log4j.appender.MRAUDIT.File=${hadoop.log.dir}/mapred-audit.log
    log4j.appender.MRAUDIT.layout=org.apache.log4j.PatternLayout
    log4j.appender.MRAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    log4j.appender.MRAUDIT.MaxFileSize=${mapred.audit.log.maxfilesize}
    log4j.appender.MRAUDIT.MaxBackupIndex=${mapred.audit.log.maxbackupindex}
    
    # Custom Logging levels
    
    #log4j.logger.org.apache.hadoop.mapred.JobTracker=DEBUG
    #log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG
    #log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=DEBUG
    
    # Jets3t library
    log4j.logger.org.jets3t.service.impl.rest.httpclient.RestS3Service=ERROR
    
    # AWS SDK & S3A File System
    log4j.logger.com.amazonaws=ERROR
    log4j.logger.com.amazonaws.http.AmazonHttpClient=ERROR
    log4j.logger.org.apache.hadoop.fs.s3a.S3AFileSystem=WARN
    
    #
    # Event Counter Appender
    # Sends counts of logging messages at different severity levels to Hadoop Metrics.
    #
    log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
    
    #
    # Job Summary Appender 
    #
    # Use following logger to send summary to separate file defined by 
    # hadoop.mapreduce.jobsummary.log.file :
    # hadoop.mapreduce.jobsummary.logger=INFO,JSA
    # 
    hadoop.mapreduce.jobsummary.logger=${hadoop.root.logger}
    hadoop.mapreduce.jobsummary.log.file=hadoop-mapreduce.jobsummary.log
    hadoop.mapreduce.jobsummary.log.maxfilesize=256MB
    hadoop.mapreduce.jobsummary.log.maxbackupindex=20
    log4j.appender.JSA=org.apache.log4j.RollingFileAppender
    log4j.appender.JSA.File=${hadoop.log.dir}/${hadoop.mapreduce.jobsummary.log.file}
    log4j.appender.JSA.MaxFileSize=${hadoop.mapreduce.jobsummary.log.maxfilesize}
    log4j.appender.JSA.MaxBackupIndex=${hadoop.mapreduce.jobsummary.log.maxbackupindex}
    log4j.appender.JSA.layout=org.apache.log4j.PatternLayout
    log4j.appender.JSA.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
    log4j.logger.org.apache.hadoop.mapred.JobInProgress$JobSummary=${hadoop.mapreduce.jobsummary.logger}
    log4j.additivity.org.apache.hadoop.mapred.JobInProgress$JobSummary=false
    
    #
    # Yarn ResourceManager Application Summary Log 
    #
    # Set the ResourceManager summary log filename
    yarn.server.resourcemanager.appsummary.log.file=rm-appsummary.log
    # Set the ResourceManager summary log level and appender
    yarn.server.resourcemanager.appsummary.logger=${hadoop.root.logger}
    #yarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY
    
    # To enable AppSummaryLogging for the RM, 
    # set yarn.server.resourcemanager.appsummary.logger to 
    # ,RMSUMMARY in hadoop-env.sh
    
    # Appender for ResourceManager Application Summary Log
    # Requires the following properties to be set
    #    - hadoop.log.dir (Hadoop Log directory)
    #    - yarn.server.resourcemanager.appsummary.log.file (resource manager app summary log filename)
    #    - yarn.server.resourcemanager.appsummary.logger (resource manager app summary log level and appender)
    
    log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger}
    log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false
    log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender
    log4j.appender.RMSUMMARY.File=${hadoop.log.dir}/${yarn.server.resourcemanager.appsummary.log.file}
    log4j.appender.RMSUMMARY.MaxFileSize=256MB
    log4j.appender.RMSUMMARY.MaxBackupIndex=20
    log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout
    log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    

    Last:

    nano /usr/local/hadoop/conf/mapred-site.xml

    And Append:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
    <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
    </property>
    </configuration>
    
  2. 9. Setting Up Environment

    Again SetUp Local Path & Environment
    Exit from SuperUser to the normal User:

    exit

    Then Change to the Home directory:

    cd 

    And Edit the bash Config file:

    nano .bashrc

    So Inserts:

    HADOOP_HOME=/usr/local/hadoop/nexport PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin/nexport JAVA_HOME=/usr/lib/jvm/<oracleJdkVersion>

    Replace <oracleJdkVersion> with the Correct JDK Release you can find running:

    update-alternatives --config java

    And for the Java Compiler instead:

    sudo update-alternatives --config javac

    But Skip this if already Made JAVA_HOME Setup during JDK Installation…
    Finally, to Load the New Setup simply:

    bash

Contents


QuickChic Theme • Powered by WordPress