Hadoop Training Institute in KPHB Kukatpally Hyderabad

Course Content

Introduction to Big Data and Hadoop

What is Big Data?
What is Hadoop?
Relation between Big Data and Hadoop
Why is Hadoop needed?
Scenarios for adopting Hadoop technology in real-time projects
Challenges with Big Data
Storage
Processing
How Hadoop addresses Big Data challenges
Comparison with Other Technologies
Different Components of the Hadoop Ecosystem
Storage Components
Processing Components
Importance of Hadoop Ecosystem Components

HDFS (Hadoop Distributed File System)

What is a Cluster Environment?
Cluster Vs Hadoop Cluster
Significance of HDFS in Hadoop
Features of HDFS
Storage aspects of HDFS
Block
How to Configure block size?
Default Vs Configurable Block size
Why is the HDFS block size so large?
Design Principles of Block Size
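
As a rough illustration of the block-size topics above, the following sketch computes how many HDFS blocks a file would occupy and how large each block is. It assumes the Hadoop 2.x default of 128 MB for `dfs.blocksize` (Hadoop 1.x defaulted to 64 MB); the function names and the sample sizes are hypothetical and are not part of any Hadoop API.

```python
MB = 1024 * 1024

# Assumed default: 128 MB, the Hadoop 2.x value of dfs.blocksize.
DEFAULT_BLOCK_SIZE = 128 * MB


def hdfs_block_count(file_size_bytes, block_size_bytes=DEFAULT_BLOCK_SIZE):
    """Number of blocks needed for a file (ceiling division; 0 for an empty file)."""
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


def hdfs_block_sizes(file_size_bytes, block_size_bytes=DEFAULT_BLOCK_SIZE):
    """Size of each block; the last block holds only the leftover bytes."""
    full_blocks, remainder = divmod(file_size_bytes, block_size_bytes)
    sizes = [block_size_bytes] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


if __name__ == "__main__":
    one_gb = 1024 * MB
    print(hdfs_block_count(one_gb))           # blocks at 128 MB
    print(hdfs_block_count(one_gb, 64 * MB))  # blocks at 64 MB
    print([s // MB for s in hdfs_block_sizes(200 * MB)])
```

A larger block size means fewer blocks per file, and therefore less block metadata for the NameNode to track.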

HDFS Architecture - 5 Daemons of Hadoop

NameNode and its functionality
DataNode and its functionality
JobTracker and its functionality
TaskTracker and its functionality
Secondary NameNode and its functionality

Replication in Hadoop – Fail Over Mechanism

Data Storage in Data Nodes
Fail Over Mechanism in Hadoop – Replication
Replication Configuration
Custom Replication
Design Constraints with Replication Factor
Can we change the replication factor in Hadoop?
Can we change the block size for a file or directory in Hadoop?
Accessing HDFS
CLI (Command Line Interface) and HDFS Commands
Configuration files in a Hadoop installation and their purpose
How and Where to Configure Hadoop Daemons in a Hadoop Cluster?
NameNode HA (High Availability) in Hadoop 2.x

MapReduce

Why is Map Reduce essential in Hadoop?
Processing Daemons of Hadoop
Job Tracker
Roles of Job Tracker
Drawbacks of Job Tracker failure in a Hadoop Cluster
How to configure Job Tracker in Hadoop Cluster?
Task Tracker
Roles of Task Tracker
Drawbacks of Task Tracker failure in a Hadoop Cluster

Input Split

Input Split
Need of Input Split in Map Reduce
Input Split Size
Input Split Size Vs Block Size
Input Split Vs Mappers

Map Reduce Life Cycle

Communication Mechanism of Job Tracker & Task Tracker
Input Format Class
Record Reader Class
Success Case Scenarios
Failure Case Scenarios
Retry Mechanism in Map Reduce
Map Reduce Programming Model
Different phases of Map Reduce Algorithm
Different Data types in Map Reduce
Primitive Data Types Vs Map Reduce Data types
How to write a basic Map Reduce Program?
Driver Code
Mapper Code
Reducer Code
Driver Code
Importance of Driver Code in a Map Reduce program
How to Identify the Driver Code in a Map Reduce program?
Different sections of Driver code
Mapper Code
Importance of Mapper Phase in Map Reduce
How to Write a Mapper Class?
Methods in Mapper Class
Reducer Code
Importance of Reduce phase in Map Reduce
How to Write a Reducer Class?
Methods in Reducer Class
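
A real Hadoop job is written against the Java MapReduce API. The sketch below is only a single-process Python imitation of the driver, mapper, shuffle, and reducer roles listed above, using word count as the example. All function names here are hypothetical and do not correspond to Hadoop classes.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle and sort: group all values by key, ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Reduce phase: collapse the list of counts for one key into a total."""
    return key, sum(values)


def run_job(lines):
    """Driver: wires the phases together, as driver code does for a real job."""
    mapped = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(mapped))


if __name__ == "__main__":
    print(run_job(["big data", "Big Hadoop"]))
```

In Hadoop itself the mapper and reducer run as separate tasks on different nodes, and the framework performs the shuffle between them.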

IDENTITY MAPPER & IDENTITY REDUCER

Input Formats in Map Reduce

TextInputFormat
KeyValueTextInputFormat
NLineInputFormat
DBInputFormat
SequenceFileInputFormat
How to use a specific input format in Map Reduce?
How to write a Custom Input Format Class and Custom Record Reader

Output Formats in Map Reduce

TextOutputFormat
KeyValueTextOutputFormat
NLineOutputFormat
DBOutputFormat
SequenceFileOutputFormat
How to use a specific output format in Map Reduce?
How to write a Custom Output Format Class and Custom Record Writer
Map Reduce API (Application Programming Interface)
New API
Deprecated API
Combiner in Map Reduce
Is a combiner mandatory in Map Reduce?
How to use the combiner class in Map Reduce?
Performance tradeoffs with respect to the Combiner
Real Time Use Cases
Where to Use & Where Not to Use Combiner
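
To make the combiner topics above concrete, the following sketch counts how many key-value records would cross the shuffle with and without a combiner. It is a plain Python imitation, not Hadoop code; the function names and the two sample input splits are hypothetical.

```python
from collections import Counter


def map_split(lines):
    """Map phase for one input split: one (word, 1) record per word."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def combine(pairs):
    """Combiner: pre-aggregate one mapper's output before the shuffle."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return sorted(local.items())


def reduce_all(outputs):
    """Reduce phase: sum the values for each key across all mapper outputs."""
    totals = Counter()
    for pairs in outputs:
        for key, value in pairs:
            totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    splits = [["a b a", "a"], ["b b"]]  # hypothetical input splits
    plain = [map_split(s) for s in splits]
    combined = [combine(p) for p in plain]
    print(sum(len(p) for p in plain), "records shuffled without a combiner")
    print(sum(len(p) for p in combined), "records shuffled with a combiner")
    print(reduce_all(plain) == reduce_all(combined))
```

The result is unchanged here because addition is associative and commutative. An operation without that property, such as taking an average directly, cannot reuse the reducer logic as a combiner.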

Apache Pig

Introduction to Apache Pig
Map Reduce Vs Apache Pig
SQL Vs Apache Pig
Different data types in Pig
Where to Use Map Reduce and Pig in Real-Time Hadoop Projects
Modes of Execution in Pig
Local Mode
Map Reduce (Distributed) Mode
Execution Mechanism
Grunt Shell
Script
Transformations in Pig
How to write a simple Pig script?
How to develop a complex Pig script?
Bags, Tuples and Fields in Pig
UDFs in Pig
Need for UDFs in Pig
How to use UDFs
REGISTER keyword in Pig

HIVE

Hive Introduction
Need for Apache Hive in Hadoop
When to choose Pig or Hive in real-time projects
Hive Architecture
Driver
Compiler (Semantic Analyzer)
Executor
Meta Store in Hive
Importance of Hive Meta Store
Embedded metastore configuration
External metastore configuration
Communication mechanism with Metastore
Hive Integration with Hadoop
Hive Query Language (Hive QL)
SQL Vs Hive QL
Data Slicing Mechanisms
Partitions in Hive
Buckets in Hive
Partitioning Vs Bucketing
Real Time Use Cases
User Defined Functions (UDFs) in Hive
UDFs
UDAFs
UDTFs
Need for UDFs in Hive
HIVE – HBASE Integration

SQOOP

Introduction to Sqoop
MySQL client and Server Installation
How to connect to Relational Database using Sqoop
Different Sqoop Commands
Different flavors of Imports
Export
Hive-Imports

HBase

HBase introduction
HDFS Vs HBase
HBase Vs RDBMS
HBase Vs other NoSQL databases
HBase use cases
HBase Data Modeling Elements
Column families
Column Qualifier Name
Row Key
HBase Architecture
Clients
REST
Thrift
Java Based
Avro
Map Reduce Integration
Map Reduce over HBase
HBase Admin
Schema Definition
Basic CRUD Operations
Client Side Buffering in HBase

Hadoop Administration

Hadoop Single Node Cluster Set Up (Hands-on Installation on Laptops)

Operating System Installation
JDK Installation
SSH Configuration
Dedicated Group & User Creation
Hadoop Installation
Different Configuration Files Setting
NameNode Format
Starting the Hadoop Daemons

Multi Node Hadoop Cluster Set Up (Hands-on Installation on Laptops)

Network related settings
Hosts Configuration
Passwordless SSH Communication
Hadoop Installation
Configuration Files Setting
NameNode Format
Starting the Hadoop Daemons

Pig Installation (Hands-on Installation on Laptops)

Local Mode
Clustered Mode
.bashrc file configuration

SQOOP Installation (Hands-on Installation on Laptops)

Sqoop installation with MySQL Client

HIVE Installation (Hands-on Installation on Laptops)

Local Mode
Clustered Mode
