Introduction to Big Data and Hadoop

  • What is Big Data?
  • What is Hadoop?
  • Relation between Big Data and Hadoop
  • Why Is Hadoop Needed?
  • Scenarios for Adopting Hadoop in Real-World Projects
  • Challenges with Big Data
      • Storage
      • Processing
  • How Hadoop Addresses Big Data Challenges
  • Comparison with Other Technologies
  • Components of the Hadoop Ecosystem
      • Storage Components
      • Processing Components
  • Importance of Hadoop Ecosystem Components

HDFS (Hadoop Distributed File System)

  • What is a Cluster Environment?
  • Cluster Vs Hadoop Cluster
  • Significance of HDFS in Hadoop
  • Features of HDFS
  • Storage Aspects of HDFS: Blocks
      • How to Configure the Block Size
      • Default Vs Configurable Block Size
      • Why Is the HDFS Block Size So Large?
      • Design Principles of Block Size
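
To make the block-size topics concrete, here is a small sketch of the arithmetic behind them, assuming the Hadoop 2.x default block size of 128 MB (the dfs.blocksize property; Hadoop 1.x defaulted to 64 MB). HdfsBlockMath and its methods are hypothetical helpers written for illustration, not part of the Hadoop API.

```java
// Hypothetical helper illustrating HDFS block arithmetic (not part of the Hadoop API).
final class HdfsBlockMath {
    // Assumed default for dfs.blocksize in Hadoop 2.x: 128 MB.
    static final long DEFAULT_BLOCK_SIZE = 128L * 1024 * 1024;

    // Number of blocks a file of fileSize bytes is split into (ceiling division).
    static long numBlocks(long fileSize, long blockSize) {
        if (fileSize <= 0) return 0;
        return (fileSize + blockSize - 1) / blockSize;
    }

    // Size of the final block; HDFS does not pad it up to the full block size.
    static long lastBlockSize(long fileSize, long blockSize) {
        if (fileSize <= 0) return 0;
        long remainder = fileSize % blockSize;
        return remainder == 0 ? blockSize : remainder;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long fileSize = 300 * mb; // a 300 MB file
        System.out.println(numBlocks(fileSize, DEFAULT_BLOCK_SIZE));          // 3
        System.out.println(lastBlockSize(fileSize, DEFAULT_BLOCK_SIZE) / mb); // 44
    }
}
```

A 300 MB file therefore occupies three blocks, the last holding only 44 MB; because the final block is not padded, a small file does not consume a full block of disk space.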

Hadoop Architecture - 5 Daemons of Hadoop 1.x

  • NameNode and its functionality
  • DataNode and its functionality
  • JobTracker and its functionality
  • TaskTracker and its functionality
  • Secondary NameNode and its functionality

Replication in Hadoop – Failover Mechanism

  • Data Storage in DataNodes
  • Failover Mechanism in Hadoop – Replication
  • Replication Configuration
  • Custom Replication
  • Design Constraints with Replication Factor
  • Can we change the replication factor in Hadoop?
  • Can we change the block size for a file or directory in Hadoop?
  • Accessing HDFS
      • CLI (Command Line Interface) and HDFS Commands
  • Configuration Files in a Hadoop Installation and Their Purpose
  • How and Where to Configure Hadoop Daemons in a Hadoop Cluster
  • NameNode HA (High Availability in Hadoop 2.x)
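
The effect of the replication factor on capacity and fault tolerance can be sketched the same way. This assumes the default dfs.replication value of 3 and that every replica of a block sits on a different DataNode; ReplicationMath is a hypothetical helper, not a Hadoop class.

```java
// Hypothetical helper illustrating replication arithmetic (not a Hadoop API).
final class ReplicationMath {
    // Assumed default value of dfs.replication.
    static final int DEFAULT_REPLICATION = 3;

    // Raw disk space consumed across the cluster by a file of logicalBytes.
    static long rawStorage(long logicalBytes, int replication) {
        return logicalBytes * replication;
    }

    // Replicas of one block that can be lost while at least one copy survives,
    // assuming each replica is on a different DataNode.
    static int toleratedReplicaLosses(int replication) {
        return Math.max(0, replication - 1);
    }

    public static void main(String[] args) {
        long oneGb = 1024L * 1024 * 1024;
        System.out.println(rawStorage(oneGb, DEFAULT_REPLICATION) / oneGb); // 3
        System.out.println(toleratedReplicaLosses(DEFAULT_REPLICATION));   // 2
    }
}
```

For an existing file, the replication factor can be changed from the CLI with hdfs dfs -setrep (for example, hdfs dfs -setrep -w 2 /path/to/file, where -w waits until re-replication finishes).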

MapReduce

  • Why Is MapReduce Essential in Hadoop?
  • Processing Daemons of Hadoop
      • JobTracker
          • Roles of the JobTracker
          • Impact of JobTracker Failure on a Hadoop Cluster
          • How to Configure the JobTracker in a Hadoop Cluster
      • TaskTracker
          • Roles of the TaskTracker
          • Impact of TaskTracker Failure on a Hadoop Cluster

Input Split

  • What Is an Input Split?
  • Need for Input Splits in MapReduce
  • Input Split Size
  • Input Split Size Vs Block Size
  • Input Splits Vs Number of Mappers
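
The relationship between split size, block size, and the number of mappers can be sketched as below. The formula max(minSize, min(maxSize, blockSize)) and the 10% "slop" rule are modelled on FileInputFormat in the new MapReduce API, where minSize and maxSize come from mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize. SplitMath itself is a hypothetical class, and the sketch assumes splittable input (a gzip file, for example, always yields a single split).

```java
// Hypothetical sketch modelled on FileInputFormat's split logic (not the Hadoop class itself).
final class SplitMath {
    // A trailing chunk up to 10% larger than the split size stays in the last split.
    static final double SPLIT_SLOP = 1.1;

    // Effective split size: the block size, clamped between the configured min and max.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits (and therefore map tasks) for one splittable file.
    static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long bytesRemaining = fileLength;
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            splits++;
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            splits++; // the remainder becomes the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long split = computeSplitSize(128 * mb, 1, Long.MAX_VALUE); // with defaults, split == block
        System.out.println(countSplits(300 * mb, split)); // 3
        System.out.println(countSplits(140 * mb, split)); // 1 (140/128 is within the 10% slop)
        System.out.println(countSplits(300 * mb, computeSplitSize(128 * mb, 1, 64 * mb))); // 5
    }
}
```

Lowering the maximum split size below the block size is one way to get more mappers than blocks, as the last line shows.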

MapReduce Life Cycle

  • Communication Mechanism Between JobTracker & TaskTracker
  • InputFormat Class
  • RecordReader Class
  • Success Case Scenarios
  • Failure Case Scenarios
  • Retry Mechanism in MapReduce
  • MapReduce Programming Model
  • Different Phases of the MapReduce Algorithm
  • Data Types in MapReduce
  • Java Primitive Data Types Vs MapReduce (Writable) Data Types
  • How to Write a Basic MapReduce Program
      • Driver Code
      • Mapper Code
      • Reducer Code
  • Driver Code
      • Importance of Driver Code in a MapReduce Program
      • How to Identify the Driver Code in a MapReduce Program
      • Different Sections of Driver Code
  • Mapper Code
      • Importance of the Mapper Phase in MapReduce
      • How to Write a Mapper Class
      • Methods in the Mapper Class
  • Reducer Code
      • Importance of the Reduce Phase in MapReduce
      • How to Write a Reducer Class
      • Methods in the Reducer Class
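
The programming model behind the driver, mapper, and reducer topics can be previewed without a cluster. The sketch below is a framework-free simulation of word count in plain Java: map emits (word, 1) pairs, a sorted shuffle groups them by key, and reduce sums each group. It deliberately does not use the Hadoop API; in a real job the mapper and reducer would extend org.apache.hadoop.mapreduce.Mapper and Reducer (whose lifecycle methods include setup, map or reduce, and cleanup), and the driver would configure and submit a Job. WordCountSimulation is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Framework-free simulation of the MapReduce phases for word count (not the Hadoop API).
final class WordCountSimulation {
    // Map phase: one input line -> list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) out.add(Map.entry(token, 1));
        }
        return out;
    }

    // Shuffle and sort: group values by key; TreeMap keeps keys sorted, like the sort phase.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum all values for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    // Plays the role of the driver: wires map, shuffle, and reduce together.
    static TreeMap<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) mapped.addAll(map(line));
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapped).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("big data big cluster", "data node")));
        // {big=2, cluster=1, data=2, node=1}
    }
}
```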

IDENTITY MAPPER & IDENTITY REDUCER

Input Formats in MapReduce

  • TextInputFormat
  • KeyValueTextInputFormat
  • NLineInputFormat
  • DBInputFormat
  • SequenceFileInputFormat
  • How to Use a Specific InputFormat in MapReduce
  • How to Write a Custom InputFormat Class and Custom RecordReader
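
As an example of the decisions a RecordReader makes, the sketch below re-implements the line-splitting rule of KeyValueTextInputFormat: each line is split at the first separator (a tab by default, configurable through mapreduce.input.keyvaluelinerecordreader.key.value.separator), and a line without the separator becomes a key with an empty value. By contrast, TextInputFormat uses the byte offset of the line as the key and the whole line as the value. KeyValueLineSplit is a hypothetical class, not Hadoop's KeyValueLineRecordReader.

```java
// Hypothetical re-implementation of KeyValueTextInputFormat's line-splitting rule.
final class KeyValueLineSplit {
    // Split at the FIRST occurrence of the separator; if it is absent,
    // the whole line becomes the key and the value is empty.
    static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos == -1) return new String[] {line, ""};
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }

    public static void main(String[] args) {
        String[] kv = split("IN\tIndia\tAsia", '\t');
        // key is "IN"; value is "India\tAsia" (later tabs stay in the value)
        System.out.println("key=" + kv[0] + ", value=" + kv[1]);
    }
}
```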

Output Formats in MapReduce

  • TextOutputFormat
  • MapFileOutputFormat
  • NullOutputFormat
  • DBOutputFormat
  • SequenceFileOutputFormat
  • How to Use a Specific OutputFormat in MapReduce
  • How to Write a Custom OutputFormat Class and Custom RecordWriter

MapReduce API (Application Programming Interface)

  • New API (org.apache.hadoop.mapreduce)
  • Old API (org.apache.hadoop.mapred)

Combiner in MapReduce

  • Is a Combiner Mandatory in MapReduce?
  • How to Use the Combiner Class in MapReduce
  • Performance Trade-offs of the Combiner
  • Real-World Use Cases
  • Where to Use & Where Not to Use a Combiner
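
The trade-off behind a combiner can be shown with a framework-free sketch (CombinerDemo is hypothetical, not Hadoop code). Local aggregation on the map side cuts the number of records shuffled to the reducers, but it is only safe when the result does not depend on how values are grouped, as with sum or max; an average is the classic counterexample. Hadoop may run a combiner zero, one, or several times, which is why it must never change the final result.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Framework-free illustration of what a combiner does (not the Hadoop API).
final class CombinerDemo {
    // Combiner: locally sums the (word, 1) pairs emitted by ONE mapper before the shuffle.
    static Map<String, Integer> combine(List<String> mapperOutputKeys) {
        Map<String, Integer> partial = new HashMap<>();
        for (String k : mapperOutputKeys) partial.merge(k, 1, Integer::sum);
        return partial;
    }

    // Plain arithmetic mean, used to show why averaging is unsafe in a combiner.
    static double mean(List<Double> xs) {
        double s = 0;
        for (double x : xs) s += x;
        return s / xs.size();
    }

    public static void main(String[] args) {
        List<String> mapper1 = List.of("a", "b", "a", "a");
        Map<String, Integer> combined = combine(mapper1);
        // 4 records would be shuffled without a combiner, 2 with it.
        System.out.println(mapper1.size() + " -> " + combined.size()); // 4 -> 2

        // Averaging partial averages gives the wrong answer: 3.0 instead of 2.5.
        double avgOfAvgs = mean(List.of(mean(List.of(1.0, 2.0, 3.0)), mean(List.of(4.0))));
        double trueAvg = mean(List.of(1.0, 2.0, 3.0, 4.0));
        System.out.println(avgOfAvgs + " vs " + trueAvg); // 3.0 vs 2.5
    }
}
```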

Apache Pig

  • Introduction to Apache Pig
  • MapReduce Vs Apache Pig
  • SQL Vs Apache Pig
  • Data Types in Pig
  • Where to Use MapReduce and Pig in Real-World Hadoop Projects
  • Modes of Execution in Pig
      • Local Mode
      • MapReduce (Distributed) Mode
  • Execution Mechanism
      • Grunt Shell
      • Script
  • Transformations in Pig
  • How to Write a Simple Pig Script
  • How to Develop a Complex Pig Script
  • Bags, Tuples and Fields in Pig
  • UDFs in Pig
      • Need for UDFs in Pig
      • How to Use UDFs
      • REGISTER Keyword in Pig

HIVE

  • Introduction to Hive
  • Need for Apache Hive in Hadoop
  • When to Choose Pig or Hive in a Real-World Project
  • Hive Architecture
      • Driver
      • Compiler (Parser and Semantic Analyzer)
      • Execution Engine
  • Metastore in Hive
      • Importance of the Hive Metastore
      • Embedded Metastore Configuration
      • External Metastore Configuration
      • Communication Mechanism with the Metastore
  • Hive Integration with Hadoop
  • Hive Query Language (HiveQL)
  • SQL Vs HiveQL
  • Data Slicing Mechanisms
      • Partitions in Hive
      • Buckets in Hive
      • Partitioning Vs Bucketing
      • Real-World Use Cases
  • User Defined Functions (UDFs) in Hive
      • UDFs
      • UDAFs
      • UDTFs
      • Need for UDFs in Hive
  • Hive-HBase Integration
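
The difference between partitioning and bucketing can be sketched as follows: a partition column turns each distinct value into a directory under the table's location, while a bucketed table spreads rows over a fixed number of files by hashing the bucketing column. The modulo formula mirrors Hive's classic bucketing scheme, but Java's hashCode() is used here as a stand-in for Hive's own hash function (newer Hive releases use a different, Murmur3-based hash), and the class name and warehouse path are hypothetical.

```java
// Illustration of partition directories vs. bucket assignment (not Hive's code).
final class HivePartitionBucketDemo {
    // Partitioning: each distinct value of the partition column gets its own directory.
    static String partitionPath(String tableDir, String column, String value) {
        return tableDir + "/" + column + "=" + value;
    }

    // Bucketing: a fixed number of buckets; a row goes to (hash & Integer.MAX_VALUE) % numBuckets.
    // Java's hashCode() stands in for Hive's hash function here (an assumption).
    static int bucketFor(Object bucketColumnValue, int numBuckets) {
        return (bucketColumnValue.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        System.out.println(partitionPath("/user/hive/warehouse/sales", "country", "IN"));
        // /user/hive/warehouse/sales/country=IN
        for (int customerId : new int[] {7, 12, 33}) {
            System.out.println(customerId + " -> bucket " + bucketFor(customerId, 4));
        }
        // 7 -> bucket 3, 12 -> bucket 0, 33 -> bucket 1
    }
}
```

Partitioning suits low-cardinality columns that queries filter on (one directory per value), whereas bucketing keeps the file count fixed even for high-cardinality columns such as customer IDs.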

SQOOP

  • Introduction to Sqoop
  • MySQL Client and Server Installation
  • How to Connect to a Relational Database Using Sqoop
  • Different Sqoop Commands
  • Different Flavors of Imports
  • Export
  • Hive Imports

HBase

  • HBase Introduction
  • HDFS Vs HBase
  • HBase Vs RDBMS
  • HBase Vs Other NoSQL Databases
  • HBase Use Cases
  • HBase Data Modeling Elements
      • Column Families
      • Column Qualifiers
      • Row Key
  • HBase Architecture
  • Clients
      • REST
      • Thrift
      • Java-Based
      • Avro
  • MapReduce Integration
      • MapReduce over HBase
  • HBase Admin
      • Schema Definition
      • Basic CRUD Operations
      • Client-Side Buffering in HBase
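
HBase's logical data model is often described as a sorted, multidimensional map: row key, then column family, then column qualifier, then timestamp, then value. The toy class below (HBaseModelSketch, hypothetical and unrelated to the HBase client API) models that structure with nested TreeMaps to show why row key design matters: rows are kept in sorted key order, so a prefix scan reads one contiguous range, and each cell keeps several timestamped versions with the newest returned first.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of HBase's logical data model (not the HBase client API):
// row key -> column family -> column qualifier -> timestamp (newest first) -> value.
final class HBaseModelSketch {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> rows =
            new TreeMap<>();

    // Put: stores a new version of a cell; older versions are kept.
    void put(String row, String family, String qualifier, long ts, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
                .computeIfAbsent(family, f -> new TreeMap<>())
                .computeIfAbsent(qualifier, q -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
                .put(ts, value);
    }

    // Get: returns the newest version of a cell, or null if it does not exist.
    String get(String row, String family, String qualifier) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> families = rows.get(row);
        if (families == null) return null;
        NavigableMap<String, NavigableMap<Long, String>> qualifiers = families.get(family);
        if (qualifiers == null) return null;
        NavigableMap<Long, String> versions = qualifiers.get(qualifier);
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
    }

    // Scan: rows are sorted by key, so a prefix scan is one contiguous key range.
    NavigableMap<String, ?> scanPrefix(String prefix) {
        return rows.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    // Toy delete: removes the row outright (real HBase writes delete markers instead).
    void delete(String row) {
        rows.remove(row);
    }

    public static void main(String[] args) {
        HBaseModelSketch table = new HBaseModelSketch();
        table.put("user#001", "info", "name", 1L, "Asha");
        table.put("user#001", "info", "name", 2L, "Asha R");
        table.put("user#002", "info", "name", 1L, "Ravi");
        table.put("order#9", "info", "amount", 1L, "10");
        System.out.println(table.get("user#001", "info", "name")); // Asha R (newest version)
        System.out.println(table.scanPrefix("user#").keySet());   // [user#001, user#002]
    }
}
```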

Hadoop Administration

Hadoop Single-Node Cluster Setup (Hands-on Installation on Laptops)

  • Operating System Installation
  • JDK Installation
  • SSH Configuration
  • Dedicated Group & User Creation
  • Hadoop Installation
  • Configuration File Settings
  • Formatting the NameNode
  • Starting the Hadoop Daemons

Multi-Node Hadoop Cluster Setup (Hands-on Installation on Laptops)

  • Network-Related Settings
  • Hosts Configuration
  • Passwordless SSH Communication
  • Hadoop Installation
  • Configuration File Settings
  • Formatting the NameNode
  • Starting the Hadoop Daemons

Pig Installation (Hands-on Installation on Laptops)

  • Local Mode
  • Clustered Mode
  • .bashrc File Configuration

SQOOP Installation (Hands-on Installation on Laptops)

  • Sqoop Installation with MySQL Client

HIVE Installation (Hands-on Installation on Laptops)

  • Local Mode
  • Clustered Mode