Big Data Hadoop

(Instant booking on GulfTalent)
Location: Online
Dates: Can be taken anytime
Course Type: Professional Training Course
Accreditation: Yes
Language: English
Price: $60 (reduced from $200)

Course Overview

Big Data Hadoop training will make you an expert in HDFS, MapReduce, HBase, Hive, Pig, YARN, Oozie, Flume and Sqoop, using real-world use cases from the retail, social media, aviation, tourism and finance domains. You will receive Eckovation's Hadoop certification at the end of the course.

According to Forbes, the Big Data and Hadoop market is expected to reach $99.31B by 2022, growing at a CAGR of 42.1% from 2015. McKinsey predicts that by 2018 there will be a shortage of 1.5M data experts. According to Indeed salary data, the average salary of a Big Data Hadoop developer is $135k. Once you complete the course and all of its assignments, you will receive both a soft copy and a hard copy of the completion certificate.

Who should take this course

There is increasing demand for skilled data scientists across all industries, which makes this course suitable for participants at all levels of experience. We especially recommend this training for the following professionals:

  • Graduates looking to build a career in Hadoop
  • Analytics professionals who want to work with Big Data Hadoop functions
  • IT professionals looking to switch careers into Big Data Hadoop
  • Software developers interested in pursuing a career in Big Data Hadoop
  • Experienced professionals who would like to harness Big Data Hadoop in their fields

Accreditation

Internationally Accepted Certificate

Course content

Big Data Hadoop course duration:

  • 40+ hours

Big Data Hadoop topics covered:

Session 1 - Introduction to Big Data:

  • Importance of Data
  • ESG Report on Analytics
  • Big Data & Its Hype
  • What is Big Data?
  • Structured vs Unstructured data
  • Definition of Big Data
  • Big Data Users & Scenarios
  • Challenges of Big Data
  • Why Distributed Processing?

Session 2 - Hadoop:

  • History of Hadoop
  • Hadoop Ecosystem
  • Hadoop Animal Planet
  • When to use & when not to use Hadoop
  • What is Hadoop?
  • Key Distinctions of Hadoop
  • Hadoop Components/Architecture
  • Understanding Storage Components
  • Understanding Processing Components
  • Anatomy of a File Write
  • Anatomy of a File Read

Session 3 - Understanding Hadoop Cluster:

  • Handout discussion
  • Walkthrough of CDH setup
  • Hadoop Cluster Modes
  • Hadoop Configuration files
  • Understanding Hadoop Cluster configuration
  • Data Ingestion to HDFS

Session 4 - MapReduce:

  • Meet MapReduce
  • Word Count Algorithm - Traditional approach
  • Traditional approach on a Distributed system
  • Traditional approach - Drawbacks
  • MapReduce approach
  • Input & Output Forms of an MR Program
  • Map, Shuffle & Sort, Reduce Phases
  • Workflow & Transformation of Data
  • Word Count Code walkthrough
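To give a feel for the Map, Shuffle & Sort and Reduce phases listed in Session 4, here is a minimal single-process Python sketch of word count. It only simulates the MapReduce data flow in memory; it does not use the Hadoop API, and the function names (`map_phase`, `shuffle_and_sort`, `reduce_phase`, `word_count`) are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_and_sort(pairs: Iterable[tuple[str, int]]) -> list[tuple[str, list[int]]]:
    """Shuffle & Sort: group all values by key, then order the keys."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped: Iterable[tuple[str, list[int]]]) -> dict[str, int]:
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Chain the three phases: map -> shuffle & sort -> reduce.
    return reduce_phase(shuffle_and_sort(map_phase(lines)))


if __name__ == "__main__":
    text = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
    print(word_count(text))  # {'bear': 2, 'car': 3, 'deer': 2, 'river': 2}
```

In a real cluster, many mappers run in parallel on different input splits and the shuffle moves data over the network to the reducers; this sketch keeps only the shape of the data as it moves between phases.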

Session 5 - MapReduce:

  • Input Split & HDFS Block
  • Relation between Split & Block
  • MR Flow with a Single Reduce Task
  • MR Flow with Multiple Reducers
  • Data Locality Optimization
  • Speculative Execution
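The relation between input splits and HDFS blocks covered in Session 5 can be sketched as below. This is a simplified Python model based on our understanding of Hadoop's `FileInputFormat` logic (split size = max(minSize, min(maxSize, blockSize)), with a 1.1 "slop" factor so a small tail is folded into the last split); it ignores per-file details such as compression and non-splittable formats, and the function names are hypothetical.

```python
SPLIT_SLOP = 1.1  # assumed tolerance before another split is cut off


def compute_split_size(block_size: int, min_size: int = 1, max_size: int = 2**63 - 1) -> int:
    """Split size defaults to the block size unless min/max settings override it."""
    return max(min_size, min(max_size, block_size))


def split_lengths(file_size: int, block_size: int,
                  min_size: int = 1, max_size: int = 2**63 - 1) -> list[int]:
    """Return the byte length of each input split for one file."""
    split_size = compute_split_size(block_size, min_size, max_size)
    lengths: list[int] = []
    remaining = file_size
    # Keep cutting full-size splits while the remainder is clearly bigger than one split.
    while remaining / split_size > SPLIT_SLOP:
        lengths.append(split_size)
        remaining -= split_size
    # Whatever is left (possibly up to 1.1x the split size) becomes the last split.
    if remaining:
        lengths.append(remaining)
    return lengths


if __name__ == "__main__":
    MB = 1024 * 1024
    print([n // MB for n in split_lengths(300 * MB, 128 * MB)])  # [128, 128, 44]
    print([n // MB for n in split_lengths(140 * MB, 128 * MB)])  # [140]
```

The second call shows why a split is not always exactly one block: a 140 MB file with 128 MB blocks yields a single split, because the remainder is within the slop factor.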

Session 6 - Advanced MapReduce:

  • Combiner
  • Partitioner
  • Counters
  • Hadoop Data Types
  • Custom Data Types
  • Input Format & Hierarchy
  • Output Format & Hierarchy
  • Side Data distribution - Distributed cache
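The combiner and partitioner from Session 6 can be illustrated with a small Python sketch. `hash_partition` uses the same formula as Hadoop's default `HashPartitioner`, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, but `stand_in_hash` is an assumed substitute for the key's real `hashCode()` (a Java `String.hashCode`-style polynomial hash), so partition numbers will not necessarily match a real cluster. All function names here are hypothetical.

```python
from collections import Counter

INT_MAX = 0x7FFFFFFF  # Java Integer.MAX_VALUE


def stand_in_hash(key: str) -> int:
    """Hypothetical stand-in for key.hashCode(): a 31-based polynomial hash
    wrapped to a signed 32-bit integer, like Java's String.hashCode for ASCII."""
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h & 0x80000000 else h


def hash_partition(key: str, num_reduce_tasks: int) -> int:
    """Same formula as the default HashPartitioner: clear the sign bit, then modulo."""
    return (stand_in_hash(key) & INT_MAX) % num_reduce_tasks


def combine(map_output: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Combiner: pre-aggregate one mapper's (word, count) pairs locally,
    so fewer records cross the network during the shuffle."""
    totals: Counter[str] = Counter()
    for word, count in map_output:
        totals[word] += count
    return sorted(totals.items())


def route(pairs: list[tuple[str, int]], num_reduce_tasks: int) -> dict[int, list[tuple[str, int]]]:
    """Assign each (key, value) pair to the reducer chosen by the partitioner."""
    partitions: dict[int, list[tuple[str, int]]] = {i: [] for i in range(num_reduce_tasks)}
    for key, value in pairs:
        partitions[hash_partition(key, num_reduce_tasks)].append((key, value))
    return partitions


if __name__ == "__main__":
    mapper_output = [("car", 1), ("river", 1), ("car", 1), ("car", 1)]
    combined = combine(mapper_output)
    print(combined)  # [('car', 3), ('river', 1)]
    print(route(combined, 2))
```

Summing counts is safe to do in a combiner because addition is associative and commutative; the same key always hashes to the same partition, which is what guarantees that one reducer sees all values for that key.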

Session 7 - Advanced MapReduce:

  • Joins
  • Map side Join using Distributed cache
  • Reduce side Join
  • MRUnit - A Unit Testing Framework

Session 8 - Pig:

  • What is Pig?
  • Why Pig?
  • Pig vs SQL
  • Execution Types or Modes
  • Running Pig
  • Pig Data types
  • Pig Latin relational Operators
  • Multi Query execution
  • Pig Latin Diagnostic Operators

Session 9 - Pig:

  • Pig Latin Macro & UDF statements
  • Pig Latin Commands
  • Pig Latin Expressions
  • Schemas
  • Pig Functions
  • Pig Latin File Loaders
  • Pig UDF & executing a Pig UDF

Session 10 - Hive:

  • Introduction to Hive
  • Pig vs Hive
  • Hive Limitations & Possibilities
  • Hive Architecture
  • Metastore
  • Hive Data Organization
  • HiveQL
  • SQL vs HiveQL
  • Hive Data types
  • Data Storage
  • Managed & External Tables

Session 11 - Hive:

  • Partitions & Buckets
  • Storage Formats
  • Built-in SerDes
  • Importing Data
  • Alter & Drop Commands
  • Data Querying

Session 12 - Hive:

  • Using MR Scripts
  • Hive Joins
  • Sub Queries
  • Views
  • UDFs

Session 13 - HBase:

  • Introduction to NoSQL & HBase
  • Row & Column oriented storage
  • Characteristics of a huge DB
  • What is HBase?
  • HBase Data-Model
  • HBase vs RDBMS
  • HBase architecture
  • HBase in operation
  • Loading Data into HBase
  • HBase shell commands
  • HBase operations through Java
  • HBase operations through MR

Session 14 - ZooKeeper & Oozie:

  • Introduction to ZooKeeper
  • Distributed Coordination
  • ZooKeeper Data Model
  • ZooKeeper Service
  • ZooKeeper in HBase
  • Introduction to Oozie
  • Oozie workflow

Session 15 - Sqoop:

  • Introduction to Sqoop
  • Sqoop design
  • Sqoop Commands
  • Sqoop Import & Export Commands
  • Sqoop Incremental Load Commands

Session 16 - Hadoop 2.0 & YARN:

  • Hadoop 1 Limitations
  • HDFS Federation
  • NameNode High Availability
  • Introduction to YARN
  • YARN Applications
  • YARN Architecture
  • Anatomy of a YARN Application

About Course Provider

Knowasap provides online self-learning courses in SAP and other high-end technologies that aim to maximize learning outcomes and career opportunities for professionals and students alike. Experienced consultants, project team members, support professionals, end users, executives and students will find courses to meet their needs, accessible anytime and anywhere.

How to enroll?

You can book the course instantly by paying on GulfTalent.
