Big Data with Scala and Spark

(Instant booking on GulfTalent)
Location
Online
Dates
Can be taken anytime
Course Type
Professional Training Course
Accreditation
Yes (Details)
Language
English
Course Fee
$200 $60 only

Course Overview

Manipulating big data distributed over a cluster using functional concepts is rampant in industry, and is arguably one of the first widespread industrial uses of functional ideas. This is evidenced by the popularity of MapReduce and Hadoop, and most recently Apache Spark, a fast, in-memory distributed collections framework written in Scala. In this course, we'll see how the data parallel paradigm can be extended to the distributed case, using Spark throughout. We'll cover Spark's programming model in detail, being careful to understand how and when it differs from familiar programming models, like shared-memory parallel collections or sequential Scala collections. Through hands-on examples in Spark and Scala, we'll learn when important issues related to distribution like latency and network communication should be considered and how they can be addressed effectively for improved performance.

Learning Outcomes. By the end of this course you will be able to:

  • Read data from persistent storage and load it into Apache Spark,
  • Manipulate data with Spark and Scala,
  • Express algorithms for data analysis in a functional style,
  • Recognize how to avoid shuffles and recomputation in Spark,

Who should take this course

There is an increasing demand for skilled data scientists across all industries that make this course suitable for participants at all levels of experience. We recommend this data science training especially for the following professionals:

  • Graduates looking to build a career in Hadoop with scala, spark.
  • Analytics professionals who want to work with Big data Hadoop Functions with scala, spark
  • IT professionals looking for a career switch in the fields of Big data Hadoop with scala, spark
  • Software developers interested in pursuing a career in Big data Hadoop with scala, spark
  • Experienced professionals who would like to harness Big data Hadoop with scala, spark in their fields

Accreditation

Internationally Accepted Certificate

Course content

Duration of Course:

  • 40+ hours

Topics Covered are:

  • Introduction to Scala for Apache Spark

Topics:

  • What is Scala?
  • Why Scala for Spark?
  • Scala in other frameworks
  • Introduction to Scala REPL
  • Basic Scala operations
  • Variable Types in Scala
  • Control Structures in Scala
  • Foreach loop, Functions and Procedures
  • Collections in Scala- Array
  • ArrayBuffer, Map, Tuples, Lists, and more
  • OOPS and Functional Programming in Scala

Topics:

  • Class in Scala
  • Getters and Setters
  • Custom Getters and Setters
  • Properties with only Getters
  • Auxiliary Constructor and Primary Constructor
  • Singletons
  • Extending a Class
  • Overriding Methods
  • Traits as Interfaces and Layered Traits
  • Programming
  • Higher Order Functions
  • Anonymous Functions, and more
  • Introduction to Big Data and Hadoop

Topics:

  • What is Big Data?
  • Big Data Customer Scenarios
  • Limitations and Solutions of Existing Data Analytics Architecture with Uber Use Case
  • How Hadoop Solves the Big Data Problem
  • What is Hadoop?
  • Hadoop's Key Characteristics
  • Hadoop Ecosystem and HDFS
  • Hadoop Core Components
  • Rack Awareness and Block Replication
  • HDFS Read/Write Mechanism
  • YARN and Its Advantage
  • Hadoop Cluster and Its Architecture
  • Hadoop: Different Cluster Modes
  • Data Loading using Sqoop
  • Apache Spark Framework

Topics:

  • Big Data Analytics with Batch & Real-Time Processing
  • Why Spark is Needed?
  • What is Spark?
  • How Spark Differs from Its Competitors?
  • Spark at eBay
  • Spark's Place in Hadoop Ecosystem
  • Spark Components & it's Architecture
  • Running Programs on Scala IDE & Spark Shell
  • Spark Web UI
  • Configuring Spark Properties
  • Playing with RDDs

Topics:

  • Challenges in Existing Computing Methods
  • Probable Solution & How RDD Solves the Problem
  • What is RDD, It's Functions, Transformations & Actions?
  • Data Loading and Saving Through RDDs
  • Key-Value Pair RDDs and Other Pair RDDs o RDD Lineage
  • RDD Persistence
  • WordCount Program Using RDD Concepts
  • RDD Partitioning & How It Helps Achieve Parallelization
  • DataFrames and Spark SQL
  • Need for Spark SQL
  • What is Spark SQL?
  • Spark SQL Architecture
  • SQL Context in Spark SQL
  • Data Frames & Datasets
  • Interoperating with RDDs
  • JSON and Parquet File Formats
  • Loading Data through Different Sources
  • Machine Learning using Spark MLlib
  • What is Machine Learning?
  • Where is Machine Learning Used?
  • Different Types of Machine Learning Techniques
  • Face Detection: USE CASE
  • Understanding MLlib
  • Features of Saprk MLlib and MLlib Tools
  • Various ML algorithms supported by Spark MLlib
  • K-Means Clustering & How It Works with MLlib
  • Analysis on US Election Data: K-Means Spark MLlib USE CASE
  • Understanding Apache Kafka and Kafka Cluster
  • Need for Kafka
  • What is Kafka?
  • Core Concepts of Kafka
  • Kafka Architecture
  • Where is Kafka Used?
  • Understanding the Components of Kafka Cluster
  • Configuring Kafka Cluster
  • Producer and Consumer
  • Capturing Data with Apache Flume and Integration with Kafka
  • Need of Apache Flume
  • What is Apache Flume
  • Basic Flume Architecture
  • Flume Sources
  • Flume Sinks
  • Flume Channels
  • Flume Configuration
  • Integrating Apache Flume and Apache Kafka
  • Apache Spark Streaming
  • Drawbacks in Existing Computing Methods
  • Why Streaming is Necessary?
  • What is Spark Streaming?
  • Spark Streaming Features
  • Spark Streaming Workflow
  • How Uber Uses Streaming Data
  • Streaming Context & DStreams
  • Transformations on DStreams
  • WordCount Program using Spark Streaming
  • Describe Windowed Operators and Why it is Useful
  • Important Windowed Operators
  • Slice, Window and ReduceByWindow Operators
  • Stateful Operators
  • Perform Twitter Sentimental Analysis Using Spark Streaming

About Course Provider

Knowasap provides best online self learning SAP courses and high end technologies courses that maximizes learning outcomes and career opportunity for professionals and as well as students. Experienced consultants, project team members, support professionals, end users, executives and students will find courses to meet their needs that are accessible anytime, anywhere.

How to enroll?

You can book the course instantly by paying on GulfTalent.

(Instant booking on GulfTalent)

Frequently asked questions

{{ item.question }}