Big Data: Apache Spark and Python

Location: Online
Dates: Can be taken anytime
Course Type: Professional Training Course
Accreditation: Yes
Language: English
Price: $10

Course Overview

This course teaches the concepts behind Spark's Resilient Distributed Datasets (RDDs) in depth and shows you how to develop and run Spark jobs quickly with Python. By the end of the course, you can expect to understand how to scale up to larger data sets using Amazon's Elastic MapReduce service and how Hadoop YARN distributes Spark across computing clusters.

Coupon code - WIISEGT

Who should take this course

This course is open to all learners.

Accreditation

WIISE

Course content

The course outline is as follows:

Getting Started with Spark:

  • Introduction
  • How to Use This Course
  • Getting Set Up: Installing Python, a JDK, Spark, and its Dependencies
  • Installing the MovieLens Movie Rating Dataset
  • Run your first Spark program! Ratings histogram example.

Examples: Spark Basics:

  • Introduction to Spark
  • The Resilient Distributed Dataset (RDD)
  • Ratings Histogram Walkthrough
  • Key/Value RDDs
  • Running the Average
  • Filtering RDDs
  • Running the Minimum Temperature
  • Running the Maximum Temperature
  • Counting Word Occurrences using flatMap
  • Improving the Word Count Script with Regular Expressions
  • Sorting the Word Count Results
  • Customer Order Assignments
  • Customer Order Solutions
  • Customer Order Sorted

Advanced Examples: Spark Programs:

  • Find the Most Popular Movie
  • Use Broadcast Variables to Display Movie Names Instead of ID Numbers
  • Find the Most Popular Superhero in a Social Graph
  • Run the Script
  • Superhero Degrees of Separation: Introduction
  • Superhero Degrees of Separation: Accumulators, and Implementing BFS in Spark
  • Superhero Degrees of Separation: Review the Code and Run it
  • Item-Based Collaborative Filtering in Spark, cache and persist
  • Running the Similar Movies Script using Spark's Cluster Manager
  • Improve the Quality of Similar Movies

Running Spark on a Cluster:

  • Introducing Elastic MapReduce
  • Setting up your AWS
  • Partitioning
  • Create Similar Movies from One Million Ratings - Part 1
  • Create Similar Movies from One Million Ratings - Part 2
  • Create Similar Movies from One Million Ratings - Part 3
  • Troubleshooting Spark on a Cluster
  • More Troubleshooting, and Managing Dependencies

SparkSQL, DataFrames and DataSets:

  • Introducing SparkSQL
  • Executing SQL commands and SQL
  • Using DataFrames instead of RDDs

Other Spark Technologies and Libraries:

  • Introducing MLlib
  • Using MLlib to Produce Movie Recommendations
  • Analyzing the ALS Recommendations Results
  • Using DataFrames with MLlib
  • Spark Streaming and GraphX

Future Steps:

  • Learning More about Spark and Data Science

About Course Provider

WIISE is a 'Professional Learning Network' with a global reach that helps anyone learn anything to achieve personal and professional goals.

We bring top-rated interactive learning courses & certifications from across the world through respected Global Academic Institutes and Industry experts to our learners.

WIISE for Teams is a smart training solution for growing small and medium-sized businesses (SMBs) that delivers cost-effective, on-demand online training, staff engagement, and upskilling to employees and customers. WIISE incorporates the latest micro-learning and social-learning techniques, which provide fast and engaging training at a fraction of the cost of traditional training methods.

WIISE is brought to you by PositiveShift Group, a respected learning services and skill development company based in Silicon Valley, CA, USA, and India (www.positiveshift.in). The company has been awarded a unique innovation partnership with the National Skill Development Corporation (NSDC) and the Ministry of Skill Development and Entrepreneurship, Government of India.
