From 0 to 1 : Spark for Data Science with Python

Location: Online
Dates: Can be taken anytime
Course Type: Professional Training Course
Accreditation: Yes
Language: English
Price: $10

Course Overview

Taught by a 4-person team that includes 2 Stanford-educated ex-Googlers and 2 ex-Flipkart lead analysts. The team has decades of practical experience working with Java and with billions of rows of data.

Get your data to fly using Spark for analytics, machine learning, and data science.

Let's parse that.

What's Spark? If you are an analyst or a data scientist, you're used to having multiple systems for working with data: SQL, Python, R, Java, etc. With Spark, you have a single engine where you can explore and play with large amounts of data, run machine learning algorithms, and then use the same system to productionize your code.

Analytics: Using Spark and Python, you can analyze and explore your data in an interactive environment with fast feedback. The course will show how to leverage the power of RDDs and DataFrames to manipulate data with ease.

Machine Learning and Data Science: Spark's core functionality and built-in libraries make it easy to implement complex algorithms like recommendations with very few lines of code. We'll cover a variety of datasets and algorithms, including PageRank, MapReduce, and graph datasets.

What's Covered:

Lots of cool stuff

  • Music Recommendations using Alternating Least Squares and the Audioscrobbler dataset
  • DataFrames and Spark SQL to work with Twitter data
  • Using the PageRank algorithm with the Google web graph dataset
  • Using Spark Streaming for stream processing
  • Working with graph data using the Marvel social network dataset
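
To give a feel for the PageRank item above, here is a minimal plain-Python sketch of the iterative idea behind the algorithm. It is not Spark code and not the course's implementation: the function name `pagerank`, the damping factor of 0.85, the iteration count, and the four-page `toy_graph` are all assumptions made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate PageRank scores for a small directed graph.

    `links` maps each page to the list of pages it links to.
    (Hypothetical helper for illustration; not a Spark API.)
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)

    # Start with rank spread evenly over all pages.
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share each round.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank equally among its outlinks.
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * ranks[page] / n
                for other in pages:
                    new_ranks[other] += share
        ranks = new_ranks
    return ranks


# Made-up toy graph: "d" links out but nothing links to it.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_graph)
```

In this toy graph, pages with more incoming links accumulate more rank. A distributed version would express the same per-page contributions as transformations over a dataset of links rather than as nested loops.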

... and of course all the basic and advanced Spark features:

  • Resilient Distributed Datasets: transformations (map, filter, flatMap) and actions (reduce, aggregate)
  • Pair RDDs: reduceByKey, combineByKey
  • Broadcast and Accumulator variables
  • Spark for MapReduce
  • The Java API for Spark
  • Spark SQL, Spark Streaming, MLlib, and GraphFrames (GraphX for Python)
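
As a rough illustration of what the transformations and actions named above compute, the following sketch mimics map, filter, flatMap, reduce, and reduceByKey on ordinary Python lists. It deliberately avoids the Spark API, so it shows only the semantics, not distributed execution. The sample `lines` and the helper `reduce_by_key` are assumptions invented for this example.

```python
from functools import reduce
from itertools import chain

# Made-up input standing in for a dataset of text lines.
lines = ["spark makes data fly", "spark with python"]

# flatMap: each input line yields several output words.
words = list(chain.from_iterable(line.split() for line in lines))

# filter: keep only the words longer than four characters.
long_words = [word for word in words if len(word) > 4]

# map: turn each word into a (key, value) pair, as in a pair RDD.
pairs = [(word, 1) for word in words]

# reduce (an action): collapse all the values into one result.
total_words = reduce(lambda a, b: a + b, (count for _, count in pairs))


def reduce_by_key(pairs, func):
    """Merge the values that share a key (hypothetical stand-in for reduceByKey)."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


# Word count: sum the 1s for each distinct word.
word_counts = reduce_by_key(pairs, lambda a, b: a + b)
```

The pattern of mapping records to key/value pairs and then merging values per key is the classic word count, which is also the usual first example of MapReduce-style processing.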

Using discussion forums

Please use the discussion forums on this course to engage with other students and to help each other out. Unfortunately, much as we would like to, it is not possible for us at Loonycorn to respond to individual questions from students :-(

We're super small and self-funded, with only 2 people developing technical video content. Our mission is to make high-quality courses available at super low prices.

The only way to keep our prices this low is to NOT offer additional technical support over email or in person. The truth is, direct support is hugely expensive and just does not scale.

We understand that this is not ideal and that a lot of students might benefit from this additional support. Hiring resources for additional support would make our offering much more expensive, thus defeating our original purpose.

It is a hard trade-off.

Thank you for your patience and understanding!

Basic knowledge

  • The course assumes knowledge of Python. You can write Python code directly in the PySpark shell. If you already have IPython Notebook installed, we'll show you how to configure it for Spark.
  • For the Java section, we assume basic knowledge of Java. An IDE that supports Maven, such as IntelliJ IDEA or Eclipse, would be helpful.
  • All examples work with or without Hadoop. If you would like to use Spark with Hadoop you'll need to have Hadoop installed (either in pseudo-distributed or cluster mode).

Who should take this course

Who is the target audience?

  • Yep! Analysts who want to leverage Spark for analyzing interesting datasets
  • Yep! Data Scientists who want a single engine for analyzing and modelling data as well as productionizing it.
  • Yep! Engineers who want to use a distributed computing engine for batch or stream processing or both

Accreditation

Course Completion Certificate

Course content

What you will learn:

  • Use Spark for a variety of analytics and Machine Learning tasks
  • Implement complex algorithms like PageRank or Music Recommendations
  • Work with a variety of datasets, from airline delays to Twitter, web graphs, social networks, and product ratings
  • Use all the different features and libraries of Spark: RDDs, DataFrames, Spark SQL, MLlib, Spark Streaming, and GraphX

Curriculum:

You, This Course and Us:

  • Introduction to Spark
  • Resilient Distributed Datasets
  • Advanced RDDs: Pair Resilient Distributed Datasets
  • Advanced Spark: Accumulators, Spark Submit, MapReduce, Behind The Scenes
  • Java and Spark
  • PageRank: Ranking Search Results
  • Spark SQL
  • MLlib in Spark: Build a recommendations engine
  • Spark Streaming
  • Graph Libraries

About Course Provider

Simpliv LLC is a platform for learning and teaching online courses. We focus on online learning that helps learners pick up business concepts and software technology, and reach their personal and professional goals, through a video library created by recognized industry experts and trainers.

Why Simpliv

With ever-evolving industry trends, there is a constant need for professionally designed learning solutions that deliver key innovations on time and on budget to achieve long-term success.

Simpliv understands these changing needs and lets learners worldwide evaluate their technical abilities by aligning what they learn with key business objectives, in order to fill the skills gaps that exist in business areas including IT, Marketing, Business Development, and more.
