Apache Spark with Python - Big Data with PySpark and Spark
Get Started with Apache Spark
Introduction to Spark (2:28)
Install Java and Git (8:53)
Git URL
Set up Spark (9:22)
Winutils URL
Run our first Spark job (3:34)
RDD
RDD Basics (2:50)
Create RDDs (2:32)
Spark Data Sources
Map and Filter Transformation (9:29)
Solution to Airports by Latitude Problem (1:57)
FlatMap Transformation (3:46)
Set Operations (8:26)
Sampling with Replacement and Sampling without Replacement
Solution for the Same Hosts Problem (1:54)
Actions (9:03)
Solution to Sum of Numbers Problem (2:06)
Important Aspects about RDD (1:40)
Summary of RDD Operations (2:26)
Caching and Persistence (5:16)
Spark Architecture and Components
Spark Components (5:26)
Spark Architecture (3:01)
Pair RDD
Introduction to Pair RDD (1:38)
Create Pair RDDs (4:15)
Filter and MapValue Transformations on Pair RDD (5:16)
Reduce By Key Aggregation (5:38)
Solution for the Average House Problem (3:24)
Group By Key Transformation (5:15)
Sort By Key Transformation (2:51)
Solution for the Sorted Word Count Problem (3:24)
Data Partitioning (4:18)
Join Operations (5:12)
Advanced Spark Topics
Solution to StackOverflow Survey Follow-up Problem (3:44)
Accumulators (1:05)
Broadcast Variables (6:46)
Spark SQL
Introduction to Spark SQL (3:56)
Spark SQL in Action (13:12)
DataFrame or RDD (7:03)
Spark SQL practice: House Price Problem (1:54)
Spark SQL Joins (2:54)
DataFrame and RDD Conversion (2:57)
Performance Tuning of Spark SQL (2:51)
Running Spark in a Cluster
Introduction to Running Spark in a Cluster (4:05)
Spark-submit (2:41)
Run Spark Application on Amazon EMR (Elastic MapReduce) cluster (15:10)
Extra Learning Material: Avoid These Mistakes While Writing Apache Spark Programs
Introduction to Pair RDD