Chapter 1: Introduction to PySparkSQL
Chapter Goal: The reader will understand PySpark, PySparkSQL, the Catalyst optimizer, Project Tungsten, and Hive.
No. of pages: 20-30
Sub-topics:
1. PySpark
2. PySparkSQL
3. Hive
4. Catalyst
5. Project Tungsten
Chapter 2: Some Time with Installation
Chapter Goal: The reader will learn how to install Spark, Hive, PostgreSQL, MySQL, MongoDB, Cassandra, etc.
No. of pages: 30-40
Sub-topics:
1. Installing Spark
2. Installing Hive
3. Installing MySQL
4. Installing MongoDB
Chapter 3: I/O in PySparkSQL
Chapter Goal: This chapter provides recipes that enable the reader to create PySparkSQL DataFrames from different sources.
No. of pages: 40-50
Sub-topics:
1. Creating a DataFrame from data
2. Reading a CSV file to create a DataFrame
3. Reading a JSON file to create a DataFrame
4. Saving DataFrames in different formats
Chapter 4: Operations on PySparkSQL DataFrames
Chapter Goal: The reader will learn about data filtering, data manipulation, descriptive data analysis, dealing with missing values, etc.
No. of pages: 40-50
Sub-topics:
1. Data filtering
2. Data manipulation
3. Row and column manipulation
Chapter 5: Data Merging and Data Aggregation Using PySparkSQL
Chapter Goal: The reader will learn about data merging and aggregation using PySparkSQL.
Sub-topics:
1. Data Merging
2. Data aggregation
Chapter 6: SQL, NoSQL and PySparkSQL
Chapter Goal: The reader will learn to run SQL and HiveQL queries on DataFrames.
No. of pages: 30-40
Sub-topics:
1. Running SQL on DataFrames
2. Running HiveQL
Chapter 7: Structured Streaming
Chapter Goal: The reader will understand Structured Streaming.
No. of pages: 30-40
Sub-topics:
1. Different types of modes
2. Data aggregation in Structured Streaming
3. Different types of sources
Chapter 8: Optimizing PySparkSQL
Chapter Goal: The reader will learn about optimizing PySparkSQL.
No. of pages: 20-30
Sub-topics:
1. Optimizing PySparkSQL
Chapter 9: GraphFrames
Chapter Goal: The reader will understand graph data analysis with GraphFrames.
No. of pages: 30-40
Sub-topics:
1. GraphFrame creation
2. PageRank
3. Breadth-first search