These are my research notes on Apache Spark, gathered while preparing for the Associate Level Certification.
We will be reading the book "Spark: The Definitive Guide" for this certification. Here is a high-level overview:
Table of Contents
Part I. Gentle Overview of Big Data and Spark
Chapter 1. What Is Apache Spark?
Chapter 2. A Gentle Introduction to Spark
Chapter 3. A Tour of Spark's Toolset

Part II. Structured APIs: DataFrames, SQL, and Datasets
Chapter 4. Structured API Overview
Chapter 5. Basic Structured Operations
Chapter 6. Working with Different Types of Data
Chapter 7. Aggregations
Chapter 8. Joins
Chapter 9. Data Sources
Chapter 10. Spark SQL
Chapter 11. Datasets

Part III. Low-Level APIs
Chapter 12. Resilient Distributed Datasets (RDDs)
Chapter 13. Advanced RDDs
Chapter 14. Distributed Shared Variables

Part IV. Production Applications
Chapter 15. How Spark Runs on a Cluster
Chapter 16. Developing Spark Applications
Chapter 17. Deploying Spark
Chapter 18. Monitoring and Debugging
Chapter 19. Performance Tuning

Part V. Streaming
Chapter 20. Stream Processing Fundamentals
Chapter 21. Structured Streaming Basics
Chapter 22. Event-Time and Stateful Processing
Chapter 23. Structured Streaming in Production

Part VI. Advanced Analytics and Machine Learning
Chapter 24. Advanced Analytics and Machine Learning Overview
Chapter 25. Preprocessing and Feature Engineering
Chapter 26. Classification
Chapter 27. Regression
Chapter 28. Recommendation
Chapter 29. Unsupervised Learning
Chapter 30. Graph Analytics
Chapter 31. Deep Learning

Part VII. Ecosystem
Chapter 32. Language Specifics: Python (PySpark) and R (SparkR and sparklyr)
Chapter 33. Ecosystem and Community