
Big Data Hadoop, Spark & Kafka
Duration: 15 Weeks
Audience: Business Analysts, IT Architects, Technical Managers and Developers
Suggested Prerequisites: Linux Basics, SQL
Course Outline:
Week 1:
- Introduction
- Data Security
- Cloud Infrastructure
- Big Data
- Project – Data Ingestion and Analysis using MySQL
Week 2:
- Big Data
- Hadoop HDFS
- Homework: HDFS Lab
Week 3:
- Hadoop MapReduce Framework
- Homework: MapReduce Lab
- Class Test: Hadoop HDFS
Week 4:
- Cloudera Manager Installation
- Cloudera Hadoop Installation
- Project – Cloudera Hadoop Upgrade Process
Week 5:
- Hadoop Auto Provisioning
- Data Formats
- Ingesting Data in Hadoop using Sqoop and Flume
- Homework: Sqoop and Flume Lab
- Class Test: Hadoop MapReduce
Week 6:
- Data Analysis using Pig & Hive
- Class Test: Data Ingestion
- Homework: Hive Lab
Week 7:
- Python Basics
- Class Test: Hadoop Hive
- Homework: Python Lab
Week 8:
- Data Science – NumPy, Pandas, SciPy, Matplotlib & Seaborn
- Class Test: Python
- Homework: Data Science Lab
Week 9:
- Apache Spark
- Resilient Distributed Datasets (RDDs)
- Class Test: Data Science
- Homework: Apache Spark Lab
Week 10:
- Apache Spark
- Spark DataFrames
- Class Test: Apache Spark
- Homework: RDD & DataFrames Lab
Week 11:
- Apache Kafka
- Stream Ingestion/Analytics
- Class Test: Apache Spark
- Homework: Stream Analytics using Kafka Lab
Week 12:
- Apache Spark Streaming
- Apache Kafka
- Project – Streaming Ingestion/Analytics using Kafka
Week 13:
- Apache Airflow
- Class Test: Apache Kafka
- Homework: Apache Airflow
Week 14:
- Apache NiFi
- Class Test: Apache Airflow
- Homework: Apache NiFi
Week 15:
- Data Engineering & Data Science on the Cloud: AWS, Azure and Google Cloud
- Final Exam