Big Data Hadoop, Spark & Kafka

Duration: 15 Weeks

Audience: Business Analysts, IT Architects, Technical Managers and Developers

Suggested Prerequisites: Linux Basics, SQL

Course Outline:

Week 1:

  • Introduction
  • Data Security
  • Cloud Infrastructure
  • Big Data
  • Project – Data Ingestion and Analysis using MySQL

Week 2:

  • Big Data
  • Hadoop HDFS
  • Homework: HDFS Lab

Week 3:

  • Hadoop MapReduce Framework
  • Homework: MapReduce Lab
  • Class Test: Hadoop HDFS

Week 4:

  • Cloudera Manager Installation
  • Cloudera Hadoop Installation
  • Project – Cloudera Hadoop Upgrade Process

Week 5:

  • Hadoop Auto Provisioning
  • Data Formats
  • Ingesting Data in Hadoop using Sqoop and Flume
  • Homework: Sqoop and Flume Lab
  • Class Test: Hadoop MapReduce

Week 6:

  • Data Analysis using Pig & Hive
  • Class Test: Data Ingestion
  • Homework: Hive Lab

Week 7:

  • Python Basics
  • Class Test: Hadoop Hive
  • Homework: Python Lab

Week 8:

  • Data Science – NumPy, Pandas, SciPy, Matplotlib & Seaborn
  • Class Test: Python
  • Homework: Data Science Lab

Week 9:

  • Apache Spark
  • Resilient Distributed Datasets (RDDs)
  • Class Test: Data Science
  • Homework: Apache Spark Lab

Week 10:

  • Apache Spark
  • Spark DataFrames
  • Class Test: Apache Spark
  • Homework: RDD & DataFrames Lab

Week 11:

  • Apache Kafka
  • Stream Ingestion/Analytics
  • Class Test: Apache Spark
  • Homework: Stream Analytics using Kafka Lab

Week 12:

  • Apache Spark Streaming
  • Apache Kafka
  • Project – Streaming Ingestion/Analytics using Kafka

Week 13:

  • Apache Airflow
  • Class Test: Apache Kafka
  • Homework: Apache Airflow

Week 14:

  • Apache NiFi
  • Class Test: Apache Airflow
  • Homework: Apache NiFi

Week 15:

  • Data Engineering & Data Science on the Cloud: AWS, Azure and Google Cloud
  • Final Exam