Hadoop Developer Training Course


Description                  Days   Price (ex VAT)
Hadoop Developer Training    3      R 18,500 (ZAR)

  • Lunch, refreshments and training material included.
  • Class starts at 9:30am (arrival from 9:00am)
  • South Africa training locations: Johannesburg, Cape Town, Durban
  • Global training locations: USA, Canada, UK, Dubai, Europe

Introduction to Apache Hadoop and the Hadoop Ecosystem
• Apache Hadoop Overview
• Data Ingestion and Storage
• Data Processing
• Data Analysis and Exploration
• Other Ecosystem Tools
• Introduction to the Hands-On Exercises

Apache Hadoop File Storage
• Apache Hadoop Cluster Components
• HDFS Architecture
• Using HDFS

Distributed Processing on an Apache Hadoop Cluster
• YARN Architecture
• Working With YARN

Apache Spark Basics
• What is Apache Spark?
• Starting the Spark Shell
• Using the Spark Shell
• Getting Started with Datasets and DataFrames
• DataFrame Operations

Working with DataFrames and Schemas
• Creating DataFrames from Data Sources
• Saving DataFrames to Data Sources
• DataFrame Schemas
• Eager and Lazy Execution

Analyzing Data with DataFrame Queries
• Querying DataFrames Using Column Expressions
• Grouping and Aggregation Queries
• Joining DataFrames

RDD Overview
• RDD Overview
• RDD Data Sources
• Creating and Saving RDDs
• RDD Operations

Transforming Data with RDDs
• Writing and Passing Transformation Functions
• Transformation Execution
• Converting Between RDDs and DataFrames

Aggregating Data with Pair RDDs
• Key-Value Pair RDDs
• Map-Reduce
• Other Pair RDD Operations

Querying Tables and Views with Apache Spark SQL
• Querying Tables in Spark Using SQL
• Querying Files and Views
• The Catalog API
• Comparing Spark SQL, Apache Impala and Apache Hive-on-Spark

Working with Datasets in Scala
• Datasets and DataFrames
• Creating Datasets
• Loading and Saving Datasets
• Dataset Operations

Writing, Configuring, and Running Apache Spark Applications
• Writing a Spark Application
• Building and Running an Application
• Application Deployment Mode
• The Spark Application Web UI
• Configuring Application Properties

Distributed Processing
• Review: Apache Spark on a Cluster
• RDD Partitions
• Example: Partitioning in Queries
• Stages and Tasks
• Job Execution Planning
• Example: Catalyst Execution Plan
• Example: RDD Execution Plan

Distributed Data Persistence
• DataFrame and Dataset Persistence
• Persistence Storage Levels
• Viewing Persisted RDDs

Common Patterns in Apache Spark Data Processing
• Common Apache Spark Use Cases
• Iterative Algorithms in Apache Spark
• Machine Learning
• Example: k-means

Apache Spark Streaming: Introduction to DStreams
• Apache Spark Streaming Overview
• Example: Streaming Request Count
• DStreams
• Developing Streaming Applications

Apache Spark Streaming: Processing Multiple Batches
• Multi-Batch Operations
• Time Slicing
• State Operations
• Sliding Window Operations
• Preview: Structured Streaming

Apache Spark Streaming: Data Sources
• Streaming Data Source Overview
• Apache Flume and Apache Kafka Data Sources
• Example: Using a Kafka Direct Data Source