
Machine Learning with Apache Spark


Duration: 2 Days


This course is aimed at developers who want to get started with Machine Learning on Spark. Machine Learning is used in a growing number of fields, and many of its applications must process large amounts of data quickly and efficiently. The course covers the features of the Spark MLlib library, basic Machine Learning concepts such as statistics, classification and regression models, and more advanced topics such as feature extraction and transformation. Spark MLlib gives developers flexibility through its simplicity, streamlined design, compatibility and scalability. It is used prominently in advertising optimization, fraud detection and supply chain management, and is one of the most powerful tools used by data scientists for Machine Learning. This course details the aspects of...


  • Learn the fundamentals of Machine Learning
  • Learn the fundamentals of Spark Machine Learning and the corresponding libraries
  • Learn the difference between Spark and TensorFlow
  • Develop Spark applications using Spark MLlib
  • Understand features of Spark MLlib
  • Optimize Spark Machine Learning applications

Who Should Attend

  • Developers who are working or are expected to work on Big Data and Analytics
  • Developers who are looking to get an insight into Machine Learning with Spark

Course Outline

Day 1

  1. Introduction to Spark Machine Learning
    • Spark ML Library
    • Dependencies of MLlib
    • Features of MLlib
    • Spark vs TensorFlow
    • Data Types
  2. Basic Statistical Operations
    • Statistical Summary
    • Correlations
    • Stratified Sampling
    • Hypothesis Testing
    • Random Data Generation

    Hands-on: Calculating Common Statistical Parameters
    Hands-on: Calculating Correlation between Features Using Pearson’s Method
    Hands-on: Calculating Correlation between Features Using Spearman’s Method
    Hands-on: Perform Stratification on a Dataset
    Hands-on: Perform Hypothesis Testing Using Pearson’s Chi-squared Tests
    Hands-on: Generate Random Data

  3. Classification and Regression
    • Linear Models
    • Decision Trees
    • Naive Bayes

    Hands-on: Implementing Naive Bayes Model
    Hands-on: Performing Linear Regression
    Hands-on: Implementing Decision Trees Model
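
The two correlation exercises in the Day 1 statistics module differ in what they measure: Pearson's method captures linear association, while Spearman's method applies the same formula to the ranks of the values and so captures any monotonic association. The following is a minimal sketch in plain Python, not Spark code and not the course's own lab material; the function names and sample data are illustrative assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical sample: ys is a cubic (monotonic but non-linear) function of xs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 3 for x in xs]

print("Pearson: ", pearson(xs, ys))
print("Spearman:", spearman(xs, ys))
```

On monotonic but non-linear data such as this, the Spearman coefficient reaches 1 while the Pearson coefficient stays below it, which is the practical reason for trying both methods on the same features.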

Day 2

  1. Collaborative Filtering
    • Alternating Least Squares

    Hands-on: Working with ALS.train() Method to Create Recommendation Model

  2. Clustering
    • K-means

    Hands-on: Using K-means Algorithm to Create Clusters for Unlabelled Dataset

  3. Dimensionality Reduction
    • Singular Value Decomposition
    • Principal Component Analysis

    Hands-on: Performing Dimensionality Reduction on Dataset to Extract Useful Features

  4. Feature Extraction and Transformation
    • TF-IDF
    • Word2vec
    • Standard Scaler
    • Normalizer

    Hands-on: Creating a Vector Representation of Words Using Word2vec

  5. Optimization
    • Stochastic Gradient Descent
    • Limited Memory BFGS

Project: Implementing Machine Learning with Spark
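
As an illustration of the Day 2 clustering topic, K-means alternates between two steps: assign each point to its nearest center, then move each center to the mean of the points assigned to it, until the centers stop changing. The sketch below is plain Python rather than Spark MLlib code, and it is an assumption-laden teaching example, not the course's lab solution; the function names, sample points and fixed starting centers are all hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [
        min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))
        for p in points
    ]


def update(points, labels, centers):
    """Move each center to the mean of its members; keep empty clusters in place."""
    new_centers = []
    for c, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            new_centers.append(old)
            continue
        new_centers.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centers


def kmeans(points, initial_centers, max_iter=20):
    """Run the assign/update loop until the centers stop moving or max_iter is hit."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = update(points, labels, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


# Hypothetical unlabelled dataset with two visually separate groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, labels = kmeans(points, initial_centers=[points[0], points[3]])

print("centers:", centers)
print("labels: ", labels)
```

Starting centers are passed in explicitly here to keep the run deterministic; in practice the choice of initial centers affects the result, which is why library implementations provide their own initialization strategies.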

About The Trainer


Arzan Amaria
Sr. Solutions Architect – Cloud and IoT

Arzan has more than 9 years of experience in the Microsoft infrastructure technology stack, Data Science, Cloud and IoT. He has extensive experience in deploying Cloud-based solutions. He is a multi-cloud professional with exposure to Azure, AWS and other IIoT Cloud platforms like GE Predix and IBM Watson.

As a Cloud Solution Architect at CloudThat, he is an expert at deploying, supporting and managing client infrastructures on Azure. With a core background in training and consulting, he specializes in delivering individual and corporate training on Azure. He is also engaged in extensive research and development in the field of IoT and Data Science and leads a team in this area. He has delivered training on IoT and is currently designing Cloud-integrated solutions.

He has been training professionals for various Microsoft partners such as Wipro, HPE, HCL, Infosys, Accenture, TCS and many more in the recent past.

He holds the following certifications:

  • GE Predix Certified Developer
  • Microsoft Certified Trainer (MCT)
  • CTT+ (Certified Technical Trainer)
  • MCSD: Azure Solutions Architect
  • MCSE (Server Track)
  • MCTS in Machine Learning
  • VCA-DCV (Data Center Virtualization – Associate)
  • Microsoft Certified Specialist with Hyper-V Virtualization
  • AWS Certified Solutions Architect – Associate Level
  • CEH (Certified Ethical Hacker, EC Council University US)


Other Details


For the latest batch dates, fees, location and general inquiries, contact our sales team at +91 8880002200.
