
Big Data and Hadoop for Developers – Level 1

Processing Big Data with Apache Hadoop

Duration: 2 Days


Gartner predicts that 4.4 million jobs will be created globally to support Big Data. Big Data is a popular term describing the exponential growth, availability, and use of information, both structured and unstructured. It is imperative that organizations and IT leaders focus on the ever-increasing volume, variety, and velocity of information that makes up Big Data. Hadoop is the core platform for structuring Big Data, and it solves the problem of making that data useful for analytics. Our course will teach you all you need to know about using Hadoop for Big Data analysis and give you a clear understanding of processing Big Data with Hadoop.

Why learn about processing Big Data with Hadoop?

  • Businesses are now aware of the large volumes of data they generate in their day-to-day transactions. They have also realized that this Big Data can provide very valuable insights once analyzed.
  • The massive volume of Big Data and its unstructured format make it difficult to analyze.


What you will learn

  • What Hadoop is and how it can help process large data sets.
  • How to write MapReduce programs using the Hadoop API.
  • How to use HDFS (the Hadoop Distributed File System), from the command line and from the API, to load and process data in Hadoop effectively.
  • How to ingest data from an RDBMS or a data warehouse into Hadoop.
  • Best practices for building, debugging, and optimizing Hadoop solutions.
  • An introduction to tools like Pig, Hive, HBase, and Elastic MapReduce, and how they can help in Big Data projects.
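To make the MapReduce programming model concrete before the course begins, here is a minimal sketch of its three phases in plain Python. The function names are illustrative only and are not part of the Hadoop API; real MapReduce jobs run these phases in parallel across a cluster.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records, mapper):
    # Apply the mapper to every input record, collecting (key, value) pairs.
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def shuffle_phase(pairs):
    # Group values by key, as the framework does between map and reduce.
    pairs.sort(key=itemgetter(0))
    return [(k, [v for _, v in grp]) for k, grp in groupby(pairs, key=itemgetter(0))]

def reduce_phase(grouped, reducer):
    # Apply the reducer once per key.
    return [reducer(k, vs) for k, vs in grouped]

# Word count: the canonical MapReduce example.
def wc_mapper(line):
    return [(word, 1) for word in line.split()]

def wc_reducer(word, counts):
    return (word, sum(counts))

lines = ["big data big ideas", "hadoop processes big data"]
result = reduce_phase(shuffle_phase(map_phase(lines, wc_mapper)), wc_reducer)
# result contains ("big", 3), ("data", 2), ("ideas", 1), ...
```

The same map/shuffle/reduce structure, with a custom Partitioner and Combiner layered in, is what the hands-on MapReduce exercises build on.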

Who should attend

  • A developer who wants to learn Hadoop but doesn't know where to start
  • A team that is struggling to extract insights from large-scale, fast-growing data in traditional systems
  • A team that has decided to migrate from an RDBMS or a traditional data warehouse to Hadoop, but needs help getting started


Course Outline

  • Introduction
    • Big Data
    • Data Science
    • Hadoop
      Hands-on: Install and configure a multi-node Hadoop cluster with Ambari
  • Data Storage
    • File System Abstraction
    • Big Data and Distributed File Systems
    • Hadoop Distributed File System (HDFS)
      Hands-on: Manipulating files in HDFS using hadoop fs commands.
      Hands-on: Manipulating files in HDFS programmatically using the FileSystem API.
    • Alternative Hadoop File Systems: IBM GPFS, MapR-FS, Lustre, Amazon S3 etc.
  • Data Processing
    • MapReduce
      Hands-on: Write a simple log analysis MapReduce application
      Hands-on: Write an Inverted Index MapReduce Application with custom Partitioner and Combiner
      Hands-on: Writing a streaming MapReduce job in Python
  • YARN and Hadoop 2.0
  • Data Integration
    • Integrating Hadoop with your existing enterprise systems.
    • Introduction to Sqoop
      Hands-on: Importing data from an RDBMS to HDFS using Sqoop
      Hands-on: Exporting data from HDFS to an RDBMS
    • Other data integration tools: Flume, Kafka, Informatica, Talend etc.
  • Higher Level Tools
    • Defining workflows with Oozie
    • An introduction to Hive
    • An introduction to Pig
    • An introduction to HBase

About The Trainer

Bhavesh Goswami
Co-Founder & CEO


Bhavesh Goswami, the Co-Founder & CEO of CloudThat Technologies, is a leading expert in the Cloud Computing space with over a decade of experience. He was on the initial development team of Amazon Simple Storage Service (S3) at Amazon Web Services (AWS) in Seattle, where he honed his Cloud Computing skills and helped ship the first version of S3 in 2006. After more than three years at Amazon, he moved to Microsoft to help manage Cosmos, the Cloud storage and Big Data computational engine that powers all of Microsoft's Online Services, including Bing.

After living in the USA for over 10 years, he came to India in search of a new challenge and co-founded CloudThat Technologies. Since early 2012, he has personally trained thousands of people on various Cloud technologies, including AWS, Microsoft Azure, Google App Engine, and more.

Bhavesh has spoken at various Cloud and Big Data conferences and events, such as the ‘7th Cloud Computing & Big Data’ conference, and has been the keynote speaker at the ‘International Conference on Computer Communication and Informatics’. He has authored numerous research papers and patents in various fields. He is passionate about technology and keeps the company up to date on the latest Cloud technologies and market trends.

He holds the following certifications:

  • AWS Certified Solutions Architect – Professional Level
  • AWS Certified DevOps Engineer – Professional Level
  • MCT (Microsoft Certified Trainer)
  • MCSD (Microsoft Certified Solutions Developer)


Other Details



For latest batch dates, fees, location, technical queries and general inquiries, contact our sales team at: +91 8880002200 or email at
