MapReduce: Internals

MapReduce is a programming paradigm that provides an interface for developers to map end-user requirements (any type of analysis on data) to code.  This framework is one of the core components of Hadoop.  Because it provides fault tolerance and massive scalability across hundreds or thousands of servers in a cluster for processing multiple terabytes of data, it easily became the heart of the Hadoop cluster architecture in both versions.

The history goes back to Google, which, as per Doug, is living a few years in the future and sends the rest of us messages.  Google has long been a pioneer in taking up the challenges and opportunities of big data, simply because of the nature of its business.  It gave this technology two booster doses when it released two separate papers, on GFS and MapReduce, during 2003-2004.  Doug Cutting and Mike Cafarella liked these two concepts and implemented both in their own way to create Hadoop and make it a success.

The term MapReduce actually refers to two distinct tasks that are separate from each other but work in conjunction to achieve a result.  The first task is Map, which takes one set of data and converts it into another in which every element is broken down into key/value pairs, known as tuples.  The second task is Reduce, which takes the output of Map as its input and combines those key/value pairs into another set of key/value pairs to achieve the desired result.
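To make the two tasks concrete, here is a minimal word-count sketch in plain Python.  It is not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and the grouping step that a real framework performs between Map and Reduce is simulated with an ordinary dictionary.

```python
from collections import defaultdict


def map_phase(line):
    """Map: break one input record into (key, value) tuples."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key (done by the framework between Map and Reduce)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single (key, value) pair."""
    return (key, sum(values))


def word_count(lines):
    # Run Map on every line, group the tuples by key, then Reduce each group.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Here Map emits a tuple such as ("big", 1) for every word, and Reduce adds up the values that share a key.  In a real cluster the Map and Reduce calls would run on different machines; everything runs in one process here only to keep the sketch short.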

 

Unwinding the whole MapReduce process requires a detailed explanation, which will come in a while.  Let’s first understand a few of its components.  MapReduce does its work mainly with the help of three components.  These three components are:

  1. JobTracker: It resides on the master node and manages all jobs and resources in the cluster.
  2. TaskTracker: It is deployed to each machine in the cluster to perform Map and Reduce tasks.
  3. JobHistoryServer: It tracks completed jobs and is deployed as a separate function alongside the JobTracker (JT).
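The division of responsibilities between these three components can be sketched as a toy model in plain Python.  Everything here is hypothetical: the classes only borrow the component names, none of the methods are Hadoop APIs, and the master calls the workers directly, whereas real TaskTrackers report to the JobTracker through periodic heartbeats.

```python
class TaskTracker:
    """Hypothetical stand-in for a worker node that runs assigned tasks."""

    def __init__(self, name):
        self.name = name

    def run(self, task, chunk):
        # A "task" in this toy model is just a function applied to a chunk.
        return task(chunk)


class JobHistory:
    """Hypothetical stand-in for the JobHistoryServer: records finished jobs."""

    def __init__(self):
        self.completed = []

    def record(self, job_name, result):
        self.completed.append((job_name, result))


class JobTracker:
    """Hypothetical stand-in for the master that manages jobs and workers."""

    def __init__(self, trackers, history):
        self.trackers = trackers
        self.history = history

    def submit(self, job_name, task, chunks, combine):
        # Hand out chunks to the trackers round-robin and collect the results.
        partials = [
            self.trackers[i % len(self.trackers)].run(task, chunk)
            for i, chunk in enumerate(chunks)
        ]
        result = combine(partials)
        self.history.record(job_name, result)
        return result


history = JobHistory()
jt = JobTracker([TaskTracker("node-1"), TaskTracker("node-2")], history)
total = jt.submit("sum-job", sum, [[1, 2], [3, 4], [5]], sum)
print(total, history.completed)
```

The only point of the sketch is who does what: the JobTracker decides where work goes, the TaskTrackers do the work, and the history component remembers what has finished.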

The picture shows nine tasks, numbered 1 to 9, that these components have to perform.

 

A detailed walkthrough of how MapReduce works is coming next...

 
