MapReduce Unwinding … Reduce

MapReduce Unwinding … Sort & Shuffle

MapReduce Unwinding … Map

MapReduce Unwinding … Algorithm

Having discussed "How Hadoop manages Fault Tolerance" within its cluster while processing data in my last post, it is now time to discuss the algorithm that MapReduce uses to process this data. The user submits a processing request, along with the data files, to the Name Node (NN). As soon as the NN receives data …

MapReduce Unwinding … Fault Tolerance

Before we look at the intermediate data produced by the mapper, it is worth examining the fault-tolerance aspects of Hadoop with respect to MapReduce processing. Once the Name Node (NN) receives the data files to be processed, it splits them and assigns the pieces to Data Nodes (DN). This assignment would be …

MapReduce Unwinding … Philosophy

The philosophy behind MapReduce is straightforward and can be summarized in six steps. Whatever data we provide as input to Hadoop is first split into smaller pieces. Typically, the size of each split is limited to 64 MB. If a 1 TB file arrives for processing on a data node, …
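As a rough illustration of the splitting step, the arithmetic for a 1 TB file can be sketched in Python. The 64 MB split size comes from the excerpt above; the function name `count_splits` and the use of binary units (1 TB = 1024^4 bytes) are assumptions made for this example, not part of Hadoop.

```python
# Hypothetical helper for illustration only; not a Hadoop API.
MB = 1024 ** 2  # assumed binary megabyte
TB = 1024 ** 4  # assumed binary terabyte


def count_splits(file_size_bytes: int, split_size_bytes: int = 64 * MB) -> int:
    """Return how many fixed-size splits are needed to cover a file."""
    if file_size_bytes < 0 or split_size_bytes <= 0:
        raise ValueError("sizes must be non-negative and the split size positive")
    # Integer ceiling division: a partially filled last split still counts.
    return (file_size_bytes + split_size_bytes - 1) // split_size_bytes


if __name__ == "__main__":
    print(count_splits(1 * TB))
```

Under these assumptions a 1 TB file yields 16,384 splits of 64 MB each, which is the number of pieces the cluster would have to distribute.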

MapReduce: Internals

MapReduce is a programming paradigm that provides an interface for developers to map end-user requirements (any type of analysis on data) to code. This framework is one of the core components of Hadoop. The way it provides fault tolerance and massive scalability across hundreds or thousands of servers in a cluster for processing of …
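To make the paradigm concrete, here is a minimal single-process sketch of the classic word-count job, written in plain Python rather than against the Hadoop API. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only to mirror the Map, Sort & Shuffle, and Reduce stages covered in the posts listed here; a real Hadoop job would run these stages across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Sort & shuffle: group all values by key, with keys in sorted order."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the list of counts for one word into a total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three stages in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog", "The fox"]))
```

The developer writes only the map and reduce logic; in this sketch the grouping step stands in for the work the framework does between the two.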

HDFS Architecture Explained

Inspired by the Google File System, which Google developed in C++ during 2003 to enhance its search engine, the Hadoop Distributed File System (HDFS), a Java-based file system, is one of the core components of Hadoop. With its fault-tolerant and self-healing features, HDFS enables Hadoop to harness the true capability of distributed processing techniques by turning …