WiredTiger: A game changer for MongoDB

The storage engine is one of the key components of any database.  It is, in fact, a software module used by the database management system to perform all storage-related operations, e.g. creating, reading and updating information.  The term storage covers both disk storage and memory storage.  Choosing the right storage engine is Read more about WiredTiger: A game changer for MongoDB[…]
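As a quick, hedged illustration (assuming a local mongod instance and the pymongo driver are available; neither is mentioned in the post itself), you can check which storage engine a MongoDB server is running with a couple of lines of Python:

```python
# Sketch: query a running mongod for its active storage engine.
# Assumes pymongo is installed and MongoDB is listening on the default localhost:27017.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
status = client.admin.command("serverStatus")   # standard diagnostic command
print(status["storageEngine"]["name"])          # e.g. "wiredTiger"
```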

Data Pump: impdp

Problem Statement: Restore an entire database using Data Pump; restore table(s), tablespace(s) or schema(s); restore using Transportable Tablespaces (TTS); restore from multiple small dump files; restore in parallel mode. Approach: There is a single solution to all of the above problem statements, and it is IMPDP in Data Pump.  It is one of various Read more about Data Pump: impdp[…]

Data Pump: expdp & impdp

Problem Statement: Back up an entire database using Data Pump; back up table(s), tablespace(s) or schema(s); back up using Transportable Tablespaces (TTS); generate multiple small dump files; back up in parallel mode. Approach: There is a single solution to all of the above problem statements, and it is Data Pump.  It is one of various backup tools provided Read more about Data Pump: expdp & impdp[…]

MongoDB Installation – Ubuntu

MongoDB is a document-oriented open source database developed in C++. It first took shape in 2007 when, in order to overcome the shortfalls of existing databases while working for the advertising company “DoubleClick”, the development team decided to go its own way rather than keep struggling with the database.  The team of this advertising company was Read more about MongoDB Installation – Ubuntu[…]

How to copy Multi terabyte data to another Database

Problem statement: How to migrate a huge volume of data from one database to another. Multi-terabyte data loaded on one database should be copied to another database. Environment: You have a multi-terabyte database; your database is growing daily, driven by data feeds; the number of indexes on these tables is very high, and thus the size of indexes Read more about How to copy Multi terabyte data to another Database[…]

MapReduce Unwinding … Reduce

MapReduce Unwinding … Sort & Shuffle

MapReduce Unwinding … Map

MapReduce Unwinding … Algorithm

Having discussed, in my last blog, “How Hadoop manages Fault Tolerance” within its cluster while processing data, it is now time to discuss the algorithm MapReduce uses to process that data. It is the Name Node (NN) where a user submits the request to process data along with the data files.  As soon as the NN receives the data Read more about MapReduce Unwinding … Algorithm[…]
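To make the flow concrete before reading the full post, here is a minimal, purely illustrative Python sketch of the MapReduce phases (split → map → shuffle/sort → reduce) using the classic word-count example; the function names are mine, not Hadoop's:

```python
# Illustrative word-count walk-through of the MapReduce phases (not Hadoop code).
from collections import defaultdict

def map_phase(split):
    # Emit (key, value) pairs: one (word, 1) per word in this split.
    return [(word, 1) for word in split.split()]

def shuffle_phase(mapped_pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Aggregate the grouped values for one key.
    return key, sum(values)

splits = ["the quick brown fox", "the lazy dog", "the fox"]
mapped = [pair for split in splits for pair in map_phase(split)]
grouped = shuffle_phase(mapped)
print(dict(reduce_phase(k, v) for k, v in grouped.items()))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```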

MapReduce Unwinding … Fault Tolerance

Before we look at the intermediate data produced by the mapper, it is worth examining the fault-tolerance aspects of Hadoop with respect to MapReduce processing. Once the Name Node (NN) receives the data files to be processed, it splits them and assigns the pieces to Data Nodes (DN).  This assignment would be Read more about MapReduce Unwinding … Fault Tolerance[…]
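As a rough sketch of that idea (my own toy simulation, not HDFS source code), splitting a file into blocks and placing each block on several Data Nodes is what lets the cluster survive the loss of any single node:

```python
# Toy simulation of block placement with replication (HDFS's default replication factor is 3).
BLOCK_SIZE_MB = 64
REPLICATION = 3
data_nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]

def place_blocks(file_size_mb):
    num_blocks = -(-file_size_mb // BLOCK_SIZE_MB)   # ceiling division
    placement = {}
    for block_id in range(num_blocks):
        # Round-robin placement purely for illustration; real HDFS placement is rack-aware.
        placement[block_id] = [data_nodes[(block_id + r) % len(data_nodes)]
                               for r in range(REPLICATION)]
    return placement

print(place_blocks(200))   # a 200 MB file -> 4 blocks, each stored on 3 different Data Nodes
```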

MapReduce Unwinding … Philosophy

The philosophy behind how MapReduce works is straightforward and can be summarized in 6 steps. Whatever data we provide as input to Hadoop is first split into a number of smaller pieces. Typically, the size of each split is limited to 64 MB.  If a 1 TB file arrives to be processed on a data node, Read more about MapReduce Unwinding … Philosophy[…]
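Using the numbers mentioned above (a 64 MB split size and a 1 TB input file), a quick back-of-the-envelope calculation shows how many splits, and hence map tasks, such a file would translate into:

```python
# Back-of-the-envelope: how many 64 MB splits does a 1 TB file produce?
split_size_mb = 64
file_size_mb = 1 * 1024 * 1024        # 1 TB expressed in MB
num_splits = file_size_mb // split_size_mb
print(num_splits)                     # 16384 splits, i.e. up to 16384 map tasks
```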

MapReduce: Internals

MapReduce is a programming paradigm which provides an interface for developers to map end-user requirements (any type of analysis on data) to code.  This framework is one of the core components of Hadoop.  The way it provides fault tolerance and massive scalability across hundreds or thousands of servers in a cluster for processing of Read more about MapReduce: Internals[…]
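One concrete way developers plug their own analysis into this framework is Hadoop Streaming, where the mapper and reducer are ordinary scripts reading stdin and writing stdout. A minimal word-count pair in that style might look like the sketch below (an illustration of the interface, not code from the post; normally the two stages live in separate scripts):

```python
# Hadoop Streaming-style mapper and reducer for word count, shown side by side for brevity.
import sys
from itertools import groupby

def mapper(lines):
    # Emit "word<TAB>1" for every word; the framework sorts these lines by key.
    for line in lines:
        for word in line.split():
            print(f"{word}\t1")

def reducer(sorted_lines):
    # Input arrives grouped by key after the shuffle/sort; sum the counts per word.
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    stage = sys.argv[1] if len(sys.argv) > 1 else "map"
    (mapper if stage == "map" else reducer)(sys.stdin)
```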

HDFS Architecture Explained

Inspired by the Google File System, developed in C++ by Google around 2003 to enhance its search engine, the Hadoop Distributed File System (HDFS), a Java-based file system, has become a core component of Hadoop. With its fault-tolerant and self-healing features, HDFS enables Hadoop to harness the true capability of distributed processing techniques by turning Read more about HDFS Architecture Explained[…]

MAGIC OF HADOOP

Because of the limitations of currently available enterprise data warehousing tools, organizations have not been able to consolidate their data in one place while maintaining fast data processing.  Traditional ETL tools may take hours, days and sometimes even weeks.  The performance of these tools is limited by two hardware constraints. Vertical hardware scalability: hardware can be scaled Read more about MAGIC OF HADOOP[…]