Category Archives: Postgress

Pivotal HAWQ – MPP database on HDFS

In this post I will go through the architecture of Pivotal HAWQ and how it works. I strongly suggest reading Introduction to Massively Parallel Processing (MPP) database before this post, as you will need some concepts of MPP … Continue reading

Posted in Big Data, Hadoop, HDFS, MPP, Pivotal HAWQ, Postgress | 8 Comments

Introduction to Massively Parallel Processing (MPP) database

In Massively Parallel Processing (MPP) databases, data is partitioned across multiple servers or nodes, each with its own memory and processors to process data locally. All communication is via a network interconnect; there is no disk-level sharing or contention to be … Continue reading

Posted in Greenplum Database, MPP, Pivotal Database, Postgress | 30 Comments

Greenplum and Hadoop HDFS integration

One of the features of Greenplum version 4.2 is the ability to create external tables on the Hadoop HDFS file system. This is extremely useful when you want to avoid moving files from HDFS to a local folder for data loading. In … Continue reading

Posted in gphdfs, Greenplum Database, Hadoop, HDFS, Postgress | 43 Comments