Query Greenplum Database from Pivotal HAWQ

I was getting a lot of requests about this, so I decided to write a separate post. The question was: how do I query a Greenplum Database (GPDB) from Pivotal HAWQ?

Posted in Greenplum Database, MPP, Pivotal Database, Pivotal HAWQ, Web External table

Pivotal HAWQ – MPP database on HDFS

In this post I will go through the architecture of Pivotal HAWQ and how it works.

I strongly suggest reading Introduction to Massively Parallel Processing (MPP) database before this post, as you will need some MPP database concepts.

Pivotal HAWQ is a Massively Parallel Processing (MPP) database that uses several Postgres database instances and HDFS storage. Think of a regular MPP database such as Teradata, Greenplum, or Netezza, but with the data files stored in HDFS instead of on local storage. Each processing node still has its own CPU, memory, and storage.

Posted in Big Data, Hadoop, HDFS, MPP, Pivotal HAWQ, Postgress | 7 Comments

Introduction to Massively Parallel Processing (MPP) database

In Massively Parallel Processing (MPP) databases, data is partitioned across multiple servers or nodes, with each server/node having its own memory and processors to process data locally. All communication is via a network interconnect; there is no disk-level sharing or contention to be concerned with (i.e. it is a ‘shared-nothing’ architecture).
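
To make the shared-nothing idea concrete, here is a minimal Python sketch. It is not how Greenplum is implemented; the segment count, the CRC32-based hash, and all function names (`segment_for`, `distribute`, `local_sum`, `combine`) are assumptions made up for illustration. Rows are assigned to segments by hashing a distribution key, each segment aggregates only its own rows, and a coordinator merges the partial results.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical number of segments (nodes)


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Pick a segment by hashing the distribution key (CRC32 is an arbitrary choice)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments=NUM_SEGMENTS):
    """Partition rows so that each segment holds only its own slice of the data."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row[key_field], num_segments)].append(row)
    return segments


def local_sum(segment_rows, group_field, value_field):
    """Work done on one segment: aggregate only the rows stored locally."""
    totals = defaultdict(int)
    for row in segment_rows:
        totals[row[group_field]] += row[value_field]
    return dict(totals)


def combine(partials):
    """Work done by the coordinator: merge the per-segment partial results."""
    result = defaultdict(int)
    for partial in partials:
        for group, value in partial.items():
            result[group] += value
    return dict(result)


rows = [
    {"customer": "a", "amount": 10},
    {"customer": "b", "amount": 7},
    {"customer": "a", "amount": 5},
    {"customer": "c", "amount": 3},
]

segments = distribute(rows, "customer")
partials = [local_sum(seg, "customer", "amount") for seg in segments]
totals = combine(partials)
print(totals)
```

Because the rows are distributed on the same column they are grouped by, all rows for a given customer land on one segment, so each segment can finish its part of the aggregation without reading another segment's data.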

I will try to explain how an MPP database works, using Greenplum Database as an example.

Posted in Greenplum Database, MPP, Pivotal Database, Postgress | 22 Comments

Greenplum and Hadoop HDFS integration

One of the features of Greenplum 4.2 is the ability to create external tables on the Hadoop HDFS file system.
This is extremely useful when you want to avoid moving files from HDFS to a local folder for data loading.

In this post I will go through configuring a single-node (CentOS) Greenplum database to create and access external tables using HDFS.

Posted in gphdfs, Greenplum Database, Hadoop, HDFS, Postgress | 36 Comments

Installing Greenplum Database: Community Edition on Mac OS X 10.7

Some of the key features of Greenplum Database are:

  •     Massively Parallel Processing (MPP) Architecture for Loading and Query Processing
  •     Polymorphic Data Storage-MultiStorage/SSD Support
  •     Multi-level Partitioning with Dynamic Partitioning Elimination

If you want to test this database on your Mac, you can get a community edition that works
on a single node.
Here are some installation steps that worked for me. The installation gives you an idea
of all the components of an MPP system.

You can download the software here


Posted in Greenplum Database, Mac, MPP | 7 Comments

Parameterized views in Oracle

Last week I came across an interesting problem.

Problem: I want to centralize my average assets calculation in one place so that different downstream systems can consume it. For example, Cognos reports should be able to use it, and Informatica mappings should be able to use it as a source. This is very similar to an enterprise service.

Posted in Oracle Database, PL/SQL

Exadata: 5 Adoption Roadblocks – Presentation at Oracle Openworld 2011

Here is a presentation that Robert Dawson gave at OOW 2011. It is interesting how he correlated Exadata adoption with the grief cycle. These are roadblocks that nobody wants to talk about, but every organization implementing Exadata will face them.

Posted in Exadata | 2 Comments