In a nutshell, MPP (massively parallel processing) is a distributed processing architecture used in the data warehouse world. It means that a cluster of servers works together as a single machine and distributes the workload across data nodes. This speeds up query processing and gives you the result faster than a normal data warehouse environment.
Typically, when you run a query against your data set in an MPP design, the MPP engine splits it in the background into multiple sub-queries and sends them to different nodes of the cluster. Each node processes its sub-query on its own and returns a small result set to the MPP engine, which combines the partial results and returns the final result to the user.
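To make the split-and-combine idea concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular MPP engine is implemented: the data, the names `NODE_SLICES`, `run_subquery` and `run_query`, and the use of threads to stand in for separate servers are all assumptions made for this example. The query being simulated is a total salary per department.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the slice of an "employees" table
# held by one data node, stored as (department, salary) rows.
NODE_SLICES = [
    [("sales", 100), ("eng", 200)],
    [("sales", 150), ("ops", 80)],
    [("eng", 300), ("ops", 120)],
]


def run_subquery(rows):
    """Work done on one node: sum salary per department over local rows only."""
    partial = Counter()
    for department, salary in rows:
        partial[department] += salary
    return partial


def run_query(node_slices):
    """Coordinator: run the sub-query on every node, then merge the partials."""
    # Threads stand in for separate servers in this sketch.
    with ThreadPoolExecutor(max_workers=len(node_slices)) as pool:
        partials = list(pool.map(run_subquery, node_slices))

    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds the values per key
    return dict(total)


result = run_query(NODE_SLICES)
print(result)
```

The caller only ever calls `run_query` and gets one result back. It never sees the individual nodes or their partial results.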
From the client's point of view (the client could be an application or a user), an MPP data warehouse presents itself as a single DBMS that accepts a single query and returns a single result set. In the background there might be many nodes running and distributing the workload between them, but the client sees them as one single RDBMS and runs queries only against that.
An MPP cluster most often comes as a commodity appliance. What I mean by that is that it ships as a box of servers with the software and hardware included, and all you need to do is provide power and network connectivity. This does not mean all MPP products are delivered this way, but it is true for most of them.
Individual MPP nodes are often based on the PostgreSQL codebase and API. Many MPP databases on the market today use the PostgreSQL engine because it is open source. Some vendors keep the PostgreSQL engine largely as it is and others change it heavily, but most of them stay close to PostgreSQL.
Many MPP databases use a column store. A column store keeps all the values of one column together: if you have a salary column, all the salary values are stored together. A non-MPP database, by contrast, typically stores your data row by row. Each employee gets one row that holds the whole record, salary included, and similar data is not grouped together.
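The difference between the two layouts can be sketched in a few lines of Python. This is a simplified in-memory picture, not the on-disk format of any real database, and the employee records and the helper `to_columns` are made up for the example.

```python
# Hypothetical employee records in a row layout: one record per employee.
rows = [
    {"id": 1, "name": "Ana", "salary": 5000},
    {"id": 2, "name": "Ben", "salary": 6000},
    {"id": 3, "name": "Cy", "salary": 7000},
]


def to_columns(records):
    """Regroup row records so that all values of one column sit together."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every whole record is visited just to read one field.
avg_from_rows = sum(record["salary"] for record in rows) / len(rows)

# Column layout: only the salary values are read.
avg_from_columns = sum(columns["salary"]) / len(columns["salary"])

print(columns["salary"])
print(avg_from_rows, avg_from_columns)
```

Both layouts give the same answer. The point of the column layout is that a query touching only salary does not have to read names and ids as well.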
Some of the key points of MPP databases are as below:
- It's a cluster of multiple data nodes that acts as one database server
- In the background it splits your query into multiple sub-queries
- The client always sees it as one database
- Most of the time it comes as a pre-installed appliance cabinet
- Its engine is often based on open source PostgreSQL
- Many of them use column stores
These are the main points about MPP database engines. There are others as well, but these are the ones that differentiate MPP from a normal database.