Hive is a component that sits on top of Hadoop and gives us the ability to run SQL queries against it. It ships with virtually every Hadoop distribution, works with MapReduce, and translates our queries into MapReduce jobs. Not everyone is able to write their own MapReduce code to talk to Hadoop, which is why we have Hive. Even experienced programmers who can write MapReduce code will often prefer to use Hive.

Hive started as a sub-project of Apache Hadoop. As time went by and people understood its importance, it moved up to become a top-level Apache project.


Hive is an abstraction over MapReduce. It comes with its own query language called HiveQL. I have already explained how it works, but to go into a bit more detail: when we write a normal SQL-style query, Hive takes it and compiles it into one or more MapReduce jobs that get the work done, so we never have to write the MapReduce Java code ourselves.
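To make that translation more concrete, here is a minimal sketch in plain Python of the map, shuffle, and reduce steps that a query like SELECT dept, COUNT(*) FROM employees GROUP BY dept boils down to. This is not Hive's actual generated code; the employees table, its sample rows, and the function names are all hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical sample rows of an "employees" table: (name, dept)
EMPLOYEES = [
    ("alice", "sales"),
    ("bob", "engineering"),
    ("carol", "sales"),
    ("dave", "engineering"),
    ("erin", "support"),
]


def map_phase(rows):
    """Emit one (dept, 1) pair per row, as a mapper would."""
    for _name, dept in rows:
        yield dept, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values of each key, which gives COUNT(*) per dept."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_dept(rows):
    """Run the three phases in order for the GROUP BY query."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    for dept, total in sorted(count_by_dept(EMPLOYEES).items()):
        print(dept, total)
```

With Hive, all of this collapses into the single line of SQL, which is exactly the comfort it offers.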

One thing should be clear here: by using Hive we are not bypassing MapReduce, we are just seeing another face of it. It simply makes it easier for those who currently use a normal relational database to feel comfortable working with Hadoop.

Hive does not have a file system of its own. Its tables are stored as files in HDFS, but the data is fully structured, which means it is schema-bound.
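As a rough illustration of what schema-bound means for files sitting in HDFS, the sketch below applies a table schema to raw text lines in plain Python. It assumes a text table whose fields are separated by Ctrl-A, which is Hive's default field delimiter for text files. The page_views columns and the parse_row helper are hypothetical, and turning unreadable values into None is a simplification of how NULLs are handled.

```python
# Hypothetical schema for a "page_views" table: (column name, converter)
SCHEMA = [("user_id", int), ("url", str), ("duration_sec", float)]

# Ctrl-A, the default field separator of Hive text tables
FIELD_DELIMITER = "\x01"


def parse_row(line, schema=SCHEMA, delimiter=FIELD_DELIMITER):
    """Turn one raw text line into a dict keyed by column name.

    Missing fields and values that do not fit the column type become
    None (a simplified stand-in for NULL in this sketch).
    """
    fields = line.rstrip("\n").split(delimiter)
    row = {}
    for index, (name, convert) in enumerate(schema):
        try:
            row[name] = convert(fields[index])
        except (IndexError, ValueError):
            row[name] = None
    return row


if __name__ == "__main__":
    print(parse_row("42\x01/home\x013.5\n"))
    print(parse_row("abc\x01/home"))
```

The file itself is just text; it is the schema that turns each line into typed columns we can query.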


HBase is another program that comes with virtually all Hadoop distributions. It is a NoSQL database. You have other options, such as MongoDB, which is also a NoSQL database, but HBase is the default NoSQL store that ships with Hadoop, and Hive can be used with HBase on top of Hadoop.

Hive is becoming so important because virtually every business intelligence tool on the market today uses Hive as its interface to Hadoop and MapReduce. Hive gives these tools the look and feel of a relational database. The freedom to query tables and rows and do calculations with normal SQL is great to have, and it makes life much easier: you can bring your current SQL to Hadoop, so you not only get to write new queries the old-school way, you also don't need to convert your existing SQL queries to MapReduce.

What can't Hive do?

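It's worth mentioning that while HiveQL lets you run much of the SQL you currently use against a relational database, it does not fully follow the SQL-92 standard. For example, at the time of writing it lacks materialised views and transactions, and sub-query support is limited. On the other hand, it offers some extensions that are not in standard SQL, such as multi-table insert and create table as select.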

What can Hive do?

Although Hive does not support every feature of the SQL standard, that does not mean there is no use for it. It supports roughly 70% of normal day-to-day SQL, such as selects, indexes, and limited sub-queries. There are plans to add more features in upcoming Hive releases: insert, update, and delete with full ACID functionality.