Hive implements MapReduce through HiveQL: queries written in HiveQL are compiled into MapReduce jobs. The built-in capabilities of HiveQL abstract away the implementation of mappers and reducers behind a simple yet powerful SQL-like query language. To demonstrate these built-in capabilities, I will be analysing hashtags from a Twitter feed on Hortonworks Data Platform (HDP).
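To show what HiveQL hides from you, here is a minimal Python sketch of the map, shuffle and reduce phases for counting hashtags. It only illustrates the MapReduce pattern and is not the code Hive generates. The hashtag regular expression and the lower-casing of tags are my own assumptions. In HiveQL, the same result is roughly a `GROUP BY` over the hashtag column with `COUNT(*)`, and Hive takes care of the mapper, shuffle and reducer for you.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumed hashtag pattern: '#' followed by one or more word characters.
HASHTAG = re.compile(r"#(\w+)")


def mapper(tweet: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (hashtag, 1) for every hashtag found in one tweet."""
    for tag in HASHTAG.findall(tweet):
        yield tag.lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group emitted values by key, as the framework does."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the counts collected for one hashtag."""
    return key, sum(values)


def count_hashtags(tweets: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over an iterable of tweets."""
    pairs = (pair for tweet in tweets for pair in mapper(tweet))
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    tweets = [
        "Loving #Hadoop and #Hive today",
        "#hive makes MapReduce easy",
        "No tags here",
    ]
    print(count_hashtags(tweets))  # {'hadoop': 1, 'hive': 2}
```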
In Hadoop, the storage cluster (HDFS) is also the processing cluster (MapReduce). Azure provides two options for storing data:
Option 1: Use the HDInsight cluster both to store data and to process MapReduce requests. For example, a Hive database hosted in an HDInsight cluster that also executes HiveQL MapReduce queries. In this case, the data is stored in the cluster’s HDFS.
Option 2: Use the HDInsight cluster only to process MapReduce requests, while the data is stored in Azure blob storage. For example, the Hive data is stored in Azure storage while the HDInsight cluster executes HiveQL MapReduce queries. Here the metadata of the Hive database is stored in the cluster, whereas the actual data is stored in Azure storage. The HDInsight cluster is co-located in the same datacentre as the Azure storage and connected to it by a high-speed network.
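The practical difference between the two options shows up in where a Hive table points. The Python sketch below generates illustrative Hive DDL for each case. It assumes that a table without a `LOCATION` clause lands in the cluster's default warehouse on its HDFS (Option 1), and that Option 2 uses an `EXTERNAL` table whose `LOCATION` is a blob storage URI of the form `wasb://<container>@<account>.blob.core.windows.net/<path>`. The helper names, the container `data`, the account `mystorage` and the table columns are all hypothetical.

```python
from typing import Optional


def wasb_uri(container: str, account: str, path: str) -> str:
    """Build a wasb:// URI for a folder in Azure blob storage (hypothetical helper)."""
    return f"wasb://{container}@{account}.blob.core.windows.net/{path.strip('/')}"


def create_table_ddl(table: str, columns: str, location: Optional[str] = None) -> str:
    """Return Hive DDL for a Ctrl-A delimited text table.

    Without a location (Option 1), the table is a managed table in the
    cluster's default warehouse. With a location (Option 2), it is an
    EXTERNAL table whose data lives at that URI in blob storage.
    """
    external = "EXTERNAL " if location else ""
    ddl = (
        f"CREATE {external}TABLE {table} ({columns})\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\001'\n"
        "STORED AS TEXTFILE"
    )
    if location:
        ddl += f"\nLOCATION '{location}'"
    return ddl + ";"


if __name__ == "__main__":
    # Option 1: data stored in the cluster's HDFS.
    print(create_table_ddl("tweets", "id STRING, tag STRING"))
    # Option 2: only metadata in the cluster; data in Azure blob storage.
    print(create_table_ddl("tweets", "id STRING, tag STRING",
                           wasb_uri("data", "mystorage", "/hive/tweets/")))
```

Making the Option 2 table `EXTERNAL` is a design choice: dropping the table then removes only the metadata in the cluster and leaves the data in Azure storage untouched.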
There are several advantages to using Azure storage (Option 2).
Hive implements MapReduce through HiveQL: queries written in HiveQL are compiled into MapReduce jobs. The built-in capabilities of HiveQL abstract away the implementation of mappers and reducers behind a simple yet powerful SQL-like query language. To demonstrate these built-in capabilities, I will be analysing hashtags from a Twitter feed on the Azure HDInsight platform.
This post is a tutorial on getting started with Hive in HDInsight.
The steps are listed below. As a prerequisite, you need a Microsoft Azure subscription to try them out.
- Provision Azure Storage Account
- Provision HDInsight Cluster
- Create Hive Database and Tables
- Prepare Data as Ctrl-A separated Text Files
- Upload Text Files to Azure Storage
- Load Data to Hive
- Execute HiveQL DML Jobs
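For the "Prepare Data as Ctrl-A separated Text Files" step, here is a minimal Python sketch of one way to write rows in the format Hive's text tables expect by default: fields separated by Ctrl-A (`\x01`), one record per line, and `\N` for NULL values. The function names and the output file name `tweets.txt` are hypothetical, and replacing embedded delimiters and newlines with spaces is my own assumption about how to handle messy tweet text.

```python
from pathlib import Path
from typing import Iterable, Sequence

CTRL_A = "\x01"     # Hive's default field delimiter for text tables
HIVE_NULL = r"\N"   # Hive's default NULL marker in text files


def to_ctrl_a_line(fields: Sequence[object]) -> str:
    """Join one record's fields with Ctrl-A, keeping the record on a single line."""
    cleaned = []
    for field in fields:
        if field is None:
            cleaned.append(HIVE_NULL)
            continue
        text = str(field)
        # Replace the delimiter and line breaks so they cannot split the record.
        text = text.replace(CTRL_A, " ").replace("\r", " ").replace("\n", " ")
        cleaned.append(text)
    return CTRL_A.join(cleaned)


def write_ctrl_a_file(rows: Iterable[Sequence[object]], path: Path) -> int:
    """Write rows as newline-terminated Ctrl-A records; return the row count."""
    count = 0
    with path.open("w", encoding="utf-8", newline="\n") as out:
        for row in rows:
            out.write(to_ctrl_a_line(row) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    rows = [("1", "#hive"), ("2", None)]
    print(write_ctrl_a_file(rows, Path("tweets.txt")))  # 2
```

The resulting file can then be uploaded to Azure storage and loaded into a Hive table declared with `FIELDS TERMINATED BY '\001'`.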
Please refer to Working with Hive in HDInsight, which is the updated version of this post. The original post was written when HDInsight was a separate entity in preview. Since then, HDInsight has been fully integrated into Microsoft Azure cloud services. The concepts explained in this post still hold, but some of the instructions and screen captures have changed significantly, so I recommend following the updated version.