Hive Programming

Outline:

  • Metastore: a database for metadata
  • HiveQL + compiler: a query language for interacting with Hive
  • Hive Shell: an interactive shell

Structure

  • Managed Table: Hive manages both the metadata and the data; dropping a managed table deletes the data as well.
  • External Table: when creating an external table, use the keyword EXTERNAL and specify the location where the table data is stored; dropping it removes only the metadata (see the sketch below).
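A minimal sketch of an external table definition; the table name, columns, and HDFS path below are placeholders:

CREATE EXTERNAL TABLE logs (
    ts STRING,
    msg STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/external/logs';   -- placeholder HDFS directory; dropping the table leaves this data in place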

Create table

Create table from an existing table

CREATE TABLE new_table AS SELECT * FROM existing_table;   -- CTAS: copies schema and data; table names are placeholders
CREATE TABLE new_table LIKE existing_table;               -- copies only the schema

Partitioning tables using PARTITIONED BY

  • Creating partitioned tables (the column and partition names below are placeholders):
CREATE TABLE table_name (
    col1 STRING,
    col2 INT)
PARTITIONED BY (attribute_id STRING);
  • Loading partitioned tables; each load targets one partition (the local path is a placeholder, and a retrieval sketch follows):
LOAD DATA LOCAL INPATH '/path/to/data' INTO TABLE table_name PARTITION (attribute_id = 'value');
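Retrieval can then prune partitions: when the partition column appears in the WHERE clause, Hive reads only the matching partition directories. A sketch against the placeholder table above:

SELECT col1, col2
FROM table_name
WHERE attribute_id = 'value';   -- only the attribute_id = 'value' partition is scanned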

Insert
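In Hive, rows are written from query results rather than one at a time: INSERT OVERWRITE replaces a table's contents, INSERT INTO appends. A minimal sketch with placeholder table and column names:

INSERT OVERWRITE TABLE target_table   -- replaces the current contents of target_table
SELECT col1, col2
FROM source_table;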

Multi-table inserts
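A multi-table insert reads the source table once and writes to several targets; all table and column names below are placeholders:

FROM source_table
INSERT OVERWRITE TABLE target_a SELECT col1, col2 WHERE col2 > 0
INSERT OVERWRITE TABLE target_b SELECT col1, count(*) GROUP BY col1;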

Retrieval queries

SELECT [ALL | DISTINCT] select_expr, ...
FROM table_name
[WHERE condition]
[GROUP BY column_list]
[ORDER BY column_list]
[CLUSTER BY column_list
   | [DISTRIBUTE BY column_list] [SORT BY column_list]]
[LIMIT number];
  • SORT BY: sorts rows within each reducer; output is ordered per reducer, not globally (unlike ORDER BY).

  • DISTRIBUTE BY: controls which reducer each row is sent to; useful for feeding map-reduce scripts.

  • CLUSTER BY: shorthand for DISTRIBUTE BY and SORT BY on the same columns (see the sketch below).
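As a sketch, using the wiki_edit table created in the example later in these notes:

SELECT UserName, EditTime
FROM wiki_edit
DISTRIBUTE BY UserName    -- all edits by the same user go to the same reducer
SORT BY EditTime;         -- sorted by time within each reducer, not globally

SELECT UserName
FROM wiki_edit
CLUSTER BY UserName;      -- shorthand for DISTRIBUTE BY UserName SORT BY UserName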

Join queries
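Hive supports SQL-style equi-joins in the FROM clause; a minimal sketch with placeholder tables a and b joined on a shared id column:

SELECT a.id, a.name, b.total
FROM a JOIN b ON (a.id = b.id);   -- inner join; LEFT/RIGHT/FULL OUTER JOIN follow the same pattern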

Example:

Create a table

hive> create table wiki_edit (
    >     revision STRING,
    >     ArticleID BIGINT,
    >     RevisionID BIGINT,
    >     Article STRING,
    >     EditTime STRING,
    >     UserName STRING,
    >     UserID BIGINT)
    > row format delimited
    > fields terminated by ' '
    > stored as textfile;
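The table can then be populated with LOAD DATA; the local path below is a placeholder:

hive> load data local inpath '/path/to/wiki_edits.txt'
    > into table wiki_edit;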

Retrieval

SELECT DISTINCT UserName FROM wiki_edit;

Hive command line

Run a single query without entering the interactive shell:

linux> hive -e 'select distinct UserName from wiki_edit;'

Silent mode (-S suppresses progress and log messages, leaving only the query result, which can be redirected to a file):

linux> hive -S -e 'select distinct UserName from wiki_edit;' > results.txt

Hive versus Pig

Let's review Hive and Pig: Hive offers a declarative, SQL-like language (HiveQL) suited to analysts running ad-hoc queries over structured tables, while Pig offers a procedural data-flow language (Pig Latin) suited to writing ETL pipelines step by step; both compile to MapReduce jobs.

Writing Mappers and Reducers for Hive

Let's say we have a table in Hive, and we want to process this table with our own map and reduce logic.

Example: word count in Hive. The challenge is that we cannot split the words directly in Hive; therefore, we stream the rows through external mapper and reducer scripts using Hive's MAP ... USING and REDUCE ... USING clauses.

Query skeleton over the original table:

FROM (
    FROM source_table
    MAP source_table.line USING 'mapper.py'   -- hypothetical mapper script; 'line' is a placeholder column
    AS word, cnt
    CLUSTER BY word
) mapper_output
INSERT OVERWRITE TABLE output_table
REDUCE mapper_output.word, mapper_output.cnt USING 'reducer.py'   -- hypothetical reducer script applies the reduce step
AS word, cnt;

Example 2

<139 hadoop2:~/web/cse491/14 >hive -e 'select count(distinct category)-1 from bestbuy'
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Logging initialized using configuration in jar:file:/usr/local/hive-0.10.0/lib/hive-common-0.10.0.jar!/hive-log4j.properties
Hive history file=/tmp/mirjalil/hive_job_log_mirjalil_201603141909_421404800.txt
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201602280853_1385, Tracking URL = http://hadoophead2:50030/jobdetails.jsp?jobid=job_201602280853_1385
Kill Command = /usr/local/hadoop-1.1.1/libexec/../bin/hadoop job  -kill job_201602280853_1385
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-03-14 19:09:15,209 Stage-1 map = 0%,  reduce = 0%
2016-03-14 19:09:19,261 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.14 sec
2016-03-14 19:09:20,274 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.14 sec
2016-03-14 19:09:21,286 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.14 sec
2016-03-14 19:09:22,294 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.14 sec
2016-03-14 19:09:23,302 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.14 sec
2016-03-14 19:09:24,311 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.14 sec
2016-03-14 19:09:25,320 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.14 sec
2016-03-14 19:09:26,329 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.14 sec
2016-03-14 19:09:27,337 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 3.64 sec
2016-03-14 19:09:28,347 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.19 sec
2016-03-14 19:09:29,356 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.19 sec
2016-03-14 19:09:30,365 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.19 sec
MapReduce Total cumulative CPU time: 5 seconds 190 msec
Ended Job = job_201602280853_1385
MapReduce Jobs Launched: 
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 5.19 sec   HDFS Read: 12162089 HDFS Write: 5 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 190 msec
OK
1231
Time taken: 26.135 seconds
