hive limitations: underlying Hdfs, append-only, block-oriented storage

To overcome this problem, Hbase is used in place of MySQL, with Hive

Hbase Tables can be accessed like native Hive tables.

HBase integrated with Hive

keyword: Hive Storage Handler

Hbase 创建有两个列族的table:

1
2
3
4
5
6
7
8
9
10
11
create 'employee','personaldetails','deptdetails'
# put data
put 'employee','eid01','personaldetails:fname','Brundesh'
put 'employee','eid01','personaldetails:Lname','R'
put 'employee','eid01','personaldetails:salary','10000'
put 'employee','eid01','deptdetails:name','R&D'
put 'employee','eid01','deptdetails:location','Banglore'

put 'employee','eid02','personaldetails:fname','Abhay'
put 'employee','eid02','personaldetails:Lname','Kumar'
put 'employee','eid02','personaldetails:salary','100000'

image-20180915170307415

hive

If there are multiple columns family in HBase, we can create one table for each column families. In this case, we have 2 column families and hence we are creating two tables, one for each column families.

like:

1
2
3
4
5
6
7
8
create external table employee_hbase(
Eid String,
f_name string,
s_name string,
salary int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping"=":key,personaldetails:fname,personaldetails:Lname,personaldetails:salary")
tblproperties("hbase.table.name"="employee");

creating the non-native Hive table using Storage Handler, should specify the STORED BY clause.

Notes:

  • hbase.columns.mapping: hive & hbase columns mapping, 第一列必须是键列,同hbase行键列一致
  • be careful rowkey

create another table

1
2
3
4
5
6
7
8
CREATE EXTERNAL TABLE employee_dept_hbase(
eid STRING,
title STRING,
`location` STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,deptdetails:name,deptdetails:location")
TBLPROPERTIES("hbase.table.name"="employee");

query:

image-20180915170908793

image-20180915171041534

join query

image-20180915171231201