UNION ALL usage in Hive

 

UNION ALL is used to combine the result from multiple SELECT statements into a single result set. Duplicate records are displayed. Each SELECT statement within the UNION ALL must have the same number of columns. The columns must also have similar data types and similar names. Also, the columns in each SELECT statement must be in the same order. Otherwise, a schema error is thrown.

Lets create 3 tables.

Employee table


create table employee (emp_id BIGINT, emp_name STRING, dept_id STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘,’
LINES TERMINATED BY ‘\n’
STORED AS TEXTFILE;

emp

Work Information Table


create table workinfo (emp_id BIGINT,emp_name STRING, dept_id STRING,work_status STRING,country STRING, state STRING, emp_age INT)

ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘,’
LINES TERMINATED BY ‘\n’
STORED AS TEXTFILE;

work

Department Information


create table department (dept_id STRING, dept_name STRING)

ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘,’
LINES TERMINATED BY ‘\n’
STORED AS TEXTFILE;

dept

Query to check which users are of what work status, works for which department, belongs to which country, state and what are their ages

select temp.emp_name,temp.dept_id,
w.work_status, w.country, w.state, w.emp_age, d.dept_name
from
(select e.emp_id,e.emp_name,e.dept_id
from employee e

UNION ALL

select w.emp_id,w.emp_name,w.dept_id
from workinfo w)
temp join workinfo w on (temp.emp_id = w.emp_id) join department d on (w.dept_id = d.dept_id);

Result


result

You can also view other big data tutorials here.

 

 

Posted in Hive, UNION ALL | Tagged , | Leave a comment