Working with JSON Files in Hive

Step 1. Download hive-serdes-1.0-SNAPSHOT.jar from

http://files.cloudera.com/samples/hive-serdes-1.0-SNAPSHOT.jar

Step 2. Copy this jar file to Hive library path.

If using cloudera distribution, then to this path usr/lib/hive/lib.

Step 3. Copy your JSON Files to HDFS location.

Step 4. Start your hive thrift server

hive –service hiveserver

Step 5. Start Hive shell

hive>

Step 6. Register hive-serdes-1.0-SNAPSHOT.jar

hive> ADD JAR /usr/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar;

Step 7. Create external table in hive pointing location to the hdfs location where you have stored your json files.Use custom json SerDe properties to read and write Json files from Hive

Using simple json file

E.g.  CREATE EXTERNAL TABLE IF NOT EXISTS my_table (
field1 string, field2 int, field3 string, field4 double
)
ROW FORMAT SERDE ‘com.cloudera.hive.serde.JSONSerDe’
LOCATION ‘/home/user/hive/json/input’;

Query from Hive

Select * from my_table;

Using complex json file

 

CREATE EXTERNAL TABLE IF NOT EXISTS json_comp_table (
id INT,
city_code ARRAY<INT>,
email STRING,
contact STRUCT<Mobile_no:STRING, Telephone_no:STRING>
)
ROW FORMAT SERDE ‘com.cloudera.hive.serde.JSONSerDe’
LOCATION ‘/home/user/hive/json/complex’;

Query from Hive

select contact.Mobile_no from json_comp_table where id = 1;

Please Note: If there are huge json files and you want to save storage on hdfs use bzip2 compression to compress your json files on hdfs. You can query .bzip2 json files from hive without modifying anything.

Posted in Hive, JSON Files | Tagged , , | Leave a comment