Step 1. Download hive-serdes-1.0-SNAPSHOT.jar from
http://files.cloudera.com/samples/hive-serdes-1.0-SNAPSHOT.jar
Step 2. Copy this jar file to Hive library path.
If using cloudera distribution, then to this path usr/lib/hive/lib.
Step 3. Copy your JSON Files to HDFS location.
Step 4. Start your hive thrift server
hive –service hiveserver
Step 5. Start Hive shell
hive>
Step 6. Register hive-serdes-1.0-SNAPSHOT.jar
hive> ADD JAR /usr/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar;
Step 7. Create external table in hive pointing location to the hdfs location where you have stored your json files.Use custom json SerDe properties to read and write Json files from Hive
Using simple json file
E.g. CREATE EXTERNAL TABLE IF NOT EXISTS my_table (
field1 string, field2 int, field3 string, field4 double
)
ROW FORMAT SERDE ‘com.cloudera.hive.serde.JSONSerDe’
LOCATION ‘/home/user/hive/json/input’;
Query from Hive
Select * from my_table;
Using complex json file
CREATE EXTERNAL TABLE IF NOT EXISTS json_comp_table (
id INT,
city_code ARRAY<INT>,
email STRING,
contact STRUCT<Mobile_no:STRING, Telephone_no:STRING>
)
ROW FORMAT SERDE ‘com.cloudera.hive.serde.JSONSerDe’
LOCATION ‘/home/user/hive/json/complex’;
Query from Hive
select contact.Mobile_no from json_comp_table where id = 1;
Please Note: If there are huge json files and you want to save storage on hdfs use bzip2 compression to compress your json files on hdfs. You can query .bzip2 json files from hive without modifying anything.