The streaming job writes its output to HDFS in CSV format, so I need to load that data from HDFS into Hive; all of the subsequent processing is then done in Hive.

Detailed Steps

  1. Verify the Hive config (hive-site.xml in conf):

    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.cj.jdbc.Driver</value>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>group11</value>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>student</value>
    </property>
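
    In addition to the driver and credentials, the metastore connection also needs a javax.jdo.option.ConnectionURL entry pointing at the MySQL database; the host and database name below are placeholders for whatever the metastore actually uses:

    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true</value>
    </property>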
  2. Check the HDFS warehouse path:
    hduser@student59:~$ hdfs dfs -ls /user/hive/warehouse
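
    Each managed table in the default database is stored as a sub-directory of this warehouse path, so the files of a single table (assuming the test table shown below already exists) can be listed directly:
    hduser@student59:~$ hdfs dfs -ls /user/hive/warehouse/test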

  3. Check Hive Terminal:

    hive
    show databases;
    show tables;

Results:

hive> show databases;
OK
default
Time taken: 2.95 seconds, Fetched: 1 row(s)
hive> show tables;
OK
test

This shows that there is one database called default, which contains a table called test.

  4. Create the table:
    CREATE TABLE b_results (b_price float, s_output int, t_time String) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
    Check the table structure:
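    One way to do this from the Hive CLI (DESCRIBE FORMATTED additionally shows the storage location on HDFS):

    DESCRIBE b_results;
    DESCRIBE FORMATTED b_results;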
  5. Load data from the local file system (the file is called test.csv):
    path: /opt/apache-hive-2.3.2-bin/test.csv
    LOAD DATA LOCAL INPATH '/opt/apache-hive-2.3.2-bin/test.csv' OVERWRITE INTO TABLE test;
    Loading data to table default.test

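    A quick sanity check that the rows actually landed in the table (a sketch; the output depends on what is in test.csv):

    SELECT * FROM test LIMIT 10;
    SELECT COUNT(*) FROM test;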

Read the HDFS file and save it into Hive

LOAD DATA INPATH '/twitter_sentiment_bitcoin/student59.txt' OVERWRITE INTO TABLE b_results;
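Note that LOAD DATA INPATH (without LOCAL) moves the file from its original HDFS location into the table's warehouse directory rather than copying it, so the source path is empty afterwards. A quick check that the load worked (a sketch, assuming the CSV columns line up with the b_results definition):

SELECT COUNT(*) FROM b_results;
SELECT * FROM b_results LIMIT 10;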

Using Tableau to connect to Hive (visualization)


In practice, connecting to the Hive table from Tableau is quite slow (I think it is because the data set is too large).
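
Tableau talks to Hive through HiveServer2 (Thrift/JDBC, port 10000 by default), so that service must be running on the Hive host. A minimal sketch for starting it and checking connectivity, assuming default settings and the hduser account from above (adjust the user and host as needed):

# start HiveServer2 in the background (default Thrift port 10000)
$HIVE_HOME/bin/hiveserver2 &

# quick connectivity check from the command line with Beeline
$HIVE_HOME/bin/beeline -u jdbc:hive2://localhost:10000 -n hduser -e "SELECT COUNT(*) FROM b_results;"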