Hive: Getting Data from HDFS
Our streaming output is written to HDFS and can be saved in CSV format, so I need to load the data from HDFS into Hive and then do all subsequent processing in Hive.
Detailed Steps
Verify the Hive config (hive-site.xml in conf):
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>group11</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>student</value>
</property>
Check the HDFS warehouse path:
hduser@student59:~$ hdfs dfs -ls /user/hive/warehouse
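The metastore settings in the hive-site.xml excerpt above can also be checked programmatically. The following is a minimal sketch, assuming the file follows the usual Hadoop layout of a `<configuration>` root containing `<property>` elements. The helper names `read_properties` and `missing_metastore_keys` and the sample XML are illustrative and not part of Hive.

```python
import xml.etree.ElementTree as ET

# Metastore connection keys shown in the hive-site.xml excerpt above.
REQUIRED_KEYS = (
    "javax.jdo.option.ConnectionDriverName",
    "javax.jdo.option.ConnectionUserName",
    "javax.jdo.option.ConnectionPassword",
)


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_metastore_keys(xml_text):
    """List required keys that are absent or empty (hypothetical helper)."""
    props = read_properties(xml_text)
    return [key for key in REQUIRED_KEYS if not props.get(key)]


# Illustrative sample only; in practice pass the text of conf/hive-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>group11</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    # The sample deliberately omits the password property.
    print(missing_metastore_keys(SAMPLE))
```

An empty list means all three connection properties are present and non-empty. It does not prove that the credentials actually work against the metastore database.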
Check from the Hive terminal:
hive
show databases;
show tables;
Results:
hive> show databases;
OK
default
Time taken: 2.95 seconds, Fetched: 1 row(s)
hive> show tables;
OK
test
This shows that there is a database named default, and that it contains a table named test.
- Create Table:
CREATE TABLE b_results (b_price float, s_output int, t_time String) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
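Because b_results splits each line on commas and expects a float, an int and a string, it can help to check the CSV before loading it: LOAD DATA does not validate the file, and Hive typically returns NULL for a field it cannot parse as the declared type. The following is a rough sketch, assuming the file has no header row and no quoted fields. `check_row` and `find_bad_rows` are hypothetical helpers, the sample rows are made up, and Python's `float`/`int` parsing only approximates Hive's.

```python
def check_row(line):
    """Return None if the line fits (float, int, string), else a reason."""
    fields = line.rstrip("\n").split(",")
    if len(fields) != 3:
        return "expected 3 fields, got %d" % len(fields)
    b_price, s_output, _t_time = fields
    try:
        float(b_price)
    except ValueError:
        return "b_price is not a float: %r" % b_price
    try:
        int(s_output)
    except ValueError:
        return "s_output is not an int: %r" % s_output
    return None


def find_bad_rows(lines):
    """Return (line_number, reason) pairs for rows that do not fit the schema."""
    problems = []
    for number, line in enumerate(lines, start=1):
        reason = check_row(line)
        if reason is not None:
            problems.append((number, reason))
    return problems


if __name__ == "__main__":
    # Made-up rows for illustration: one valid, two malformed.
    sample = [
        "9850.5,1,2018-02-20 10:00:00",
        "abc,0,2018-02-20 10:01:00",
        "9851.0,1",
    ]
    for number, reason in find_bad_rows(sample):
        print(number, reason)
```

In practice the lines would come from the local CSV, or from the HDFS file after copying it locally.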
Check the table structure, e.g. with DESCRIBE b_results;
- Load data from a local file (here called test.csv):
path: /opt/apache-hive-2.3.2-bin/test.csv
LOAD DATA LOCAL INPATH '/opt/apache-hive-2.3.2-bin/test.csv' OVERWRITE INTO TABLE test;
- Read a file from HDFS and load it into Hive:
LOAD DATA INPATH '/twitter_sentiment_bitcoin/student59.txt' OVERWRITE INTO TABLE b_results;
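The two LOAD DATA statements above differ in the LOCAL keyword and the target table. With LOCAL, the file is copied from the client's local filesystem. Without it, the path refers to HDFS and the file is moved into the table's warehouse directory, so it no longer exists at its original location. OVERWRITE replaces whatever the table already contains. The sketch below assembles either form. `build_load_statement` is a hypothetical helper that only builds the string; running it against Hive is left to whatever client is in use.

```python
def build_load_statement(path, table, local=False, overwrite=True):
    """Build a HiveQL LOAD DATA statement (string only, nothing is executed)."""
    # Reject inputs that would break out of the quoted path or table name.
    if "'" in path:
        raise ValueError("path must not contain a single quote")
    if not table.replace("_", "").isalnum():
        raise ValueError("unexpected table name: %r" % table)
    parts = ["LOAD DATA"]
    if local:
        parts.append("LOCAL")  # read from the client's local filesystem
    parts.append("INPATH '%s'" % path)
    if overwrite:
        parts.append("OVERWRITE")  # replace existing table contents
    parts.append("INTO TABLE %s;" % table)
    return " ".join(parts)


if __name__ == "__main__":
    # HDFS source, as used for b_results above.
    print(build_load_statement("/twitter_sentiment_bitcoin/student59.txt", "b_results"))
    # Local source, as used for the test table above.
    print(build_load_statement("/opt/apache-hive-2.3.2-bin/test.csv", "test", local=True))
```

Passing `overwrite=False` drops the OVERWRITE keyword, which appends to the table instead of replacing its contents.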
Using Tableau to connect to Hive (visualization)
In practice, connecting to the Hive table from Tableau is quite slow. I suspect this is because the data set is too large.