1. Installing Hive

Installing Hive with MySQL as the Metastore

  • Install the JDK (omitted here)
  • Install Hadoop (omitted here)
  • Install MySQL (omitted here)

1. Create the Hive database and user, and grant privileges

  # the MySQL root password on my test VM was 123456
  mysql -u root -p
  mysql> grant all privileges on *.* to hive@"%" identified by "hive" with grant option;
  mysql> flush privileges;

A default MySQL installation on Ubuntu only accepts connections from the local machine. To enable remote access, two steps are needed: open the configuration file and comment out the bind-address line.

  nano /etc/mysql/my.cnf
  # find the line bind-address = 127.0.0.1 and comment it out
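
After granting privileges and opening remote access, restart MySQL and check that the hive user can actually connect from another machine. A minimal check, assuming the MySQL host is 192.168.1.178 (the address used in hive-site.xml below):

  # restart MySQL so the bind-address change takes effect
  sudo service mysql restart
  # from the Hive machine, connect as the hive user (password: hive)
  mysql -h 192.168.1.178 -u hive -p -e "select 1;"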

2. Install Hive

  hadoop@Master:~$ sudo tar xvfz apache-hive-1.1.1-bin.tar.gz
  hadoop@Master:~$ sudo cp -R apache-hive-1.1.1-bin /usr/local/hive
  hadoop@Master:~$ sudo chmod -R 775 /usr/local/hive/
  hadoop@Master:~$ sudo chown -R hadoop:hadoop /usr/local/hive/
  # add the HIVE_HOME variables to /etc/profile
  export HIVE_HOME=/usr/local/hive
  export PATH=$PATH:$HIVE_HOME/bin
  export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:/usr/local/hive/lib
  # copy the template files under hive/conf and rename them
  cp hive-env.sh.template hive-env.sh
  cp hive-default.xml.template hive-site.xml
  # in hive-env.sh, point HADOOP_HOME at the Hadoop installation
  HADOOP_HOME=/usr/local/hadoop
  # in hive-site.xml, set the MySQL driver, connection URL, username, and password as shown below
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.1.178:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
  </property>
Where:

  • javax.jdo.option.ConnectionURL is the JDBC connection string Hive uses to reach the metastore database;
  • javax.jdo.option.ConnectionDriverName is the entry class name of the JDBC driver;
  • javax.jdo.option.ConnectionUserName is the database username;
  • javax.jdo.option.ConnectionPassword is the database password.
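
On first use Hive will try to create its metastore tables in MySQL automatically (note the createDatabaseIfNotExist=true in the URL above). You can also create the schema up front with the schematool utility that ships with Hive since 0.12; a minimal sketch:

  # initialize the metastore schema in the MySQL database configured above
  $HIVE_HOME/bin/schematool -dbType mysql -initSchema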

3. Edit the hive-config.sh file under hive/bin to set JAVA_HOME, HADOOP_HOME, and HIVE_HOME

  export JAVA_HOME=/usr/lib/jvm
  export HADOOP_HOME=/usr/local/hadoop
  export HIVE_HOME=/usr/local/hive
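
If you prefer not to edit the file by hand, the same lines can be appended in one go; a sketch using the paths from this guide:

  # append the environment variables to hive-config.sh
  echo 'export JAVA_HOME=/usr/lib/jvm' >> /usr/local/hive/bin/hive-config.sh
  echo 'export HADOOP_HOME=/usr/local/hadoop' >> /usr/local/hive/bin/hive-config.sh
  echo 'export HIVE_HOME=/usr/local/hive' >> /usr/local/hive/bin/hive-config.sh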

4. Download mysql-connector-java-5.1.27-bin.jar and place it in the $HIVE_HOME/lib directory

It can be downloaded from the official MySQL website; note that the download is a tar.gz archive, so remember to extract it to get at the jar.
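
A sketch of the whole step; the exact download URL is an assumption and may have changed, so check MySQL's Connector/J download page first:

  # URL is an assumption -- verify the current link on dev.mysql.com
  wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.27.tar.gz
  tar xvfz mysql-connector-java-5.1.27.tar.gz
  # only the driver jar itself goes into Hive's lib directory
  cp mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar $HIVE_HOME/lib/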

5. Create /tmp and /user/hive/warehouse in HDFS and set their permissions

  # -p creates missing parent directories (needed for /user/hive/warehouse)
  hadoop fs -mkdir -p /tmp
  hadoop fs -mkdir -p /user/hive/warehouse
  hadoop fs -chmod g+w /tmp
  hadoop fs -chmod g+w /user/hive/warehouse

6. Start Hadoop, then enter the Hive shell and try a few commands (a sketch of the Hadoop start-up commands follows the listing below)

  hive
  show databases;
  show tables;
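
For the "start Hadoop" part of this step, assuming the standard scripts under /usr/local/hadoop/sbin, start-up might look like this:

  # bring up HDFS and YARN
  /usr/local/hadoop/sbin/start-dfs.sh
  /usr/local/hadoop/sbin/start-yarn.sh
  # NameNode, DataNode, ResourceManager and NodeManager should now show up
  jps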

7. You can inspect the files Hive generates in HDFS

  hadoop fs -ls /user/hive/warehouse

Hive Usage Examples

Before getting into HiveQL properly, it is worth running a few commands at the prompt first, to get a feel for how HiveQL works and to explore a little on your own.
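
The transcript below assumes a testuser table with columns id and username already exists. A minimal sketch of creating and loading such a table; the data file /tmp/testuser.txt is hypothetical (one tab-separated id/username pair per line):

  # create the table the examples below query; the file path is hypothetical
  hive -e "CREATE TABLE testuser (id INT, username STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';"
  hive -e "LOAD DATA LOCAL INPATH '/tmp/testuser.txt' INTO TABLE testuser;"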

1. Query examples

  hive> SHOW TABLES;
  OK
  testuser
  Time taken: 0.707 seconds, Fetched: 1 row(s)
  hive> DESC testuser;
  OK
  id int
  username string
  Time taken: 0.38 seconds, Fetched: 2 row(s)
  hive> SELECT * from testuser limit 10;
  OK
  1 sssss
  1 sssss
  Time taken: 0.865 seconds, Fetched: 2 row(s)
  hive>
  hive> select count(1) from testuser;
  Query ID = hadoop_20160205004747_9d84aaca-887a-43a0-bad9-eddefe4e2219
  Total jobs = 1
  Launching Job 1 out of 1
  Number of reduce tasks determined at compile time: 1
  In order to change the average load for a reducer (in bytes):
    set hive.exec.reducers.bytes.per.reducer=<number>
  In order to limit the maximum number of reducers:
    set hive.exec.reducers.max=<number>
  In order to set a constant number of reducers:
    set mapreduce.job.reduces=<number>
  Starting Job = job_1454604205731_0001, Tracking URL = http://Master:8088/proxy/application_1454604205731_0001/
  Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1454604205731_0001
  Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
  2016-02-05 00:48:11,942 Stage-1 map = 0%, reduce = 0%
  2016-02-05 00:48:19,561 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.38 sec
  2016-02-05 00:48:28,208 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.77 sec
  MapReduce Total cumulative CPU time: 2 seconds 770 msec
  Ended Job = job_1454604205731_0001
  MapReduce Jobs Launched:
  Stage-Stage-1: Map: 1  Reduce: 1  Cumulative CPU: 2.77 sec  HDFS Read: 6532 HDFS Write: 2 SUCCESS
  Total MapReduce CPU Time Spent: 2 seconds 770 msec
  OK
  2
  Time taken: 35.423 seconds, Fetched: 1 row(s)

These messages show that the query was turned into a MapReduce job. The beauty of Hive is that the user never needs to know MapReduce is there at all; all the user has to care about is a SQL-like language.
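
To see that plan without running the job, EXPLAIN prints the stages Hive compiles a query into:

  hive -e "EXPLAIN SELECT count(1) FROM testuser;"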

Repeat the following insert several times to exercise bulk data insertion:

  hive> insert overwrite table testuser
      > select id,count(id)
      > from testuser
      > group by id;
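
One caveat: INSERT OVERWRITE replaces the table's existing contents on every run, so repeating it does not by itself accumulate rows. To actually grow the table across runs, INSERT INTO (available since Hive 0.8) appends instead; a sketch:

  hive -e "INSERT INTO TABLE testuser SELECT id, count(id) FROM testuser GROUP BY id;"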