Installing Sqoop1

1 Extract and install

  1. hadoop@Master:~$ sudo tar xvfz sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
  2. hadoop@Master:~$ sudo mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha /usr/local/sqoop1/
  3. hadoop@Master:~$ sudo chmod -R 775 /usr/local/sqoop1
  4. hadoop@Master:~$ sudo chown -R hadoop:hadoop /usr/local/sqoop1
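
A quick sanity check (a minimal sketch, using the paths from the steps above) confirms the files landed where expected and are owned by the hadoop user:

  1. hadoop@Master:~$ ls -ld /usr/local/sqoop1
  2. hadoop@Master:~$ ls -l /usr/local/sqoop1/bin/sqoop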

2 Set environment variables

  1. hadoop@Master:~$ sudo nano /etc/profile
  2. #sqoop
  3. export SQOOP_HOME=/usr/local/sqoop1
  4. export PATH=$SQOOP_HOME/bin:$PATH
  5. hadoop@Master:~$ source /etc/profile
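
To confirm the variables took effect in the current shell (a small check, assuming the same install path as above):

  1. hadoop@Master:~$ echo $SQOOP_HOME
  2. hadoop@Master:~$ which sqoop
  3. hadoop@Master:~$ sqoop version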

3 Configure Sqoop's environment variables

  1. Download the MySQL JDBC driver, mysql-connector-java-5.1.27.jar.
  2. Drop the JAR into $SQOOP_HOME/lib.
  3. hadoop@Master:/usr/local/sqoop1/conf$ cp sqoop-env-template.sh sqoop-env.sh
  4. # Fill in the actual value for each environment variable
  5. # Set Hadoop-specific environment variables here.
  6. #Set path to where bin/hadoop is available
  7. #export HADOOP_COMMON_HOME=
  8. #Set path to where hadoop-*-core.jar is available
  9. #export HADOOP_MAPRED_HOME=
  10. #set the path to where bin/hbase is available
  11. #export HBASE_HOME=
  12. #Set the path to where bin/hive is available
  13. #export HIVE_HOME=
  14. In most cases, though, /etc/profile already defines the relevant variables:
  15. export JAVA_HOME=/usr/lib/jvm/
  16. export JRE_HOME=${JAVA_HOME}/jre
  17. export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:/usr/local/hive/lib
  18. export PATH=${JAVA_HOME}/bin:$PATH
  19. #HADOOP VARIABLES START
  20. export JAVA_HOME=/usr/lib/jvm/
  21. export HADOOP_INSTALL=/usr/local/hadoop
  22. export PATH=$PATH:$HADOOP_INSTALL/bin
  23. export PATH=$PATH:$JAVA_HOME/bin
  24. export PATH=$PATH:$HADOOP_INSTALL/sbin
  25. export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
  26. export HADOOP_COMMON_HOME=$HADOOP_INSTALL
  27. export HADOOP_HDFS_HOME=$HADOOP_INSTALL
  28. export YARN_HOME=$HADOOP_INSTALL
  29. export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
  30. export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
  31. #HADOOP VARIABLES END
  32. export HIVE_HOME=/usr/local/hive
  33. export PATH=$PATH:$HIVE_HOME/bin:/usr/local/hbase/bin
  34. export JAVA_LIBRARY_PATH=/usr/local/hadoop/lib/native
  35. export SCALA_HOME=/usr/lib/scala
  36. export PATH=$PATH:$SCALA_HOME/bin
  37. #sqoop
  38. export SQOOP_HOME=/usr/local/sqoop1
  39. export PATH=$SQOOP_HOME/bin:$PATH
  40. #HBASE
  41. export HBASE_HOME=/usr/local/hbase
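
If you prefer to set the values directly in sqoop-env.sh rather than relying on /etc/profile, a minimal example based on the paths used in this guide would look like the following (adjust to your own layout):

  1. # $SQOOP_HOME/conf/sqoop-env.sh
  2. export HADOOP_COMMON_HOME=/usr/local/hadoop
  3. export HADOOP_MAPRED_HOME=/usr/local/hadoop
  4. export HBASE_HOME=/usr/local/hbase
  5. export HIVE_HOME=/usr/local/hive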

4 Testing

  1. Copy the MySQL driver into Sqoop's lib directory:
  2. hadoop@Master:~/mysql-connector-java-5.0.8$ sudo cp mysql-connector-java-5.0.8-bin.jar /usr/local/sqoop1/lib/
  3. Using MySQL as the example:
  4. IP: 192.168.1.178
  5. Username: chu888chu888
  6. Password: skybar
  7. Database: hivetestdb
  8. Table: cdsgus
  9. [root@hadoop01 ~]# sqoop help
  10. Available commands:
  11. codegen Generate code to interact with database records
  12. create-hive-table Import a table definition into Hive
  13. eval Evaluate a SQL statement and display the results
  14. export Export an HDFS directory to a database table
  15. help List available commands
  16. import Import a table from a database to HDFS
  17. import-all-tables Import tables from a database to HDFS
  18. import-mainframe Import datasets from a mainframe server to HDFS
  19. job Work with saved jobs
  20. list-databases List available databases on a server
  21. list-tables List available tables in a database
  22. merge Merge results of incremental imports
  23. metastore Run a standalone Sqoop metastore
  24. version Display version information
  25. List all databases:
  26. hadoop@Master:/usr/local/sqoop1/lib$ sqoop list-databases --connect jdbc:mysql://192.168.1.178 --username chu888chu888 --password skybar
  27. List all tables in a database:
  28. hadoop@Master:/usr/local/sqoop1/lib$ sqoop list-tables --connect jdbc:mysql://192.168.1.178/hivetestdb --username chu888chu888 --password skybar
  29. Import the MySQL table into HDFS:
  30. hadoop@Master:/$ hdfs dfs -mkdir /user/sqoop
  31. hadoop@Master:/$ hdfs dfs -chown sqoop:hadoop /user/sqoop
  32. sqoop import --connect jdbc:mysql://192.168.1.178/hivetestdb --username chu888chu888 --password skybar --table cdsgus --m 2 --target-dir /user/sqoop/cdsgus
  33. hadoop@Master:~$ sqoop import --connect jdbc:mysql://192.168.1.178/hivetestdb --username chu888chu888 --password skybar --table cdsgus --m 2 --target-dir /user/sqoop/cdsgus
  34. Warning: /usr/local/sqoop1/../hcatalog does not exist! HCatalog jobs will fail.
  35. Please set $HCAT_HOME to the root of your HCatalog installation.
  36. Warning: /usr/local/sqoop1/../accumulo does not exist! Accumulo imports will fail.
  37. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  38. Warning: /usr/local/sqoop1/../zookeeper does not exist! Accumulo imports will fail.
  39. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
  40. 16/03/03 01:28:13 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
  41. 16/03/03 01:28:13 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  42. 16/03/03 01:28:13 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  43. 16/03/03 01:28:13 INFO tool.CodeGenTool: Beginning code generation
  44. 16/03/03 01:28:14 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `cdsgus` AS t LIMIT 1
  45. 16/03/03 01:28:14 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `cdsgus` AS t LIMIT 1
  46. 16/03/03 01:28:14 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
  47. Note: /tmp/sqoop-hadoop/compile/7b9cf86a577c124c063ff5dc2242b3fb/cdsgus.java uses or overrides a deprecated API.
  48. Note: Recompile with -Xlint:deprecation for details.
  49. 16/03/03 01:28:17 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/7b9cf86a577c124c063ff5dc2242b3fb/cdsgus.jar
  50. 16/03/03 01:28:17 WARN manager.MySQLManager: It looks like you are importing from mysql.
  51. 16/03/03 01:28:17 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
  52. 16/03/03 01:28:17 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
  53. 16/03/03 01:28:17 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
  54. 16/03/03 01:28:17 INFO mapreduce.ImportJobBase: Beginning import of cdsgus
  55. SLF4J: Class path contains multiple SLF4J bindings.
  56. SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  57. SLF4J: Found binding in [jar:file:/usr/local/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  58. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  59. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
  60. 16/03/03 01:28:17 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
  61. 16/03/03 01:28:18 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
  62. 16/03/03 01:28:18 INFO client.RMProxy: Connecting to ResourceManager at Master/192.168.1.80:8032
  63. 16/03/03 01:28:23 INFO db.DBInputFormat: Using read commited transaction isolation
  64. 16/03/03 01:28:23 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `cdsgus`
  65. 16/03/03 01:28:23 INFO mapreduce.JobSubmitter: number of splits:2
  66. 16/03/03 01:28:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1456939431067_0002
  67. 16/03/03 01:28:24 INFO impl.YarnClientImpl: Submitted application application_1456939431067_0002
  68. 16/03/03 01:28:24 INFO mapreduce.Job: The url to track the job: http://Master:8088/proxy/application_1456939431067_0002/
  69. 16/03/03 01:28:24 INFO mapreduce.Job: Running job: job_1456939431067_0002
  70. 16/03/03 01:28:38 INFO mapreduce.Job: Job job_1456939431067_0002 running in uber mode : false
  71. 16/03/03 01:28:38 INFO mapreduce.Job: map 0% reduce 0%
  72. 16/03/03 01:32:11 INFO mapreduce.Job: map 50% reduce 0%
  73. 16/03/03 01:32:13 INFO mapreduce.Job: map 100% reduce 0%
  74. 16/03/03 01:32:14 INFO mapreduce.Job: Job job_1456939431067_0002 completed successfully
  75. 16/03/03 01:32:14 INFO mapreduce.Job: Counters: 31
  76. File System Counters
  77. FILE: Number of bytes read=0
  78. FILE: Number of bytes written=247110
  79. FILE: Number of read operations=0
  80. FILE: Number of large read operations=0
  81. FILE: Number of write operations=0
  82. HDFS: Number of bytes read=218
  83. HDFS: Number of bytes written=3130492684
  84. HDFS: Number of read operations=8
  85. HDFS: Number of large read operations=0
  86. HDFS: Number of write operations=4
  87. Job Counters
  88. Killed map tasks=1
  89. Launched map tasks=3
  90. Other local map tasks=3
  91. Total time spent by all maps in occupied slots (ms)=422821
  92. Total time spent by all reduces in occupied slots (ms)=0
  93. Total time spent by all map tasks (ms)=422821
  94. Total vcore-seconds taken by all map tasks=422821
  95. Total megabyte-seconds taken by all map tasks=432968704
  96. Map-Reduce Framework
  97. Map input records=20050144
  98. Map output records=20050144
  99. Input split bytes=218
  100. Spilled Records=0
  101. Failed Shuffles=0
  102. Merged Map outputs=0
  103. GC time elapsed (ms)=19391
  104. CPU time spent (ms)=206680
  105. Physical memory (bytes) snapshot=313565184
  106. Virtual memory (bytes) snapshot=3757293568
  107. Total committed heap usage (bytes)=65142784
  108. File Input Format Counters
  109. Bytes Read=0
  110. File Output Format Counters
  111. Bytes Written=3130492684
  112. 16/03/03 01:32:14 INFO mapreduce.ImportJobBase: Transferred 2.9155 GB in 235.5966 seconds (12.672 MB/sec)
  113. 16/03/03 01:32:14 INFO mapreduce.ImportJobBase: Retrieved 20050144 records.
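
After the job completes you can inspect the files written by the two mappers. The WARN line above about the plain-text password can also be avoided by using -P (interactive prompt) instead of --password. A hedged sketch with the same connection details:

  1. hadoop@Master:~$ hdfs dfs -ls /user/sqoop/cdsgus
  2. hadoop@Master:~$ hdfs dfs -cat /user/sqoop/cdsgus/part-m-00000 | head -n 5
  3. # re-running into the same target directory fails because it already exists;
  4. # either choose a new --target-dir or add --delete-target-dir
  5. hadoop@Master:~$ sqoop import --connect jdbc:mysql://192.168.1.178/hivetestdb --username chu888chu888 -P --table cdsgus --m 2 --target-dir /user/sqoop/cdsgus --delete-target-dir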


  1. Import all rows of the MySQL table into Hive:
  2. hive> create database test_sqoop;
  3. OK
  4. Time taken: 0.81 seconds
  5. hive> show databases;
  6. OK
  7. chu888chu888
  8. default
  9. test_sqoop
  10. Time taken: 0.247 seconds, Fetched: 3 row(s)
  11. hive>
  12. Use Sqoop to create the Hive table and import the data:
  13. sqoop import --connect jdbc:mysql://192.168.1.178/hivetestdb --username chu888chu888 --password skybar --table cdsgus --hive-import --hive-table test_sqoop.cdsgus
  14. Export the data from Hive back into MySQL:
  15. mysql> use hivetestdb;
  16. Database changed
  17. mysql> show tables;
  18. +----------------------+
  19. | Tables_in_hivetestdb |
  20. +----------------------+
  21. | cdsgus |
  22. +----------------------+
  23. 1 row in set (0.00 sec)
  24. mysql> truncate cdsgus;
  25. Query OK, 0 rows affected (0.00 sec)
  26. mysql> select * from cdsgus;
  27. Empty set (0.00 sec)
  28. sqoop export --connect jdbc:mysql://192.168.1.178/hivetestdb --username chu888chu888 --password skybar --table cdsgus --export-dir /user/hive/warehouse/test_sqoop.db/cdsgus/ --input-fields-terminated-by '\0001'
  29. Incremental import:
  30. sqoop import --connect jdbc:mysql://192.168.1.178/hivetestdb --username chu888chu888 --password skybar --table cdsgus --hive-import --hive-table test_sqoop.cdsgus --check-column id --incremental append --last-value 2
  32. Import into HBase:
  33. sqoop import --connect jdbc:mysql://192.168.1.178/hive_hadoop --username chu888chu888 --password skybar --table TBLS --hbase-table TBLS --hbase-create-table --hbase-row-key TBL_ID --column-family SD_ID
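
For the incremental import it is often easier to wrap the command in a saved job (the job tool from the help listing above), so that Sqoop's metastore tracks the last imported id instead of you updating --last-value by hand. A sketch, assuming the same connection details:

  1. sqoop job --create cdsgus_incr -- import --connect jdbc:mysql://192.168.1.178/hivetestdb --username chu888chu888 -P --table cdsgus --hive-import --hive-table test_sqoop.cdsgus --check-column id --incremental append --last-value 2
  2. # run the job; after each run Sqoop records the new last-value automatically
  3. sqoop job --exec cdsgus_incr
  4. sqoop job --list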

Troubleshooting

  1. hadoop@Master:/$ sqoop import --connect jdbc:mysql://192.168.1.178/hivetestdb --username chu888chu888 --password skybar --table cdsgus
  2. Warning: /usr/local/sqoop1/../hcatalog does not exist! HCatalog jobs will fail.
  3. Please set $HCAT_HOME to the root of your HCatalog installation.
  4. Warning: /usr/local/sqoop1/../accumulo does not exist! Accumulo imports will fail.
  5. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  6. Warning: /usr/local/sqoop1/../zookeeper does not exist! Accumulo imports will fail.
  7. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
  8. 16/03/03 00:32:16 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
  9. 16/03/03 00:32:16 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  10. 16/03/03 00:32:16 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  11. 16/03/03 00:32:16 INFO tool.CodeGenTool: Beginning code generation
  12. 16/03/03 00:32:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `cdsgus` AS t LIMIT 1
  13. 16/03/03 00:32:16 ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@654f0d9c is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
  14. java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@654f0d9c is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
  15. at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:914)
  16. at com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:2181)
  17. at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1542)
  18. at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1723)
  19. at com.mysql.jdbc.Connection.execSQL(Connection.java:3277)
  20. at com.mysql.jdbc.Connection.execSQL(Connection.java:3206)
  21. at com.mysql.jdbc.Statement.executeQuery(Statement.java:1232)
  22. at com.mysql.jdbc.Connection.getMaxBytesPerChar(Connection.java:3673)
  23. at com.mysql.jdbc.Field.getMaxBytesPerCharacter(Field.java:482)
  24. at com.mysql.jdbc.ResultSetMetaData.getPrecision(ResultSetMetaData.java:443)
  25. at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:286)
  26. at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
  27. at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:227)
  28. at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
  29. at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
  30. at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
  31. at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
  32. at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
  33. at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
  34. at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
  35. at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
  36. at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
  37. at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
  38. at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
  39. at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
  40. 16/03/03 00:32:17 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter
  41. at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1651)
  42. at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
  43. at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
  44. at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
  45. at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
  46. at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
  47. at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
  48. at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
  49. at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
  50. at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
  51. If you hit the error above, replace the MySQL JDBC driver JAR (mysql-connector-java-*.jar) under Sqoop's lib directory with a newer version.
  52. ISSUE: https://issues.apache.org/jira/browse/SQOOP-1400
  53. Another possibility is that there is also a MySQL driver JAR under Hadoop's share/hadoop/common directory, and that copy may be very old!
  54. hadoop@Master:/usr/local/hadoop/share/hadoop/common$ ls
  55. hadoop-common-2.6.0.jar hadoop-common-2.6.0-tests.jar hadoop-nfs-2.6.0.jar jdiff lib mysql-connector-java-5.0.8-bin.jar sources templates
  56. hadoop@Master:/usr/local/hadoop/share/hadoop/common$ sudo rm -rf mysql-connector-java-5.0.8-bin.jar
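
Before deleting anything, it helps to see every copy of the MySQL driver that ends up on the classpath (a simple check using the paths from this setup):

  1. hadoop@Master:~$ find /usr/local/hadoop /usr/local/sqoop1 -name "mysql-connector-java-*.jar" 2>/dev/null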

FAQ

When these warning blocks appear, how do we get rid of them? We need to edit a script.

Edit it with: hadoop@hadoopmaster:/usr/local/sqoop/bin$ nano configure-sqoop

  1. hadoop@hadoopmaster:/usr/local/hive/lib$ sqoop list-databases --connect jdbc:mysql://hadoopslave2 --username hive --password hive
  2. Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
  3. Please set $HCAT_HOME to the root of your HCatalog installation.
  4. Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
  5. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  6. Warning: /usr/local/sqoop/../zookeeper does not exist! Accumulo imports will fail.
  7. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
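
These warnings are printed by checks inside bin/configure-sqoop. If you are not using HCatalog or Accumulo, a common workaround is to comment out the blocks that emit them; the exact line numbers differ between Sqoop releases, so locate the echo lines first and then comment out the surrounding if-blocks for HCAT_HOME, ACCUMULO_HOME and ZOOKEEPER_HOME:

  1. hadoop@hadoopmaster:/usr/local/sqoop/bin$ grep -n "does not exist" configure-sqoop
  2. hadoop@hadoopmaster:/usr/local/sqoop/bin$ nano configure-sqoop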