Quick Setup of a Sqoop Environment

Download location

http://archive.cloudera.com/cdh5/cdh/5/

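Download Hadoop, Hive, and Sqoop. The paths used later in this post correspond to hadoop-2.5.0-cdh5.3.6, hive-0.13.1-cdh5.3.6, and sqoop-1.4.5-cdh5.3.6.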


Upload and extract the archives
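
The extraction step can be sketched as a small shell helper. This is only an illustration under assumptions: the upload directory `/opt/softwares` is hypothetical, and the archives are assumed to be `.tar.gz` files; `/opt/cdh5.3.6` is the install prefix that the later paths in this post use.

```shell
# Assumed locations; adjust to where the archives were actually uploaded.
SRC_DIR="${SRC_DIR:-/opt/softwares}"   # hypothetical upload directory
DEST_DIR="${DEST_DIR:-/opt/cdh5.3.6}"  # install prefix used later in this post

# Extract every .tar.gz found in a source directory into a destination directory.
extract_all() {
  local src="$1" dest="$2" archive
  mkdir -p "$dest"
  for archive in "$src"/*.tar.gz; do
    [ -e "$archive" ] || continue   # skip when the glob matches nothing
    tar -zxf "$archive" -C "$dest"
  done
}
```

It would be called as `extract_all "$SRC_DIR" "$DEST_DIR"` once the Hadoop, Hive, and Sqoop archives are in place.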

Edit the Hadoop configuration files

  • hadoop-env.sh, yarn-env.sh, mapred-env.sh: set the Java environment variable
export JAVA_HOME=/opt/modules/jdk1.7.0_67
  • slaves
hadoop-senior.beifeng.com
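  • core-site.xml

    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop-senior.beifeng.com:50090</value>
    </property>

    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop-senior.beifeng.com:50070</value>
    </property>

    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>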
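  • hdfs-site.xml

    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop-senior.beifeng.com:50090</value>
    </property>

    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop-senior.beifeng.com:50070</value>
    </property>

    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>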
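  • yarn-site.xml

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop-senior.beifeng.com</value>
    </property>

    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>60678</value>
    </property>

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4092</value>
    </property>

    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>4</value>
    </property>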
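  • mapred-site.xml

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop-senior.beifeng.com:10020</value>
    </property>

    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop-senior.beifeng.com:19888</value>
    </property>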
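
The core-site.xml listing above shows only dfs.* properties. In a typical Hadoop 2.x setup, core-site.xml also defines `fs.defaultFS` and `hadoop.tmp.dir`. The helper below is a rough sketch of how such a file might be generated; the NameNode port 8020 and the temporary directory path are assumptions, not values taken from the original post.

```shell
# Sketch: write a minimal core-site.xml to the path given as $1.
# Assumptions (not from the original post): NameNode RPC port 8020 and the
# hadoop.tmp.dir location below.
NAMENODE_HOST="${NAMENODE_HOST:-hadoop-senior.beifeng.com}"

write_core_site() {
  cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://${NAMENODE_HOST}:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/cdh5.3.6/hadoop-2.5.0-cdh5.3.6/data/tmp</value>
    </property>
</configuration>
EOF
}
```

For example, `write_core_site etc/hadoop/core-site.xml` run from the Hadoop install directory would overwrite that file, so back up any existing copy first.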
    

Format the HDFS file system

Command: bin/hdfs namenode -format

Start the services

  • HDFS service

sbin/start-dfs.sh

  • YARN service

sbin/start-yarn.sh

  • JobHistory service

sbin/mr-jobhistory-daemon.sh start historyserver
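
After the three start commands, `jps` is a quick way to confirm that the daemons came up. The helper below is a sketch, assuming a single-node setup where all six daemons run on the same host; the function name and the daemon list variable are illustrative.

```shell
# Daemons a single-node setup like this one is expected to run.
EXPECTED_DAEMONS="NameNode DataNode SecondaryNameNode ResourceManager NodeManager JobHistoryServer"

# Read `jps` output on stdin and print each expected daemon that is absent.
missing_daemons() {
  local listing daemon
  listing="$(cat)"
  for daemon in $EXPECTED_DAEMONS; do
    # jps prints "<pid> <ClassName>", so compare the second field exactly.
    if ! printf '%s\n' "$listing" | awk '{print $2}' | grep -qx "$daemon"; then
      echo "$daemon"
    fi
  done
}
```

Running `jps | missing_daemons` prints nothing when every expected daemon is present, and otherwise lists the ones that still need attention.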

Configure Hive

  • Edit hive-env.sh
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/opt/cdh5.3.6/hadoop-2.5.0-cdh5.3.6

# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/opt/cdh5.3.6/hive-0.13.1-cdh5.3.6/conf

  • hive-log4j.properties (copied from hive-log4j.properties.template)
# Define some default values that can be overridden by system properties
hive.log.threshold=ALL
hive.root.logger=WARN,DRFA
hive.log.dir=/opt/cdh5.3.6/hive-0.13.1-cdh5.3.6/logs
hive.log.file=hive.log

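  • Create hive-site.xml and add the initial settings

vi hive-site.xml

<?xml version="1.0"?>
<configuration>

    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
      <description>Driver class name for a JDBC metastore</description>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://hadoop-senior.beifeng.com:3306/metadata?createDatabaseIfNotExist=true</value>
      <description>JDBC connect string for a JDBC metastore</description>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>root</value>
      <description>username to use against metastore database</description>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>123456</value>
      <description>password to use against metastore database</description>
    </property>

    <property>
      <name>hive.cli.print.header</name>
      <value>true</value>
      <description>Whether to print the names of the columns in query output.</description>
    </property>

    <property>
      <name>hive.cli.print.current.db</name>
      <value>true</value>
      <description>Whether to include the current database in the Hive prompt.</description>
    </property>

    <property>
      <name>hive.fetch.task.conversion</name>
      <value>more</value>
      <description>
        Some select queries can be converted to single FETCH task minimizing latency.
        Currently the query should be single sourced not having any subquery and should not have
        any aggregations or distincts (which incurs RS), lateral views and joins.
        1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
        2. more    : SELECT, FILTER, LIMIT only (TABLESAMPLE, virtual columns)
      </description>
    </property>

</configuration>
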
  • Create the MySQL database that Hive connects to (the name must match the one in the ConnectionURL, metadata)
create database metadata;
  • Test Hive and resolve the exception

    [Screenshot: exception shown when starting Hive]

Fix: copy the MySQL connector JAR into Hive's lib directory
cp /opt/modules/hive-0.13.1/lib/mysql-connector-java-5.1.27-bin.jar /opt/cdh5.3.6/hive-0.13.1-cdh5.3.6/lib/

  • Create the Hive warehouse directory on HDFS (where table data is stored)

bin/hdfs dfs -mkdir -p /user/hive/warehouse

  • Give all users in the group write permission on /user/hive/warehouse

bin/hdfs dfs -chmod g+w /user/hive/warehouse

Test Hive, HDFS, and MapReduce

  • Create a table (Hive)
create table student(id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
  • Load data (HDFS)
load data local inpath '/opt/datas/student.txt' into table student;
  • Count the rows (MapReduce)
select count(1) from student;
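
The load step expects a tab-delimited file at /opt/datas/student.txt, which the post does not show. The helper below is a sketch that generates such a file; the three rows are made-up sample data and the function name is illustrative.

```shell
# Sketch: generate a small tab-delimited file matching student(id int, name string).
# The rows are made-up sample data; the original post does not show the file.
make_student_file() {
  local out="$1"
  mkdir -p "$(dirname "$out")"
  # printf reuses the format for each id/name pair, producing one line per pair.
  printf '%s\t%s\n' \
    1001 zhangsan \
    1002 lisi \
    1003 wangwu > "$out"
}
```

Calling `make_student_file /opt/datas/student.txt` before the load statement gives the table three rows to count.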

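View the HDFS web UI

http://hadoop-senior.beifeng.com:50070 (the dfs.namenode.http-address configured above)

View the YARN web UI

http://hadoop-senior.beifeng.com:8088 (the default ResourceManager web port)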

Configure Sqoop

  • Edit the configuration file
    [Screenshot: the edited Sqoop configuration file]
  • Copy the MySQL connector JAR
cp /opt/sofewares/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar  /opt/cdh5.3.6/sqoop-1.4.5-cdh5.3.6/lib/
  • Refer to Cloudera's official Sqoop documentation
    URL: http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.5-cdh5.3.6/SqoopUserGuide.html#_example_invocations_10
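
The configuration screenshot is not preserved, so the fragment below is only a plausible sketch of conf/sqoop-env.sh (normally copied from sqoop-env-template.sh). It assumes the Hadoop and Hive install paths used in the hive-env.sh step of this post; the variable names come from the Sqoop template.

```shell
# conf/sqoop-env.sh (copied from sqoop-env-template.sh)
# Paths reuse the install locations shown earlier in this post; adjust as needed.
export HADOOP_COMMON_HOME=/opt/cdh5.3.6/hadoop-2.5.0-cdh5.3.6
export HADOOP_MAPRED_HOME=/opt/cdh5.3.6/hadoop-2.5.0-cdh5.3.6
export HIVE_HOME=/opt/cdh5.3.6/hive-0.13.1-cdh5.3.6
```

HBASE_HOME and ZOOCFGDIR can stay commented out in the template when HBase and ZooKeeper are not part of the setup.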

  • List the databases on the MySQL server

bin/sqoop list-databases \
--connect jdbc:mysql://hadoop-senior.beifeng.com:3306 \
--username root \
--password 123456
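
Passing the password on the command line leaves it in the shell history. As a hedged alternative, Sqoop's `-P` option prompts for the password on the console instead. The wrapper below is a sketch; the function and variable names are illustrative, and the defaults mirror the values used in this post.

```shell
# Defaults mirror the values used in this post; override via the environment.
SQOOP_BIN="${SQOOP_BIN:-bin/sqoop}"
JDBC_URL="${JDBC_URL:-jdbc:mysql://hadoop-senior.beifeng.com:3306}"
DB_USER="${DB_USER:-root}"

# Run list-databases with -P so Sqoop prompts for the password interactively.
list_databases() {
  "$SQOOP_BIN" list-databases \
    --connect "$JDBC_URL" \
    --username "$DB_USER" \
    -P
}
```

Run `list_databases` from the Sqoop install directory. For non-interactive jobs, Sqoop also accepts `--password-file` pointing at a file with restricted permissions.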
