
[Spark 17] Spark SQL, Part 3: Working with Hive

Published: 2015-01-10   Author: bit1129   Source: repost

Spark SQL with Hive

The Spark distribution ships with Hive support. Does that mean a separate Hive installation is not needed in order to use Hive? The Spark documentation says:

 

Spark SQL supports reading and writing data stored in Apache Hive. However, since Hive has a large number of dependencies, it is not included in the default Spark assembly. In order to use Hive you must first run "sbt/sbt -Phive assembly/assembly" (or use -Phive for maven). This command builds a new assembly jar that includes Hive. Note that this Hive assembly jar must also be present on all of the worker nodes, as they will need access to the Hive serialization and deserialization libraries (SerDes) in order to access data stored in Hive. Configuration of Hive is done by placing your hive-site.xml file in conf/.

 

When working with Hive one must construct a HiveContext, which inherits from SQLContext, and adds support for finding tables in the MetaStore and writing queries using HiveQL. Users who do not have an existing Hive deployment can still create a HiveContext. When not configured by the hive-site.xml, the context automatically creates metastore_db and warehouse in the current directory.

scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> val dbs = hiveContext.sql("show databases")

///Before any operations are done, only the "default" database exists
scala> dbs.collect

///List all tables
scala> hiveContext.sql("show tables").collect

 

You can also run HiveQL statements through hiveContext's hql method:

 

scala> import hiveContext._

///Create a table
scala> hql("CREATE TABLE IF NOT EXISTS person(name STRING, age INT)")

scala> hql("select * from person")

scala> hql("show tables")

///Load data. When loading, Hive's default row delimiter is '\n' and its default field delimiter is '\001' (Ctrl-A)
///Field-delimiter syntax: row format delimited fields terminated by '\t'

scala> hql("LOAD DATA LOCAL INPATH '/home/hadoop/software/spark-1.2.0-bin-hadoop2.4/data/person.txt' INTO TABLE person")
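The spark-shell session above can also be written as a standalone application. The following is a minimal sketch for the Spark 1.2-era API; the object name, file path, and delimiter clause are illustrative assumptions, and running it requires a Spark assembly built with -Phive and submission via spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HivePersonExample {
  def main(args: Array[String]): Unit = {
    // setMaster is left to spark-submit; the app name is illustrative
    val conf = new SparkConf().setAppName("HivePersonExample")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)

    // Without a hive-site.xml in conf/, this creates metastore_db
    // and a warehouse directory in the current working directory
    hiveContext.sql(
      "CREATE TABLE IF NOT EXISTS person(name STRING, age INT) " +
      "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'")

    // LOCAL means the path is on the driver's local filesystem;
    // the path itself is a placeholder
    hiveContext.sql("LOAD DATA LOCAL INPATH '/tmp/person.txt' INTO TABLE person")

    hiveContext.sql("SELECT * FROM person").collect().foreach(println)

    sc.stop()
  }
}
```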


Open questions:

1. In the operations above, which database is Hive actually using? (Per the documentation quoted earlier: with no hive-site.xml, a local metastore_db and warehouse directory are created in the current directory.)

2. If Hive is already installed separately, can Spark be pointed at that existing Hive installation?
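For the second question: per the documentation quoted earlier, Spark picks up Hive configuration from a hive-site.xml placed in conf/, so copying the existing installation's hive-site.xml there should point Spark at that Hive. Below is a minimal sketch of such a file, assuming a hypothetical MySQL-backed metastore; the host, database name, and credentials are placeholders for illustration.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Hypothetical MySQL-backed metastore; host, database,
       and credentials are placeholders -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://metastore-host:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive_password</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
```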


To be continued.



