
Testing Pig on Tez

Published: 2015-06-02   Author: duguyiren3476   Source: repost

pig tez hadoop hdfs

 

Test environment

  • pig-0.14.0
  • hadoop-2.5.2 (1+2)

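After testing Hive on Tez, I was curious whether Pig could run on Tez as well. The official site describes Pig on Tez, so I expected it to work.

Pig installation is skipped here…

Preparing the dataset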

[hadoop@mymaster ~]$ wget http://hortonassets.s3.amazonaws.com/pig/lahman591-csv.zip
[hadoop@mymaster ~]$ unzip lahman591-csv.zip
[hadoop@mymaster ~]$ hadoop fs -mkdir /test
[hadoop@mymaster ~]$ hadoop fs -put lahman591-csv/Batting.csv /test  # upload Batting.csv to HDFS

Writing the Pig test script

[hadoop@mymaster ~]$ mkdir pig
[hadoop@mymaster ~]$ vim pig/test.pig
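-- Load the CSV without a schema; fields are referenced by position.
batting = LOAD '/test/Batting.csv' USING PigStorage(',');
-- Keep rows whose second field (the year) is a positive number.
raw_runs = FILTER batting BY $1>0;
runs = FOREACH raw_runs GENERATE $0 AS playerID, $1 AS year, $8 AS runs;
-- Find the highest run total for each year.
grp_data = GROUP runs BY (year);
max_runs = FOREACH grp_data GENERATE group as grp, MAX(runs.runs) AS max_runs;
-- Join back to recover the player(s) who reached that maximum.
join_max_runs = JOIN max_runs BY ($0, max_runs), runs BY (year, runs);
join_data = FOREACH join_max_runs GENERATE $0 AS year, $2 AS playerID, $1 AS runs;
DUMP join_data;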
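
To make the data flow of the script easier to follow, here is a rough Python sketch of the same steps on a handful of in-memory rows. The sample rows, the helper `make_row`, and the function names are made up for illustration; only the column positions (0 = player ID, 1 = year, 8 = runs) come from the script. The header-like first row is an assumption about the CSV, and the sketch says nothing about how Pig actually executes the job.

```python
def to_number(text):
    """Return float(text), or None when the field is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def max_runs_per_year(rows):
    """Mimic test.pig: emit (year, playerID, runs) for the top scorer(s) of each year."""
    # FILTER batting BY $1 > 0: rows whose year is not a positive number are dropped.
    runs = []
    for row in rows:
        year = to_number(row[1])
        if year is not None and year > 0:
            # FOREACH ... GENERATE $0 AS playerID, $1 AS year, $8 AS runs
            runs.append((row[0], int(year), to_number(row[8])))

    # GROUP runs BY year, then MAX(runs.runs) per group (missing values are skipped).
    best = {}
    for _player, year, r in runs:
        if r is not None and (year not in best or r > best[year]):
            best[year] = r

    # JOIN on (year, runs): keep every player who reached the yearly maximum.
    return sorted(
        (year, player, r)
        for player, year, r in runs
        if r is not None and best.get(year) == r
    )


def make_row(player, year, r):
    # Hypothetical helper: pad a row so that the runs value lands in column 8.
    return [player, year, "1", "TST", "NL", "0", "0", "0", r]


sample = [
    make_row("playerID", "yearID", "R"),  # assumed header-like row, filtered out
    make_row("aaa01", "2010", "90"),
    make_row("bbb01", "2010", "115"),
    make_row("ccc01", "2011", "136"),
    make_row("ddd01", "2011", ""),        # missing runs value
]

result = max_runs_per_year(sample)
print(result)
```

As in the Pig script, the join keeps every player tied for the yearly maximum, so a year can appear more than once in the output.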

Running test.pig in MapReduce mode

[hadoop@mymaster ~]$ /usr/local/pig-0.14.0/bin/pig -x mr pig/test.pig
# excerpt of the output
Input(s):
Successfully read 95195 records (6399268 bytes) from: "/test/Batting.csv"

Output(s):
Successfully stored 151 records (4507 bytes) in: "hdfs://10.128.17.21:9000/tmp/temp1552838877/tmp1249909816"

Counters:
Total records written : 151
Total bytes written : 4507
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
-------
(1988,boggswa01,128.0)
(1989,boggswa01,113.0)
(1990,henderi01,119.0)
(1991,molitpa01,133.0)
(1992,phillto02,114.0)
(1993,dykstle01,143.0)
(1994,thomafr04,106.0)
(1995,biggicr01,123.0)
(1996,burksel01,142.0)
(1997,biggicr01,146.0)
(1998,sosasa01,134.0)
(1999,bagweje01,143.0)
(2000,bagweje01,152.0)
(2001,sosasa01,146.0)
(2002,soriaal01,128.0)
(2003,pujolal01,137.0)
(2004,pujolal01,133.0)
(2005,pujolal01,129.0)
(2006,sizemgr01,134.0)
(2007,rodrial01,143.0)
(2008,ramirha01,125.0)
(2009,pujolal01,124.0)
(2010,pujolal01,115.0)
(2011,grandcu01,136.0)
2015-06-02 13:49:39,574 [main] INFO org.apache.pig.Main - Pig script completed in 1 minute, 10 seconds and 20 milliseconds (70020 ms)

Run summary:
Input: 95195 records
Output: 151 records
Elapsed: 70 s
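
The figures in this summary are read off the Pig console output. When comparing several runs, it can be convenient to pull them out automatically. The following is a minimal sketch: the function name `summarize_pig_log` is hypothetical, and the regular expressions are written only against the three log lines shown in the run above, so other Pig versions may word them differently.

```python
import re


def summarize_pig_log(log_text):
    """Extract records read, records stored and elapsed milliseconds from Pig output."""
    read = re.search(r"Successfully read (\d+) records", log_text)
    stored = re.search(r"Successfully stored (\d+) records", log_text)
    elapsed = re.search(r"Pig script completed in .*\((\d+) ms\)", log_text)
    return {
        "records_read": int(read.group(1)) if read else None,
        "records_stored": int(stored.group(1)) if stored else None,
        "elapsed_ms": int(elapsed.group(1)) if elapsed else None,
    }


# Lines copied from the MapReduce run shown earlier.
mr_log = """
Successfully read 95195 records (6399268 bytes) from: "/test/Batting.csv"
Successfully stored 151 records (4507 bytes) in: "hdfs://10.128.17.21:9000/tmp/temp1552838877/tmp1249909816"
2015-06-02 13:49:39,574 [main] INFO org.apache.pig.Main - Pig script completed in 1 minute, 10 seconds and 20 milliseconds (70020 ms)
"""

summary = summarize_pig_log(mr_log)
print(summary)
```

Fields that cannot be found come back as `None`, so a truncated log does not raise an error.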

Running test.pig in Tez mode

[hadoop@mymaster pig]$ /usr/local/pig-0.14.0/bin/pig -x tez pig/test.pig
HadoopVersion: 2.5.2
PigVersion: 0.14.0
TezVersion: 0.5.2
UserId: hadoop
FileName: 1.pig
StartedAt: 2015-06-02 13:50:03
FinishedAt: 2015-06-02 13:50:34
Features: HASH_JOIN,GROUP_BY,FILTER

Success!

DAG PigLatin:1.pig-0_scope-0:
ApplicationId: job_1432693876849_0008
TotalLaunchedTasks: 3
FileBytesRead: 3494886
FileBytesWritten: 5509316
HdfsBytesRead: 6398886
HdfsBytesWritten: 4507

Input(s):
Successfully read 95195 records (6398886 bytes) from: "/test/lahman591-csv/Batting.csv"

Output(s):
Successfully stored 151 records (4507 bytes) in: "hdfs://10.128.17.21:9000/tmp/temp-1130777030/tmp93164502"

(1994,thomafr04,106.0)
(1995,biggicr01,123.0)
(1996,burksel01,142.0)
(1997,biggicr01,146.0)
(1998,sosasa01,134.0)
(1999,bagweje01,143.0)
(2000,bagweje01,152.0)
(2001,sosasa01,146.0)
(2002,soriaal01,128.0)
(2003,pujolal01,137.0)
(2004,pujolal01,133.0)
(2005,pujolal01,129.0)
(2006,sizemgr01,134.0)
(2007,rodrial01,143.0)
(2008,ramirha01,125.0)
(2009,pujolal01,124.0)
(2010,pujolal01,115.0)
(2011,grandcu01,136.0)
2015-06-02 13:50:34,623 [main] INFO org.apache.pig.Main - Pig script completed in 34 seconds and 350 milliseconds (34350 ms)
2015-06-02 13:50:34,634 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - Shutting down thread pool

Run summary:
Input: 95195 records
Output: 151 records
Elapsed: 34 s

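Test result: Tez was roughly twice as fast as MapReduce (34 s versus 70 s).
The gap in this round is modest. Reaching the performance figures published on the official site would require rigorous testing across different numbers of chained MapReduce jobs and different data sizes. What is clear, though, is that the more MapReduce jobs are chained together, the more pronounced Tez's advantage becomes.

Configuring Hive on Tez is somewhat tedious; by comparison, setting up a Pig on Tez test environment is quite easy.

Reference: http://zh.hortonworks.com/hadoop-tutorial/faster-pig-tez/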
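
The comparison can be worked out from the two elapsed times reported in the logs (70020 ms for MapReduce, 34350 ms for Tez). A small sketch follows; the function name `speedup` is just an illustrative choice, and a single run of each mode is of course not a rigorous benchmark.

```python
def speedup(baseline_ms, candidate_ms):
    """Return (speedup factor, percentage of time saved) of candidate over baseline."""
    factor = baseline_ms / candidate_ms
    saved_pct = (baseline_ms - candidate_ms) / baseline_ms * 100
    return factor, saved_pct


mr_ms = 70020   # MapReduce run: "Pig script completed in ... (70020 ms)"
tez_ms = 34350  # Tez run: "Pig script completed in ... (34350 ms)"

factor, saved_pct = speedup(mr_ms, tez_ms)
report = f"speedup: {factor:.2f}x, time saved: {saved_pct:.1f}%"
print(report)
```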
