[一起学Hive]之十五-分析Hive表和分区的统计信息(Statistics)

14.1 新表的统计信息

```CREATE TABLE lxw1234 (
id STRING,
name STRING
) stored AS textfile;```

[一起学Hive]之十四-Hive的元数据表结构详解

SELECT  *  FROM  TABLE_PARAMS  WHERE  tbl_id = 45857

 TBL_ID PARAM_KEY PARAM_VALUE 45857 transient_lastDdlTime 1436916981

INSERT overwrite TABLE lxw1234 SELECT pt,pcid FROM lxw1;

Table default.lxw1234 stats:

[numFiles=1, numRows=11067, totalSize=376278, rawDataSize=365211]

SELECT  *  FROM  TABLE_PARAMS  WHERE  tbl_id = 45857

 TBL_ID PARAM_KEY PARAM_VALUE 45857 transient_lastDdlTime 1436917459 45857 numFiles 1 45857 numRows 11067 45857 rawDataSize 365211 45857 totalSize 376278 45857 COLUMN_STATS_ACCURATE true

14.2 新分区的统计信息

```CREATE TABLE lxw1234 (
id STRING,
name STRING
) PARTITIONED BY (day STRING);```

SELECT * FROM `PARTITIONS` WHERE tbl_id = 45858

INSERT overwrite TABLE lxw1234 PARTITION (day = ‘2015-07-15′)

SELECT pt,pcid

FROM lxw1;

Partition default.lxw1234{day=2015-07-15} stats:

[numFiles=1, numRows=11067, totalSize=376278, rawDataSize=365211]

SELECT * FROM `PARTITIONS` WHERE tbl_id = 45858

 PART_ID CREATE_TIME LAST_ACCESS_TIME PART_NAME SD_ID TBL_ID 56806 1436918167 0 day=2015-07-15 98259 45858

SELECT * FROM PARTITION_PARAMS WHERE PART_ID = 56806

 PART_ID PARAM_KEY PARAM_VALUE 56806 transient_lastDdlTime 1436918167 56806 numFiles 1 56806 numRows 11067 56806 rawDataSize 365211 56806 totalSize 376278 56806 COLUMN_STATS_ACCURATE true

PARTITIONS、PARTITION_PARAMS

14.3 已存在表或分区的统计信息

• 外部表
```    CREATE EXTERNAL TABLE lxw1234 (
id STRING,
name STRING
) stored AS textfile
location 'hdfs://namenode/tmp/lxw1234.com/';
```

SELECT * FROM TABLE_PARAMS WHERE tbl_id = 45859

 TBL_ID PARAM_KEY PARAM_VALUE 45859 transient_lastDdlTime 1436918758 45859 numFiles 0 45859 numRows -1 45859 rawDataSize -1 45859 totalSize 0 45859 COLUMN_STATS_ACCURATE false 45859 EXTERNAL true

ANALYZE TABLE lxw1234 COMPUTE STATISTICS;

Table default.lxw1234 stats:

[numFiles=0, numRows=11067, totalSize=0, rawDataSize=365211]

SELECT * FROM TABLE_PARAMS WHERE tbl_id = 45859

 TBL_ID PARAM_KEY PARAM_VALUE 45859 transient_lastDdlTime 1436918995 45859 numFiles 0 45859 numRows 11067 45859 rawDataSize 365211 45859 totalSize 0 45859 COLUMN_STATS_ACCURATE true 45859 EXTERNAL true

• 分区

```CREATE TABLE lxw1234 (
id STRING,
name STRING
) PARTITIONED BY (day STRING);

ALTER TABLE lxw1234 ADD PARTITION (day = '2015-07-15')
location 'hdfs://namenode/tmp/lxw1234.com/';```

SELECT * FROM PARTITION_PARAMS WHERE PART_ID = 56807

 PART_ID PARAM_KEY PARAM_VALUE 56807 transient_lastDdlTime 1436919355 56807 numFiles 20 56807 numRows -1 56807 rawDataSize -1 56807 totalSize 376278 56807 COLUMN_STATS_ACCURATE false

ANALYZE TABLE lxw1234 PARTITION (day = ‘2015-07-15′)

COMPUTE STATISTICS;

Partition default.lxw1234{day=2015-07-15} stats:

[numFiles=20, numRows=11067, totalSize=376278, rawDataSize=365211]

SELECT * FROM PARTITION_PARAMS WHERE PART_ID = 56807

 PART_ID PARAM_KEY PARAM_VALUE 56807 transient_lastDdlTime 1436919604 56807 numFiles 20 56807 numRows 11067 56807 rawDataSize 365211 56807 totalSize 376278 56807 COLUMN_STATS_ACCURATE true

Hive相关文章（持续更新）：

Hive分析函数系列

Hive索引

hive优化之——控制hive任务中的map数和reduce数

[一起学Hive]之十五-分析Hive表和分区的统计信息(Statistics)

• 0

开心

• 0

板砖

• 0

感动

• 0

有用

• 0

疑问

• 0

难过

• 0

无聊

• 0

震惊