从4.6版本起,Phoenix支持一种时间戳映射到列的方法(即HBase中 column 的RowTimestamp),以便对Phoenix在存储时序数据时查询的优化
Row Timestamp介绍
在建表时对特定列指定
ROW_TIMESTAMP
关键字:
- 只有当parimary key为以下类型时才可以使用
ROW_TIMESTAMP
:TIME, DATE, TIMESTAMP, BIGINT, UNSIGNED_LONG
- 若primary key为组合主键的时候,只有一个column能指定为
ROW_TIMESTAMP
- 此列的值不能为null
- 此列的值不能为负
当使用UPSERT VALUES or UPSERT SELECT语句更新数据时,可以指定ROW_TIMESTAMP
的值,如果不指定则默认以服务器时间为此列的时间戳,同时这个时间戳也对应hbase 中row的timestamp
当查询过滤ROW_TIMESTAMP
时,不仅可以做常规的查询优化,同时可以依靠timestamp对数据进行最大最小的范围优化,hbase服务器端可以直接跳过不在时间区间内的hfile文件从而使扫描效率提高,尤其是查询数据尾端的时候
SoftWare | Version |
---|---|
Hbase | 1.2.0-cdh5.11.1 |
Phoenix | 4.13.0-cdh5.11.1 |
Java | 1.8.0_121 |
部署环境:4 Region Servers
测试用例
1 | CREATE TABLE phoenix_dev.test_ywzx_wuhan_switch ( |
1 | UPSERT INTO PHOENIX_DEV.test_ywzx_wuhan_switch ( |
- primary key 组成:CONSTRAINT pk PRIMARY KEY (event_time ASC, “_KEY” ASC)
建立一张表结构完全一样的表test_ywzx_wuhan_switch_rowtimestap,给event_time字段添加ROW_TIMESTAMP
即给时间主键设置ROW_TIMESTAMP
- 数据量及大小
2000w条数据
压缩后size:
3.5 G 10.4 G /hbase/data/PHOENIX_DEV/TEST_YWZX_WUHAN_SWITCH_ROWTIMESTAMP
event_time时间范围: 2018-10-21 14:03:34.503 —— 2018-12-06 21:10:14.303
自动split一次,由于salt = 4,所以写入始终向4 Region Servers,查询旧数据则扫描前4个RS,查询新数据扫描后4个RS,全表查询则使用8个RS
select count(*) from PHOENIX_DEV.TEST_YWZX_WUHAN_SWITCH where event_time >= to_timestamp('2018-10-21 00:00:00') AND event_time <= to_timestamp('2018-12-05 00:00:00')
size = 1900w+
测试语句一
无ROW_TIMESTAMP
1 | select count(*) from PHOENIX_DEV.TEST_YWZX_WUHAN_SWITCH; |
1 | // explain |
第N次执行 | 耗时 |
---|---|
1 | 6.159 seconds |
2 | 5.537 seconds |
3 | 4.686 seconds |
4 | 4.429 seconds |
5 | 4.943 seconds |
有ROW_TIMESTAMP
1 | select count(*) from PHOENIX_DEV.TEST_YWZX_WUHAN_SWITCH_ROWTIMESTAMP; |
1 | // explain |
第N次执行 | 耗时 |
---|---|
1 | 21.726 seconds |
2 | 20.906 seconds |
3 | 19.494 seconds |
4 | 19.514 seconds |
5 | 19.478 seconds |
测试语句二
无ROW_TIMESTAMP
1 | select count(SWITCH_HOST),SWITCH_HOST,count(SWITCH_POWER_STATUS),SWITCH_POWER_STATUS from PHOENIX_DEV.TEST_YWZX_WUHAN_SWITCH where event_time >= to_timestamp('2018-10-21 00:00:00') AND event_time <= to_timestamp('2018-12-05 00:00:00') group by SWITCH_HOST,SWITCH_POWER_STATUS limit 5000; |
1 | // explain |
第N次执行 | 耗时 |
---|---|
1 | 5.714 seconds |
2 | 5.355 seconds |
3 | 4.98 seconds |
4 | 4.922 seconds |
5 | 5.02 seconds |
有ROW_TIMESTAMP
1 | select count(SWITCH_HOST),SWITCH_HOST,count(SWITCH_POWER_STATUS),SWITCH_POWER_STATUS from PHOENIX_DEV.TEST_YWZX_WUHAN_SWITCH_ROWTIMESTAMP where event_time >= to_timestamp('2018-10-21 00:00:00') AND event_time <= to_timestamp('2018-12-05 00:00:00') group by SWITCH_HOST,SWITCH_POWER_STATUS limit 5000; |
1 | // explain |
第N次执行 | 耗时 |
---|---|
1 | 19.391 seconds |
2 | 18.249 seconds |
3 | 18.195 seconds |
4 | 18.567 seconds |
5 | 18.247 seconds |
测试语句三
无ROW_TIMESTAMP
1 | SELECT * FROM (SELECT CAST(floor(event_time, 'MINUTE', 1) AS timestamp) AS log_time, switch_host, MAX(switch_cpu_utilization) AS cpu_utilization FROM PHOENIX_DEV.TEST_YWZX_WUHAN_SWITCH AS ywzx_wuhan_switch WHERE switch_gims_measurement = 'ddd' AND switch_host = 'fff' AND ywzx_wuhan_switch.event_time >= to_timestamp('2018-10-21 00:00:00') AND ywzx_wuhan_switch.event_time <= to_timestamp('2018-12-05 00:00:00') GROUP BY log_time, switch_host ORDER BY switch_host,log_time desc) LIMIT 5000 offset 0; |
1 | // explain |
第N次执行 | 耗时 |
---|---|
1 | 6.917 seconds |
2 | 5.914 seconds |
3 | 5.929 seconds |
4 | 6.091 seconds |
5 | 6.135 seconds |
有ROW_TIMESTAMP
1 | SELECT * FROM (SELECT CAST(floor(event_time, 'MINUTE', 1) AS timestamp) AS log_time, switch_host, MAX(switch_cpu_utilization) AS cpu_utilization FROM PHOENIX_DEV.TEST_YWZX_WUHAN_SWITCH_ROWTIMESTAMP AS ywzx_wuhan_switch WHERE switch_gims_measurement = 'ddd' AND switch_host = 'fff' AND ywzx_wuhan_switch.event_time >= to_timestamp('2018-10-21 00:00:00') AND ywzx_wuhan_switch.event_time <= to_timestamp('2018-12-05 00:00:00') GROUP BY log_time, switch_host ORDER BY switch_host,log_time desc) LIMIT 5000 offset 0; |
1 | // explain |
第N次执行 | 耗时 |
---|---|
1 | 19.875 seconds |
2 | 20.301 seconds |
3 | 18.88 seconds |
4 | 19.706 seconds |
5 | 19.302 seconds |
测试语句四
无ROW_TIMESTAMP
1 | SELECT count(*) FROM (SELECT CAST(floor(event_time, 'MINUTE', 1) AS timestamp) AS log_time, switch_host, MAX(switch_cpu_utilization) AS cpu_utilization FROM PHOENIX_DEV.TEST_YWZX_WUHAN_SWITCH AS TEST_YWZX_WUHAN_SWITCH WHERE switch_gims_measurement = 'ddd' AND switch_host = 'fff' AND TEST_YWZX_WUHAN_SWITCH.event_time >= to_timestamp('2018-10-21 00:00:00') AND TEST_YWZX_WUHAN_SWITCH.event_time <= to_timestamp('2018-12-05 00:00:00') GROUP BY CAST(floor(event_time, 'MINUTE', 1) AS timestamp), switch_host); |
1 | // explain |
第N次执行 | 耗时 |
---|---|
1 | 6.888 seconds |
2 | 5.896 seconds |
3 | 5.263 seconds |
4 | 5.576 seconds |
5 | 5.665 seconds |
有ROW_TIMESTAMP
1 | SELECT count(*) FROM (SELECT CAST(floor(event_time, 'MINUTE', 1) AS timestamp) AS log_time, switch_host, MAX(switch_cpu_utilization) AS cpu_utilization FROM PHOENIX_DEV.TEST_YWZX_WUHAN_SWITCH_ROWTIMESTAMP AS TEST_YWZX_WUHAN_SWITCH WHERE switch_gims_measurement = 'ddd' AND switch_host = 'fff' AND TEST_YWZX_WUHAN_SWITCH.event_time >= to_timestamp('2018-10-21 00:00:00') AND TEST_YWZX_WUHAN_SWITCH.event_time <= to_timestamp('2018-12-05 00:00:00') GROUP BY CAST(floor(event_time, 'MINUTE', 1) AS timestamp), switch_host); |
1 | // explain |
第N次执行 | 耗时 |
---|---|
1 | 19.967 seconds |
2 | 18.854 seconds |
3 | 19.637 seconds |
4 | 18.701 seconds |
5 | 18.986 seconds |
[JIRA 问题追踪](https://issues.apache.org/jira/browse/PHOENIX-5157)
转载请注明出处:https://github.com/imperio-wxm