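We are using Hive 3.1.x clusters on HDI 4.0: one is an LLAP cluster and the other is a plain Hive 3.1.x cluster.
We created a managed table on both clusters, with a row count of 272409.

Before the merge, on both clusters: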
+---------------------+------------+---------------------+------------------------+------------------------+
| order_created_date | col_count | col_distinct_count | min_lmd | max_lmd |
+---------------------+------------+---------------------+------------------------+------------------------+
| 20200615 | 272409 | 272409 | 2020-06-15 00:00:12.0 | 2020-07-26 23:42:17.0 |
+---------------------+------------+---------------------+------------------------+------------------------+
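A summary like the one above could come from a query along these lines. This is only a sketch: the table name `orders_managed` and the column names `order_id` and `lmd` are hypothetical, since the post does not show the actual schema or query.

```sql
-- Hypothetical names: orders_managed, order_id, lmd (not taken from the post).
-- Compares the total count with the distinct count of the key column per date.
SELECT order_created_date,
       COUNT(order_id)          AS col_count,
       COUNT(DISTINCT order_id) AS col_distinct_count,
       MIN(lmd)                 AS min_lmd,
       MAX(lmd)                 AS max_lmd
FROM orders_managed
GROUP BY order_created_date;
```

If `col_count` and `col_distinct_count` differ for a column that is expected to be unique, some rows are being read more than once.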
Based on the delta, we then perform a merge operation (which updates 17 rows).

After the merge on the Hive LLAP cluster (before compaction):
+---------------------+------------+---------------------+------------------------+------------------------+
| order_created_date | col_count | col_distinct_count | min_lmd | max_lmd |
+---------------------+------------+---------------------+------------------------+------------------------+
| 20200615 | 272409 | 272392 | 2020-06-15 00:00:12.0 | 2020-07-27 22:52:34.0 |
+---------------------+------------+---------------------+------------------------+------------------------+
After the merge on the Hive LLAP cluster (after compaction):
+---------------------+------------+---------------------+------------------------+------------------------+
| order_created_date | col_count | col_distinct_count | min_lmd | max_lmd |
+---------------------+------------+---------------------+------------------------+------------------------+
| 20200615 | 272409 | 272409 | 2020-06-15 00:00:12.0 | 2020-07-27 22:52:34.0 |
+---------------------+------------+---------------------+------------------------+------------------------+
After the merge on the plain Hive cluster (without compacting the deltas):
+---------------------+------------+---------------------+------------------------+------------------------+
| order_created_date | col_count | col_distinct_count | min_lmd | max_lmd |
+---------------------+------------+---------------------+------------------------+------------------------+
| 20200615 | 272409 | 272409 | 2020-06-15 00:00:12.0 | 2020-07-27 22:52:34.0 |
+---------------------+------------+---------------------+------------------------+------------------------+
This is the inconsistency we observed: before compaction, the LLAP cluster reports a distinct count of 272392 instead of 272409.
However, after compacting the table on the Hive LLAP cluster, the inconsistency no longer appears and both clusters return the same result.
We thought it might be due to either caching or an LLAP issue, so we restarted the hive-server2 process, which will clear the cache. The issue still persists.
We also created a dummy table with the same schema on the plain Hive cluster and pointed its location to that of the LLAP table; that table returns the expected result.
We even queried the table from Spark using the **Qubole spark-acid reader** (a direct Hive managed table reader), which also returns the expected result.
This is very strange. Can anyone help?
2 Answers

#1
We ran into a similar issue on an HDInsight Hive LLAP cluster. Setting hive.llap.io.enabled to false resolved it for us.
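As a rough sketch, the property can be overridden for a single session as shown below. This assumes the cluster permits session-level changes to this setting; if it does not, the value would have to be changed in the cluster's Hive configuration instead.

```sql
-- Turn off the LLAP I/O layer for the current session only
-- (assumes session-level overrides of this property are allowed).
SET hive.llap.io.enabled=false;

-- Print the current value to confirm the override took effect.
SET hive.llap.io.enabled;
```

Re-running the count query in the same session afterwards shows whether the result now matches the plain Hive cluster.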
#2
Qubole does not support Hive LLAP yet (however, we at Qubole are evaluating whether to support it in the future).