I have a Hive table as a source. I also have a Hive table as a target. The DDLs of the source and target tables are identical, except that the target table has a few audit/log columns added. Here are the DDLs:

Source:
CREATE EXTERNAL TABLE source.customer_detail(
id string,
name string,
city string,
properties_owned array<struct<property_addr:string, location:string>>
)
ROW FORMAT SERDE
'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION
'/user/aiman/customer_detail';
Target:
CREATE EXTERNAL TABLE target.customer_detail(
id string,
name string,
city string,
properties_owned array<struct<property_addr:string, location:string>>,
audit_insterted_ts timestamp,
audit_dml_action char(1)
)
PARTITIONED BY (audit_active_flag char(1))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
STORED AS ORC
LOCATION
'/user/aiman/target/customer_detail';
Source data:
+---------------------+--------------------------+-------------------------+--------------------------------------------------------------------------------------------------------------------------------------+
| customer_detail.id | customer_detail.name | customer_detail.city | customer_detail.properties_owned |
+---------------------+--------------------------+-------------------------+--------------------------------------------------------------------------------------------------------------------------------------+
| 1 | Aiman Sarosh | kolkata | [{"property_addr":"H1 Block Saltlake","location":"kolkata"},{"property_addr":"New Property Added Saltlake","location":"kolkata"}] |
| 2 | Justin | delhi | [{"property_addr":"some address in delhi","location":"delhi"}] |
+---------------------+--------------------------+-------------------------+--------------------------------------------------------------------------------------------------------------------------------------+
Target data:
+---------------------+--------------------------+-------------------------+------------------------------------------------------------------+--------------------------------------+-----------------------------------+------------------------------------+
| customer_detail.id | customer_detail.name | customer_detail.city | customer_detail.properties_owned | customer_detail.audit_insterted_ts | customer_detail.audit_dml_action | customer_detail.audit_active_flag |
+---------------------+--------------------------+-------------------------+------------------------------------------------------------------+--------------------------------------+-----------------------------------+------------------------------------+
| 1 | Aiman Sarosh | kolkata | [{"property_addr":"H1 Block Saltlake","location":"kolkata"}] | 2018-09-04 06:55:12.361 | I | A |
| 2 | Justin | delhi | [{"property_addr":"some address in delhi","location":"delhi"}] | 2018-09-05 08:36:39.023 | I | A |
+---------------------+--------------------------+-------------------------+------------------------------------------------------------------+--------------------------------------+-----------------------------------+------------------------------------+
When I run the query below, it should fetch the 1 record that was modified, i.e.:
+---------------------+--------------------------+-------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-----------------------------------+------------------------------------+
| customer_detail.id | customer_detail.name | customer_detail.city | customer_detail.properties_owned | customer_detail.audit_insterted_ts | customer_detail.audit_dml_action | customer_detail.audit_active_flag |
+---------------------+--------------------------+-------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-----------------------------------+------------------------------------+
| 1 | Aiman Sarosh | kolkata | [{"property_addr":"H1 Block Saltlake","location":"kolkata"},{"property_addr":"New Property Added Saltlake","location":"kolkata"}] | 2018-09-05 07:15:10.321 | U | A |
+---------------------+--------------------------+-------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-----------------------------------+------------------------------------+
Basically, the element {"property_addr":"New Property Added Saltlake","location":"kolkata"} was added to the array column properties_owned in the source.
Query:
SELECT --fetch modified/updated records in source
source.id AS id,
source.name AS name,
source.city AS city,
source.properties_owned AS properties_owned,
current_timestamp() AS audit_insterted_ts,
'U' AS audit_dml_action,
'A' AS audit_active_flag
FROM source.customer_detail source
INNER JOIN target.customer_detail jrnl
ON source.id=jrnl.id
WHERE source.name!=jrnl.name
OR source.city!=jrnl.city
OR source.properties_owned!=jrnl.properties_owned
But it throws an error:
Error: Error while compiling statement: FAILED: SemanticException [Error 10016]: Line 14:3 Argument type mismatch 'properties_owned': The 1st argument of NOT EQUAL is expected to a primitive type, but list is found (state=42000,code=10016)
How can I compare two columns of a complex data type in the WHERE clause when using a join?
I could use .POS and .ITEM, but that won't help, because my column is an array of structs and the arrays can have different lengths.
3 Answers

Answer 1:
One way to handle complex types is to convert them to a string, e.g. a JSON string. The Brickhouse project contains useful third-party Hive UDFs, including a to_json function that can convert any complex type to a JSON string. First, clone and build the jar, then copy the Brickhouse jar to HDFS and add the jar in Hive:
Register the to_json UDF in Hive. Now you can use it, e.g.:
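For example (the jar path is an assumption, and 'brickhouse.udf.json.ToJsonUDF' is the class Brickhouse ships to_json under, to the best of my knowledge):

```sql
ADD JAR hdfs:///user/aiman/jars/brickhouse-0.7.1-SNAPSHOT.jar;
CREATE TEMPORARY FUNCTION to_json AS 'brickhouse.udf.json.ToJsonUDF';

-- to_json serializes a complex value into a JSON string
SELECT to_json(properties_owned) FROM source.customer_detail;
```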
So in your case you need to convert the columns to JSON strings and then compare them in the WHERE clause. Keep in mind that to_json converts complex values as-is; for example, two arrays containing the same elements in a different order will produce different JSON strings and will therefore compare as different.
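Applied to the query from the question, the comparison could look like this (a sketch, assuming to_json has been registered as described above):

```sql
SELECT src.id AS id,
       src.name AS name,
       src.city AS city,
       src.properties_owned AS properties_owned,
       current_timestamp() AS audit_insterted_ts,
       'U' AS audit_dml_action,
       'A' AS audit_active_flag
FROM source.customer_detail src
INNER JOIN target.customer_detail jrnl
ON src.id = jrnl.id
WHERE src.name != jrnl.name
   OR src.city != jrnl.city
   -- serialize the array<struct> columns so they can be compared as strings
   OR to_json(src.properties_owned) != to_json(jrnl.properties_owned);
```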
Answer 2:
I fixed this using LATERAL VIEW explode(). I then used concat_ws() combined with collect_list(array<string>), which finally gave me a string that I could compare. Hope someone finds this useful.
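The answer's original query was not included; a minimal sketch of the approach it describes might look like this (the '|' and ':' separators and the subquery shape are assumptions):

```sql
-- Flatten each struct in the array to one string, collect the strings per id
-- into a single delimited string, and compare that string between the tables.
SELECT s.id
FROM (
  SELECT id,
         concat_ws('|', sort_array(collect_list(
             concat(p.property_addr, ':', p.location)))) AS props_str
  FROM source.customer_detail
  LATERAL VIEW explode(properties_owned) t AS p
  GROUP BY id
) s
INNER JOIN (
  SELECT id,
         concat_ws('|', sort_array(collect_list(
             concat(p.property_addr, ':', p.location)))) AS props_str
  FROM target.customer_detail
  LATERAL VIEW explode(properties_owned) t AS p
  WHERE audit_active_flag = 'A'
  GROUP BY id
) j ON s.id = j.id
WHERE s.props_str != j.props_str;
```

sort_array() is wrapped around collect_list() because collect_list gives no ordering guarantee; sorting first makes the string comparison insensitive to element order.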
Answer 3:
Problem: you are trying to compare lists rather than primitive types.
Current situation: lists of complex objects cannot be compared directly with built-in Hive UDFs (there are some workarounds for lists of strings).
Workaround: you need a third-party UDF to help with this. There are a few interesting UDFs out there (I have not tested them myself).