java—配置单元中的nullif在某些版本中是否存在一些已知的实现问题?

nbysray5  于 2021-06-25  发布在  Hive
关注(0)|答案(2)|浏览(291)

我使用的是emr5.19hive2.3.3,如果不能从java字符串转换为hadoop文本,或者反之亦然,我会遇到nullif的问题。源代码是来自aws的cloudtrail serde,它似乎写得很扎实。问题似乎来自内置的nullif udf,正如您在错误消息中看到的:
我正在测试regexp提取的结果是否为空字符串,如果是,我希望为null,因此我的列看起来有点像 NULLIF(REGEXP_EXTRACT(key,'([^\/]+)(\/\d+)?(\/.*)', 1), '') AS key_prefix 但我得到的错误如下:

2020-02-11 11:06:34,034 INFO [IPC Server handler 26 on 43627] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1574116917806_1754132_r_000008_3: Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating NULLIF(regexp_extract(_col2, '(^[^\/]*)\/(\d\/)?([^\/][^\/]+)', 1),'')
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:257)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:445)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:393)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating NULLIF(regexp_extract(_col2, '(^[^\/]*)\/(\d\/)?([^\/][^\/]+)', 1),'')
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:93)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:820)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:834)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:837)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:938)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:264)
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:196)
    ... 7 more
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.hadoop.io.Text
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveWritableObject(WritableStringObjectInspector.java:41)
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.comparePrimitiveObjects(PrimitiveObjectInspectorUtils.java:421)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFNullif.evaluate(GenericUDFNullif.java:93)
dwbf0jvd

dwbf0jvd1#

将nullif替换为执行nullif所说的操作的宏,避免了异常,并且工作正常:

CREATE TEMPORARY MACRO NULLIFY(s string) if(s = '', null, s);
SELECT
  NULLIFY(REGEXP_EXTRACT(LSRC.key, '(^[^\\/]*)\\/(\\d\\/)?([^\\/][^\\/]+)', 1)) AS a_schema_name,
  NULLIFY(REGEXP_EXTRACT(LSRC.key, '(^[^\\/]*)\\/(\\d\\/)?([^\\/][^\\/]+)', 3)) AS a_table_name,
  NULLIFY(REGEXP_EXTRACT(LSRC.key, '(^[^/]*)/(\\d/)?([^/][^/]+)', 1)) AS b_schema_name,
  NULLIFY(REGEXP_EXTRACT(LSRC.key, '(^[^/]*)/(\\d/)?([^/][^/]+)', 3)) AS b_table_name,
  NULLIFY(REGEXP_EXTRACT(LSRC.key, "(^[^/]*)/(\d/)?([^/][^/]+)", 1)) AS c_schema_name,
  NULLIFY(REGEXP_EXTRACT(LSRC.key, "(^[^/]*)/(\d/)?([^/][^/]+)", 3)) AS c_table_name,
  NULLIFY(LSRC.key) AS key
FROM (SELECT 'db/1/table/part/file' key) LSRC;

导致:

db      table   db      table   NULL    NULL    db/1/table/part/file

使用不匹配的字符串时:

NULL    NULL    NULL    NULL    NULL    NULL    non-matching

而不是用空字符串产生不方便的结果

db/1/table/part/file
qyuhtwio

qyuhtwio2#

也许不能直接回答你的问题,但希望这会有所帮助。 regexp_extract 返回空字符串 '' 如果regex不匹配,它可以返回 null 仅当源字符串为 null . 所以,在这里使用nullif看起来不正确
使用双反斜杠屏蔽特殊字符,如 \\d 在Hiveregexp中。 / -不是特殊字符,不需要转义/屏蔽。
我建议如下:

CREATE TEMPORARY MACRO normalize_null(s string) CASE WHEN s!='' THEN s END;

它将把空字符串转换成空字符串、空字符串和其他所有原样。

相关问题