Hadoop: How do I delete duplicate data from a partitioned Hive table?

Posted by gopyfrb3 on 2023-08-03 in Hadoop

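I need to remove duplicate data for the dates 2023-03-26 through 2023-07-10. I tried the following command to remove the duplicates from the table, but I got an error.
Command: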

  set hive.exec.dynamic.partition.mode=nonstrict;
  INSERT OVERWRITE TABLE db.table_name PARTITION(dt)
  select distinct * from db.table_name
  where dt >= '2023-03-26' AND dt >= '2023-07-10';

Error output:

  23/07/26 16:07:46 [LocalJobRunner Map Task Executor #0]: WARN io.CombineHiveRecordReader: Multiple partitions found; not going to pass a part spec to LLAP IO: {{dt=2023-07-10}} and {{dt=2023-07-11}}
  2023-07-26 16:07:47,952 Stage-1 map = 0%, reduce = 0%
  23/07/26 16:07:47 [aabca681-0714-44f6-bc8d-9be6d7fca9fc main]: WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead.


Note: this table is partitioned by date only. For example:

  show partitions db.table_name;
  dt=2023-07-04
  dt=2023-07-05
  dt=2023-07-06
  dt=2023-07-07
  dt=2023-07-08
  dt=2023-07-09
  dt=2023-07-10

  $ hive --version
  Hive 2.3.3

I would appreciate any suggestions on this. Thank you.

Answer #1, by ijnw1ujt

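Do you mean this? Your WHERE clause uses >= for both bounds, so it only selects partitions from 2023-07-10 onward rather than the range you described. BETWEEN bounds the range on both sides: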

  INSERT OVERWRITE TABLE db.table_name PARTITION(dt)
  SELECT DISTINCT *
  FROM db.table_name
  WHERE dt BETWEEN '2023-03-26' AND '2023-07-10';
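
If it helps, one way to check which partitions actually contain exact duplicate rows (before the overwrite, or afterwards to confirm it worked) is to compare per-partition row counts with and without DISTINCT. This is only a sketch, assuming the same table and date range as above; the aliases a, d, t and the column names total_rows and distinct_rows are made up for illustration.

```sql
-- Lists partitions whose total row count exceeds their distinct row count,
-- i.e. partitions that still hold exact duplicate rows.
SELECT a.dt, a.total_rows, d.distinct_rows
FROM (
  SELECT dt, COUNT(*) AS total_rows
  FROM db.table_name
  WHERE dt BETWEEN '2023-03-26' AND '2023-07-10'
  GROUP BY dt
) a
JOIN (
  SELECT dt, COUNT(*) AS distinct_rows
  FROM (
    SELECT DISTINCT *
    FROM db.table_name
    WHERE dt BETWEEN '2023-03-26' AND '2023-07-10'
  ) t
  GROUP BY dt
) d ON a.dt = d.dt
WHERE a.total_rows > d.distinct_rows;
```

After a successful INSERT OVERWRITE with SELECT DISTINCT, this query should return no rows for that date range.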
