Not seeing file-level pushdown predicate filtering when querying a Hive-partitioned table in S3

Posted by r7xajy2e on 2023-10-18 in Hive

I'm using DuckDB through DuckDB-WASM. I'm using SQL to create a view on top of a partitioned table in S3, like this:

create or replace view my_view as
select
    Part1 as part_1
  , Part2 as part_2
  , Column1 as column_1
  , Column2 as column_2
  from read_parquet(
    [
      's3://my-bucket/path/to/part1=abc/part2=123/000.parquet',
      's3://my-bucket/path/to/part1=def/part2=456/000.parquet',
      's3://my-bucket/path/to/part1=ghi/part2=789/000.parquet'
    ],
    hive_partitioning=1)

I then run a query like this:
select count(*) from my_view where part_1 = 'abc' and part_2 = '123'
I expected DuckDB to use predicate pushdown to read only the file s3://my-bucket/path/to/part1=abc/part2=123/000.parquet. Instead, the Network tab in Chrome DevTools shows all three files being read.

Hive

Source: https://stackoverflow.com/questions/76999932/not-seeing-file-level-pushdown-predicate-filtering-querying-hive-partitioned-tab

1 answer

1bqhqjot1#

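I think I figured it out. S3 prefixes are case-sensitive. Changing the view so that the partition columns are referenced as part1 and part2, matching the case used in the S3 paths, seems to have fixed the problem for me: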

create or replace view my_view as
select
    part1 as part_1
  , part2 as part_2
  , Column1 as column_1
  , Column2 as column_2
  from read_parquet(
    [
      's3://my-bucket/path/to/part1=abc/part2=123/000.parquet',
      's3://my-bucket/path/to/part1=def/part2=456/000.parquet',
      's3://my-bucket/path/to/part1=ghi/part2=789/000.parquet'
    ],
    hive_partitioning=1)
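
To check whether the corrected view prunes files without relying on the browser's Network tab, one option is to inspect the query plan. The following is only a sketch, assuming my_view has been created as shown above. The filter uses the view's aliases (part_1, part_2), which map to the Hive partition keys part1 and part2. The exact plan text depends on the DuckDB version, but when pruning applies, the Parquet scan node typically lists the partition predicates as file filters.

```sql
-- Assumes my_view exists and selects the lowercase part1/part2 partition columns.
-- explain prints the query plan instead of running the query.
explain
select count(*)
  from my_view
 where part_1 = 'abc'
   and part_2 = '123';
```

If the partition predicates do not appear on the scan node, all listed files are probably still being read, which should match what the Network tab shows.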

Answered 2023-10-18
