Hive SQL: adding SORT BY or DISTRIBUTE BY makes the result files larger than before

Posted by bd1hkmkf on 2021-06-26 in Hive


All of my Hive tables use LZO compression. I have two Hive SQL statements like these:
[1]

set hive.exec.compress.output=true;
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
insert overwrite table a partition(dt='20160420')
select col1, col2 ... from b where dt='20160420';

Because query [1] has no reduce stage, it creates many small files.
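To check how many files a run wrote and how the rows are spread across them, something like the following could be used. This is only a sketch: it relies on Hive's INPUT__FILE__NAME virtual column and reuses table a and the partition from the query above; the alias row_cnt is made up for illustration.

```sql
-- Sketch: list each file in the partition with the number of rows it holds.
-- INPUT__FILE__NAME is a Hive virtual column; row_cnt is an illustrative alias.
select INPUT__FILE__NAME, count(*) as row_cnt
from a
where dt='20160420'
group by INPUT__FILE__NAME;
```

Running it once after each insert shows whether the number of files, or the rows per file, differs between the two runs.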
[2]

set hive.exec.compress.output=true;
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
insert overwrite table a partition(dt='20160420')
select col1, col2 ... from b where dt='20160420'
  sort by col1;

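The only difference is the last line: query [2] has "sort by".
The row count and the content are the same, but the files produced by [2] are larger than those from [1]; our HDFS file size has grown by almost 1x compared with before.
Can you help me find the reason?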
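One way to put numbers on the difference is to read the partition's file count and total size after each insert. A sketch, assuming partition statistics are available (for example with hive.stats.autogather enabled, or after an explicit analyze); if they are not, the numFiles and totalSize parameters may be missing or stale:

```sql
-- Sketch: refresh basic statistics for the partition, then inspect them.
analyze table a partition(dt='20160420') compute statistics;

-- The "Partition Parameters" section of the output lists values such as
-- numFiles, numRows and totalSize when statistics have been gathered.
describe formatted a partition(dt='20160420');
```

Comparing numRows and totalSize for [1] and [2] would confirm that the same rows occupy more bytes, which is the symptom described here; it does not by itself explain the cause.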

Hive mapreduce hadoop-lzo

Source: https://stackoverflow.com/questions/36744384/hive-sql-add-sort-or-distribute-then-the-result-file-size-bigger-than-before

No answers yet.
