LZO files issue on S3

Posted by dly7yett on 2021-06-02 in Hadoop

I have 3 LZO-compressed files and their corresponding index files in HDFS.

Permission  Owner  Group       Size       Replication  Block Size  Name
-rw-r--r--  alum   supergroup  0 B        3            128 MB      _SUCCESS
-rw-r--r--  alum   supergroup  192.29 MB  3            128 MB      part-00000.lzo
-rw-r--r--  alum   supergroup  89.56 KB   3            128 MB      part-00000.lzo.index
-rw-r--r--  alum   supergroup  243.09 MB  3            128 MB      part-00001.lzo
-rw-r--r--  alum   supergroup  106.67 KB  3            128 MB      part-00001.lzo.index
-rw-r--r--  alum   supergroup  163.99 MB  3            128 MB      part-00002.lzo
-rw-r--r--  alum   supergroup  70.54 KB   3            128 MB      part-00002.lzo.index

We copied these files to Amazon S3 and created a Hive external table over them for analysis.
These are the problems we are facing:

1) The LZO index files are also being treated as data files, so meaningless data appears in the Hive table.
2) A "count(*)" query on the table runs with only 4 mappers, which suggests a problem with splitting.

Can you tell me what is going on? It works fine on our YARN cluster.

hadoop Hive amazon-s3 amazon-web-services hadoop-lzo

Source: https://stackoverflow.com/questions/34086649/lzo-files-issue-on-s3