Spark SQL: querying a Cassandra table with a subquery on a Hive table

Posted by nr7wwzry on 2021-06-28 in Hive

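I have data in Cassandra where the key (id) is stored in binary format, and I am trying to query it against a Hive table that has the same id field as a long type. When I run this query in Spark SQL 2.0.1: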

SELECT a.key
FROM cassandra_table a
WHERE a.key in(cast('829404561643414' as binary));

it returns results within a few seconds. Likewise, when I run this query in Spark SQL:

SELECT cast(cast(id AS string) as binary)
     FROM hive_table c LATERAL VIEW EXPLODE (c.field) pixTable AS pix LATERAL VIEW EXPLODE (pix.fieldList) olTable AS olc
     WHERE (c.hit_date_utc = 20161001) LIMIT 1;

it also returns results within a few seconds.
But when I combine the two into a subquery like this:

SELECT a.key
    FROM cassandra_table a
    WHERE a.key in(SELECT cast(cast(id AS string) as binary)
         FROM hive_table c LATERAL VIEW EXPLODE (c.field) pixTable AS pix LATERAL VIEW EXPLODE (pix.fieldList) olTable AS olc
         WHERE (c.hit_date_utc = 20161001) LIMIT 1);

Spark creates about 40,000 tasks and the job just keeps running without finishing.
Any idea why this happens?
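One possible workaround, sketched under an assumption: if the slowdown comes from Spark planning `IN (subquery)` differently from a literal `IN` list (for example, no longer pushing the key filter down to Cassandra), you can run the Hive query on its own, collect the ids on the driver, and inline them as literals so the Cassandra query has the same shape as the first query that returned quickly. The function `build_key_in_query` and its parameters are hypothetical, and this is not a confirmed fix; comparing the `EXPLAIN` output of the two forms would show whether the plans actually differ.

```python
def build_key_in_query(ids, table="cassandra_table", column="key"):
    """Return a Spark SQL query that filters `column` with a literal IN list.

    `ids` is assumed to be the list of long ids already collected from the
    Hive query (hypothetical input). Each id becomes cast('<id>' as binary),
    the same literal form as the fast single-key query in the question.
    `table` and `column` are assumed to be trusted identifiers.
    """
    # int() rejects non-integer values, so only integer text is spliced into
    # the SQL; dict.fromkeys removes duplicates while keeping first-seen order.
    unique_ids = list(dict.fromkeys(int(raw) for raw in ids))
    if not unique_ids:
        raise ValueError("no ids to query")
    literals = ", ".join("cast('{}' as binary)".format(i) for i in unique_ids)
    return "SELECT a.{col}\nFROM {tbl} a\nWHERE a.{col} IN ({vals})".format(
        col=column, tbl=table, vals=literals)


# Usage: in PySpark, `ids` could come from a Hive query that selects c.id, e.g.
# [row[0] for row in spark.sql(hive_query).collect()]
print(build_key_in_query([829404561643414]))
```

This keeps the number of collected ids on the driver, so it only makes sense when the Hive side returns a small set (the question's subquery uses `LIMIT 1`).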
