Spark Phoenix connection: SQL query fails when there is a filter on a date column

j0pj023g · posted 2021-06-08 in Hbase

Following the guidance on the official Phoenix website, I have connected to Phoenix from Spark. Simple SELECT queries run without any problem, but when I try to run a query with a filter on a date column, I get errors.
Here is the sample code:

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Map<String, String> map = new HashMap<>();
map.put("zkUrl", ZOOKEEPER_URL);
map.put("table", "TABLE_1");
Dataset<Row> df = sparkSession.sqlContext().load("org.apache.phoenix.spark", map);
df.registerTempTable("TABLE_1");

// This query works without any error
Dataset<Row> selectResult = df.sparkSession().sql(
    "SELECT COUNT(1) AS ROW_COUNT FROM TABLE_1 WHERE TEXT_COLUMN_1 = 'ABC'");
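
(Side note: sqlContext().load(...) and registerTempTable(...) are the deprecated Spark 1.x entry points; on Spark 2.x the same load is usually written through the DataFrameReader API. A minimal equivalent sketch, assuming the same sparkSession and ZOOKEEPER_URL as above:)

// Equivalent Phoenix load via the Spark 2.x DataFrameReader API.
Dataset<Row> df = sparkSession.read()
        .format("org.apache.phoenix.spark")
        .option("table", "TABLE_1")
        .option("zkUrl", ZOOKEEPER_URL)
        .load();
df.createOrReplaceTempView("TABLE_1");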

But when I run a query with a filter on the date column, it gives me an error:

Dataset<Row> selectResult = df.sparkSession().sql(
    "SELECT * FROM TABLE_1 WHERE DATE_COLUMN_1 BETWEEN to_date('2015-01-02') AND to_date('2016-12-30')");

I tried several different ways of supplying the date, as shown below:

Dataset<Row> selectResult = df.sparkSession().sql(
    "SELECT * FROM TABLE_1 WHERE DATE_COLUMN_1 BETWEEN cast('2015-01-02' as date) AND cast('2015-01-02' as date)");

Dataset<Row> selectResult = df.sparkSession().sql(
    "SELECT * FROM TABLE_1 WHERE DATE_COLUMN_1 <= cast('2015-01-02' as date) AND DATE_COLUMN_1 >= cast('2015-01-02' as date)");

Error message:

18/01/23 17:05:26 INFO PhoenixInputFormat: UseSelectColumns=true, selectColumnList.size()=1, selectColumnList=DATE_COLUMN_1 
18/01/23 17:05:26 INFO PhoenixInputFormat: Select Statement: SELECT "DATE_COLUMN_1" FROM TABLE_1 WHERE ( DATE_COLUMN_1 IS NOT NULL AND DATE_COLUMN_1 >= 2015-01-02 AND DATE_COLUMN_1 <= 2016-12-30)
18/01/23 17:05:26 ERROR PhoenixInputFormat: Failed to get the query plan with error [ERROR 203 (22005): Type mismatch. DATE and BIGINT for DATE_COLUMN_1 >= 2012]
Exception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange SinglePartition
+- *HashAggregate(keys=[], functions=[partial_count(1)], output=[count#1671L])
   +- *Project
      +- *Filter ((isnotnull(DATE_COLUMN_1#997) && (DATE_COLUMN_1#997 >= 16437)) && (DATE_COLUMN_1#997 <= 17165))
         +- *Scan PhoenixRelation(TABLE_1,localhost:2181,false) [DATE_COLUMN_1#997] PushedFilters: [IsNotNull(DATE_COLUMN_1), GreaterThanOrEqual(DATE_COLUMN_1,2015-01-02), LessThanOrEqual(TR..., ReadSchema: struct<>
Answer #1 (k5hmc34c):

After I removed the to_date() function, it started working. The log shows why: the pushed-down Phoenix query contains the date literals without quotes (DATE_COLUMN_1 >= 2015-01-02), so Phoenix appears to parse 2015-01-02 as integer arithmetic (2015 - 1 - 2 = 2012), yielding a BIGINT, hence the "Type mismatch. DATE and BIGINT ... >= 2012" error. Plain string literals, by contrast, are pushed down in a form Phoenix can handle:

Dataset<Row> selectResult = df.sparkSession().sql(
    "SELECT * FROM TABLE_1 WHERE DATE_COLUMN_1 BETWEEN '2015-01-02' AND '2016-12-30'");
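
An alternative worth trying is to express the range filter through the DataFrame API instead of a SQL string, so no date literal has to survive SQL parsing. A minimal sketch, assuming the df loaded above; Column.between() is standard Spark, but whether your connector version pushes this predicate down cleanly is something to verify:

import static org.apache.spark.sql.functions.col;

// Same range filter via the DataFrame API; string bounds, as in the
// working SQL above, rather than to_date() literals.
Dataset<Row> filtered = df.filter(
        col("DATE_COLUMN_1").between("2015-01-02", "2016-12-30"));
filtered.show();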
