Using Spark SQL analytic functions

vs3odd8k · posted 2021-06-02 in Hadoop · 1 answer

I have the following SQL:

SELECT LIMIT, 
       COL1, 
       COL2, 
       COL3
 FROM   
(SELECT ROW_NUMBER () OVER (ORDER BY COL5 DESC) AS LIMIT, 
        FROM_UNIXTIME(COL_DATETIME,'dd-MM-yyyy HH24:mi:ss') COL1,
        CASE WHEN COL6 IN ('A', 'B') THEN A_NUMBER ELSE B_NUMBER END AS COL2, 
        COL3
 FROM   DBNAME.TABLENAME 
WHERE   COL7 LIKE ('123456%')  
  AND   COL_DATETIME BETWEEN 20150201000000 AND 20150202235959) X

I can execute it successfully in Hive, but I want to run it from Spark. I created a Spark SQL HiveContext as follows:

scala> val sqlHContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlHContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@71138de5

Then I tried to execute the SQL query above like this:

sqlHContext.sql("SELECT LIMIT, COL1, COL2, COL3 FROM (SELECT ROW_NUMBER () OVER (ORDER BY COL5 DESC) AS LIMIT, FROM_UNIXTIME(COL_DATETIME,'dd-MM-yyyy HH24:mi:ss') COL1, CASE WHEN COL6 IN ('A', 'B') THEN A_NUMBER ELSE B_NUMBER END AS COL2, COL3 FROM DBNAME.TABLENAME WHERE  COL7 LIKE ('123456%')  AND COL_DATETIME BETWEEN 20150201000000 AND 20150202235959) X").collect().foreach(println)

But I got this error:

org.apache.spark.sql.AnalysisException: 
Unsupported language features in query:

scala.NotImplementedError: No parse rules for ASTNode type: 882, text: TOK_WINDOWSPEC :
TOK_WINDOWSPEC 1, 90,98, 339
  TOK_PARTITIONINGSPEC 1, 91,97, 339
    TOK_ORDERBY 1, 91,97, 339
      TOK_TABSORTCOLNAMEDESC 1, 95,97, 339
        TOK_TABLE_OR_COL 1, 95,95, 339
          CALL_DATETIME 1, 95,95, 339
" +

org.apache.spark.sql.hive.HiveQl$.nodeToExpr(HiveQl.scala:1261)

It seems analytic functions are not supported. I am using Spark 1.3.0, Hive 1.1.0, and Hadoop 2.7.0.
Is there any other way to achieve this with Spark?

8yoxcaq7 · answer #1

Window functions are supported as of Spark 1.4.0. Some limitations remain; for example, `ROWS BETWEEN` is not yet fully supported. See, for example, this blog post on Spark window functions.
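After upgrading, you could either rerun the same `sqlHContext.sql(...)` string, or express the query with the DataFrame window API. Below is a rough sketch of the latter, assuming roughly the Spark 1.6-era API (`row_number`, `from_unixtime`, and `isin` were added between 1.4 and 1.6; on 1.4 the row-number function was still called `rowNumber`). Table and column names are taken from the question:

```scala
// Sketch only: requires a running Spark cluster with Hive support and the
// DBNAME.TABLENAME table from the question. API as of roughly Spark 1.6.
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val df = sqlHContext.table("DBNAME.TABLENAME")
  .filter(col("COL7").like("123456%") &&
          col("COL_DATETIME").between(20150201000000L, 20150202235959L))

// Equivalent of ROW_NUMBER() OVER (ORDER BY COL5 DESC).
// Note: a window with ORDER BY but no PARTITION BY pulls all rows into a
// single partition, which Spark will warn about on large tables.
val w = Window.orderBy(col("COL5").desc)

val result = df.select(
  row_number().over(w).as("ROW_NUM"),   // "LIMIT" is a reserved word, renamed here
  // from_unixtime uses Java SimpleDateFormat patterns, so "HH:mm:ss"
  // rather than the Oracle-style "HH24:mi:ss" from the original query
  from_unixtime(col("COL_DATETIME"), "dd-MM-yyyy HH:mm:ss").as("COL1"),
  when(col("COL6").isin("A", "B"), col("A_NUMBER"))
    .otherwise(col("B_NUMBER")).as("COL2"),
  col("COL3"))

result.show()
```

Two details worth flagging: `LIMIT` is a reserved word in Spark SQL, so the sketch renames that column; and `from_unixtime` interprets its pattern with Java's `SimpleDateFormat`, so Oracle-style tokens like `HH24` and `mi` would be emitted literally rather than formatted.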
