How do I use "count" programmatically in PySpark?

x759pob2  posted on 2021-07-09  in Spark

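I am trying to do a simple count programmatically in PySpark, but I keep getting errors. Appending .count() to the end of the statement works if I drop AS (count(city)), but I need the count to appear inside the query rather than outside it.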

result = spark.sql("SELECT city AS (count(city)) AND business_id FROM business WHERE city = 'Reading'")

One of the many errors:

Py4JJavaError: An error occurred while calling o24.sql.
: org.apache.spark.sql.catalyst.parser.ParseException: 
mismatched input '(' expecting ')'(line 1, pos 21)

== SQL ==
SELECT city AS (count(city)) AND business_id FROM business WHERE city = 'Reading'
---------------------^^^

fgw7neuy  #1

Your syntax is incorrect. Perhaps you meant to do this:

result = spark.sql("""
    SELECT 
        count(city) over(partition by city), 
        business_id 
    FROM business 
    WHERE city = 'Reading'
""")

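If you select count alongside a non-aggregated column such as business_id without a GROUP BY, you need to provide a window. In this case you probably want the count per city.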


bcs8qyzn  #2

Just posting the solution to the problem I was trying to solve. The solution above is exactly what I wanted.

result = spark.sql("SELECT count(*) FROM business WHERE city='Reading'")
