Sort in descending order in PySpark

Posted by af7jpaap on 2022-12-17 in Python


I'm using PySpark (Python 2.7.9/Spark 1.3.1) and have a DataFrame GroupObject that I need to filter and sort in descending order. I'm trying to do it with this piece of code.

group_by_dataframe.count().filter("`count` >= 10").sort('count', ascending=False)

But it throws the following error.

sort() got an unexpected keyword argument 'ascending'

python

Source: https://stackoverflow.com/questions/34514545/sort-in-descending-order-in-pyspark

7 answers


polhcujo1#

In PySpark 1.3 the sort method doesn't take an ascending parameter. You can use the desc method instead:

from pyspark.sql.functions import col

(group_by_dataframe
    .count()
    .filter("`count` >= 10")
    .sort(col("count").desc()))

or the desc function:

from pyspark.sql.functions import desc

(group_by_dataframe
    .count()
    .filter("`count` >= 10")
    .sort(desc("count")))

Both methods work with Spark >= 1.3 (including Spark 2.x).


2g32fytz2#

Use orderBy:

df.orderBy('column_name', ascending=False)

Full answer:

group_by_dataframe.count().filter("`count` >= 10").orderBy('count', ascending=False)

http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html


zyfwsgd63#

By far the most convenient way is this:

df.orderBy(df.column_name.desc())

It doesn't require any special imports.


bwitn5fc4#

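You can also combine groupBy with a descending sort as follows (sort is an alias of orderBy). Note that after the rename, the sort must refer to the new column name: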

from pyspark.sql.functions import desc
dataFrameWay = df.groupBy("firstName").count().withColumnRenamed("count", "distinct_name").sort(desc("distinct_name"))


3qpi33ja5#

In PySpark 2.4.4:

1) group_by_dataframe.count().filter("`count` >= 10").orderBy('count', ascending=False)

2) from pyspark.sql.functions import desc
   group_by_dataframe.count().filter("`count` >= 10").sort(desc('count'))

Option 1) needs no import and is short and easy to read, so I prefer 1) over 2).


dy2hfwbg6#

RDD.sortBy(keyfunc, ascending=True, numPartitions=None)
An example:

words =  rdd2.flatMap(lambda line: line.split(" "))
counter = words.map(lambda word: (word,1)).reduceByKey(lambda a,b: a+b)

print(counter.sortBy(lambda a: a[1],ascending=False).take(10))
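To sanity-check what this pipeline should return without starting Spark, the same count-then-sort logic can be sketched in plain Python. The sample lines are made up. `sorted(..., reverse=True)` stands in for `sortBy(..., ascending=False)`, and the slice stands in for `take(10)`. This illustrates the expected result and is not Spark code.

```python
from collections import Counter

# Hypothetical input standing in for the contents of rdd2.
lines = ["a b a", "b a c"]

# Equivalent of flatMap + map + reduceByKey: count each word.
words = [word for line in lines for word in line.split(" ")]
counter = Counter(words)

# Equivalent of sortBy(lambda a: a[1], ascending=False).take(10).
top = sorted(counter.items(), key=lambda kv: kv[1], reverse=True)[:10]
print(top)
```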


6ju8rftf7#

There are two ways:
Using sort: df.sort('<col_name>', ascending=False)
Using orderBy: df.orderBy(df['<col_name>'].desc())
