Python: cut between partitioned column results

Posted by lfapxunr on 2021-05-19 in Spark


In Spark Scala, I use the following code to get the partition columns.

scala> val part_cols= spark.sql(" describe extended work.quality_stat ").select("col_name").as[String].collect()
part_cols: Array[String] = Array(x_bar, p1, p5, p50, p90, p95, p99, x_id, y_id, # Partition Information, # col_name, x_id, y_id, "", # Detailed Table Information, Database, Table, Owner, Created Time, Last Access, Created By, Type, Provider, Table Properties, Location, Serde Library, InputFormat, OutputFormat, Storage Properties, Partition Provider)

scala> part_cols.takeWhile( x => x.length()!= 0 ).reverse.takeWhile( x => x != "# col_name" )
res20: Array[String] = Array(y_id, x_id)
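For comparison, the Scala chain maps fairly directly onto Python's standard-library itertools.takewhile. The sketch below is illustrative only: the hard-coded col_names list is an assumed stand-in for the collected col_name values, copied (and truncated) from the Scala output above. Because the chain reverses the list before the second takewhile, it yields the partition columns in reverse order, i.e. the [y_id, x_id] the question asks for.

```python
from itertools import takewhile

# Assumed stand-in for the collected col_name values (truncated from the Scala REPL output).
col_names = [
    "x_bar", "p1", "p5", "p50", "p90", "p95", "p99", "x_id", "y_id",
    "# Partition Information", "# col_name", "x_id", "y_id", "",
    "# Detailed Table Information", "Database", "Table", "Owner",
]

# Equivalent of takeWhile(x => x.length() != 0): keep everything before the first empty string.
before_blank = list(takewhile(lambda x: len(x) != 0, col_names))

# Equivalent of .reverse.takeWhile(x => x != "# col_name"): walk backwards until the header row.
part_cols = list(takewhile(lambda x: x != "# col_name", reversed(before_blank)))

print(part_cols)  # ['y_id', 'x_id']
```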

I need similar output in Python. I'm struggling to replicate the same code in Python so that the array operations give [y_id, x_id].
Here is what I tried:

>>> part_cols=spark.sql(" describe extended work.quality_stat ").select("col_name").collect()

Is this possible in Python?

python apache-spark pyspark python-3.x

Source: https://stackoverflow.com/questions/64391047/python-cut-between-partitioned-column-results

1 answer


Answer #1 by s8vozzvw

part_cols in the question is an array of Row objects, so the first step is to convert it into a list of strings:

part_cols = spark.sql(...).select("col_name").collect()
part_cols = [row['col_name'] for row in part_cols]

Now the start and end indices of the partition column names can be found with

start_index = part_cols.index("# col_name") + 1
end_index = part_cols.index('', start_index)
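The two index calls can be wrapped in a small helper. This is a sketch under stated assumptions: the function name partition_columns is hypothetical, and the fallbacks are choices, not part of the answer. list.index raises ValueError when the value is missing, which may happen if the table has no partition columns (no "# col_name" row) or if no blank row follows the partition list. The helper returns an empty list in the first case and reads to the end of the list in the second.

```python
from typing import List


def partition_columns(col_names: List[str]) -> List[str]:
    """Return the names listed between '# col_name' and the next blank row.

    Hypothetical helper: returns [] when there is no '# col_name' row, and
    reads to the end of the list when no blank row follows it.
    """
    try:
        start_index = col_names.index("# col_name") + 1
    except ValueError:
        return []  # no partition-information header found
    try:
        # Search for the blank separator only after the partition header.
        end_index = col_names.index("", start_index)
    except ValueError:
        end_index = len(col_names)  # no blank separator after the partition columns
    return col_names[start_index:end_index]


rows = ["x_bar", "x_id", "y_id", "# Partition Information", "# col_name",
        "x_id", "y_id", "", "# Detailed Table Information"]
print(partition_columns(rows))             # ['x_id', 'y_id']
print(partition_columns(["x_bar", "p1"]))  # []
```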

Finally, a slice can be taken from the list, using these two values as start and end:

part_cols[start_index:end_index]

This slice contains the values

['x_id', 'y_id']

If the output really should be reversed, the slice

part_cols[end_index-1:start_index-1:-1]

contains the values

['y_id', 'x_id']
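Negative-step slice bounds are easy to get wrong, so an equivalent and arguably more readable option is to take the forward slice and reverse it with [::-1]. A quick check on an assumed stand-in list (same layout as the output in the question, truncated):

```python
# Assumed stand-in for the collected col_name values (truncated from the question's output).
part_cols = ["x_bar", "p1", "x_id", "y_id", "# Partition Information",
             "# col_name", "x_id", "y_id", "", "# Detailed Table Information"]

start_index = part_cols.index("# col_name") + 1
end_index = part_cols.index("", start_index)

# Negative-step slice from the answer: from end_index-1 down to, but excluding, start_index-1.
reversed_slice = part_cols[end_index - 1:start_index - 1:-1]

# Equivalent: slice forwards, then reverse the result.
forward_then_reversed = part_cols[start_index:end_index][::-1]

print(reversed_slice)  # ['y_id', 'x_id']
assert reversed_slice == forward_then_reversed
```

Here start_index is always at least 1, so the answer's slice is safe. In general, though, a stop value of -1 in a negative-step slice means "the last element", not "before index 0", which is why the two-step form is easier to reason about.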

Answered 2021-05-20
