How do I convert tabular data into sentences or a readable format using PySpark?

Posted by 3xiyfsfu on 2021-06-10 in Cassandra


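My data is in the tabular format shown in the image. How can I convert it into a readable format, so that each row reads something like "member_id belongs to region", and so on for the other columns?
Can anyone help me write a function that converts tabular data into readable sentences?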

cassandra apache-spark pyspark apache-spark-sql spark-cassandra-connector

Source: https://stackoverflow.com/questions/60716847/how-to-convert-tabular-format-data-to-sentence-or-readable-format-using-pyspark

1 Answer


bfrts1fy #1

You can add a new column named "sentence" with the concat function, as shown below. I also write the DataFrame to a file, in case you want the result as a CSV file.

>>> from pyspark.sql.functions import col, concat, lit, trim
>>> df.show()
+-----+---------+---+----+
|fname|    lname|age|dept|
+-----+---------+---+----+
| Jack|  Felice | 25|  IT|
| Mike| Gilbert | 30|  CS|
| John|     Shen| 45|  DR|
+-----+---------+---+----+

>>> # trim() drops the stray spaces around the names; note the leading space in " is "
>>> df1 = df.withColumn("sentence", concat(trim(col("fname")), lit(" "), trim(col("lname")), lit(" is "), col("age"), lit(" years old and he works in the "), col("dept"), lit(" department."))).select("sentence")
>>> df1.show(10, False)
+---------------------------------------------------------------+
|sentence                                                       |
+---------------------------------------------------------------+
|Jack Felice is 25 years old and he works in the IT department. |
|Mike Gilbert is 30 years old and he works in the CS department.|
|John Shen is 45 years old and he works in the DR department.   |
+---------------------------------------------------------------+

>>> df1.write.format("csv").option("header", "true").save("/out/")

[Image: CSV output]
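Since the question asks for a reusable function, here is a minimal sketch of one way to generalize the concat approach above. It is not part of the original answer: the helper names `to_format_spec` and `with_sentence`, and the `{column}` template syntax, are hypothetical and chosen only for illustration. The sketch swaps `concat` for `format_string` from `pyspark.sql.functions`, and it assumes every placeholder in the template names an existing column of the DataFrame.

```python
import string


def to_format_spec(template):
    """Split a "{column}"-style template into a printf-style format and a column list.

    Example: "{member_id} belongs to {region}." becomes
    ("%s belongs to %s.", ["member_id", "region"]).
    """
    parts, columns = [], []
    for literal, field, _spec, _conversion in string.Formatter().parse(template):
        # Escape literal percent signs so format_string does not read them as specifiers.
        parts.append(literal.replace("%", "%%"))
        if field is not None:
            parts.append("%s")
            columns.append(field)
    return "".join(parts), columns


def with_sentence(df, template, output_col="sentence"):
    """Return df with an extra column holding the filled-in sentence for each row."""
    # Imported here so that to_format_spec can be used and tested without Spark installed.
    from pyspark.sql.functions import col, format_string, trim

    fmt, columns = to_format_spec(template)
    # Cast every column to string and trim stray whitespace before formatting.
    args = [trim(col(name).cast("string")) for name in columns]
    return df.withColumn(output_col, format_string(fmt, *args))
```

With the `df` from the answer, a call such as `with_sentence(df, "{fname} {lname} is {age} years old and works in the {dept} department.").select("sentence")` should give one sentence per row. One thing to check on your own data is null handling: `concat` returns null for the whole sentence as soon as any input column is null, so rows with missing values may need `coalesce` or a filter first.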
