Replace single quotes with double quotes in a PySpark DataFrame

Posted by yquaqz18 on 2021-07-09 in Spark


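With the code below, I write a DataFrame to a CSV file.
My DataFrame contains `""` where it should have `None`, so I added `replace("", None)`, because null values are supposed to be represented as `None` rather than `""` (an empty pair of double quotes).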

newDf.coalesce(1).replace("", None).replace("'", "\"").write.format('csv').option('nullValue', None).option('header', 'true').option('delimiter', '|').mode('overwrite').save(destination_csv)

I tried adding `.replace("'", "\"")`, but it does not work.
The data also contains values with single quotes inside them, for example:

Survey No. 123, 'Anjanadhri Godowns', CityName

I need to replace the single quotes in the DataFrame with double quotes.
How can I achieve this?

python apache-spark pyspark apache-spark-sql replace

Source: https://stackoverflow.com/questions/66852532/replace-single-quotes-with-double-quotes-in-pyspark-dataframe

2 Answers


tzcvj98z1#

Use `translate`:

```python
from pyspark.sql.functions import translate

# Sample data: values containing single quotes
data_list = [(1, "'Name 1'"), (2, "'Name 2' and 'Something'")]
df = spark.createDataFrame(data=data_list, schema=["ID", "my_col"])
df.show()
# +---+--------------------+
# | ID|              my_col|
# +---+--------------------+
# |  1|            'Name 1'|
# |  2|'Name 2' and 'Som...|
# +---+--------------------+

# Map every ' character in my_col to "
df.withColumn('my_col', translate('my_col', "'", '"')).show()
# +---+--------------------+
# | ID|              my_col|
# +---+--------------------+
# |  1|            "Name 1"|
# |  2|"Name 2" and "Som...|
# +---+--------------------+
```

This replaces every occurrence of the single-quote character with a double quote in the column `my_col`.
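As background on why the `.replace("'", "\"")` call in the question has no visible effect: `DataFrame.replace` compares whole cell values, so it only changes a cell whose entire content is `'`, whereas `translate` works character by character inside each string. The following is a plain-Python analogy of the two behaviours, not Spark code; the helper names `whole_value_replace` and `per_character_translate` are hypothetical and exist only for this illustration.

```python
from typing import List, Optional


def whole_value_replace(
    values: List[Optional[str]], old: str, new: str
) -> List[Optional[str]]:
    # Analogy for value-level replacement: a cell changes only when
    # its entire content equals `old`.
    return [new if v == old else v for v in values]


def per_character_translate(
    values: List[Optional[str]], matching: str, replacement: str
) -> List[Optional[str]]:
    # Analogy for character-level translation: each character in `matching`
    # is swapped for the character at the same position in `replacement`,
    # wherever it occurs. None (null) cells are passed through unchanged.
    table = str.maketrans(matching, replacement)
    return [v.translate(table) if v is not None else None for v in values]


cells = ["'Name 1'", "'", None]
print(whole_value_replace(cells, "'", '"'))
print(per_character_translate(cells, "'", '"'))
```

In the first call only the cell that is exactly `'` changes; in the second, the quotes embedded in `'Name 1'` are converted as well.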

Posted 2021-07-09

h5qlskok2#

You can use `regexp_replace` to replace single quotes with double quotes in all columns before writing the output:

import pyspark.sql.functions as F

df2 = df.select([F.regexp_replace(c, "'", '"').alias(c) for c in df.columns])

# then write output

# df2.coalesce(1).write(...)
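Applying a string function to every column can also touch non-string columns such as `ID`; depending on the Spark version and configuration, those may be implicitly cast to strings. A minimal sketch of one way to limit the replacement to string columns follows. It assumes `df.dtypes` reports string columns with the type name `"string"`, and the helper names `string_columns` and `replace_single_quotes` are hypothetical, not part of PySpark.

```python
from typing import List, Tuple


def string_columns(dtypes: List[Tuple[str, str]]) -> List[str]:
    # `dtypes` has the shape of DataFrame.dtypes: (column name, type name) pairs.
    return [name for name, dtype in dtypes if dtype == "string"]


def replace_single_quotes(df):
    # Imported lazily so the helper above can be used without pyspark installed.
    from pyspark.sql import functions as F

    targets = set(string_columns(df.dtypes))
    # Rewrite only string columns; pass every other column through untouched.
    return df.select([
        F.translate(F.col(c), "'", '"').alias(c) if c in targets else F.col(c)
        for c in df.columns
    ])


# Usage (assumes an existing DataFrame `df`):
# df2 = replace_single_quotes(df)
# df2.coalesce(1).write(...)
```

`translate` is used here because the replacement is a single character; `regexp_replace` would work the same way in the list comprehension.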

Posted 2021-07-09
