Problem with double-quote characters in PySpark

yqkkidmi · posted 2021-05-17 in Spark

My source file looks like the one below, and I am trying to read it in pyspark for further transformation.

"ID","FNAME","LNAME","AGE","DESIGNATION"

"1","John","Denver","34","Tech Staff"

"2","Philip","Spencer","30","Tech Staff "CONTRACT""


The code is as follows:

%pyspark
df = spark.read.csv("s3://emp_bucket/test_files/emp.csv", sep=",", quote='"', header='true')
df.show(truncate=False)

I expect the result to look like this:

+---+------+-------+---+-----------------------+
|ID |FNAME |LNAME  |AGE|DESIGNATION            |
+---+------+-------+---+-----------------------+
|1  |John  |Denver |34 |Tech Staff             |
|2  |Philip|Spencer|30 |Tech Staff "CONTRACT"  |
+---+------+-------+---+-----------------------+

But the result is unexpected, as shown below:

+---+------+-------+---+-----------------------+
|ID |FNAME |LNAME  |AGE|DESIGNATION            |
+---+------+-------+---+-----------------------+
|1  |John  |Denver |34 |Tech Staff             |
|2  |Philip|Spencer|30 |"Tech Staff "CONTRACT""|
+---+------+-------+---+-----------------------+

I tried using an escape character, but PySpark still does not remove the outer double quotes around "Tech Staff "CONTRACT"".
Can someone confirm whether this is the correct behavior?

pgpifvop 1#

If you look at this line:

"2","Philip","Spencer","30","Tech Staff "CONTRACT""

you will see that the quotes in the last column are not escaped.
It should be:

"2","Philip","Spencer","30","Tech Staff \"CONTRACT\""

or even without quotes at all (since the content contains no commas):

2,Philip,Spencer,30,Tech Staff \"CONTRACT\"
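
If the file is corrected to use backslash-escaped quotes inside the quoted field (the first option above), the read from the question should return the expected output. A minimal sketch, assuming the corrected file is written back to the same S3 path; Spark's CSV reader uses the backslash as its default escape character, so passing escape explicitly is optional here:

%pyspark
# Read the corrected file; escape='\\' (a single backslash) is Spark's default
# for escaping quotes inside quoted values and is shown only for clarity.
df = spark.read.csv("s3://emp_bucket/test_files/emp.csv", sep=",", quote='"', escape='\\', header='true')
df.show(truncate=False)

With the backslash-escaped input, the DESIGNATION value of the second row should come back as Tech Staff "CONTRACT", without the surrounding quotes.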
