I have a list of Python dictionaries like this:
data = [{"cust_decision": "buy", "cust_details": "Easy to use"}, {"cust_decision": "buy", "cust_details": "econoimical"}, {"cust_decision":"no buy", "cust_details": "Didn’t like Product"}]
I'm creating a PySpark DataFrame and a temp view from it like this:
from pyspark.sql import SparkSession, Row
spark = SparkSession.builder.getOrCreate()
spark.createDataFrame([Row(**i) for i in data]).createOrReplaceTempView("cust")
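Now, when I look at the data in this temp view, the special character ’ (a right single quotation mark, not the plain single quote ' as in "it's") has been changed into different characters starting with â. Here is the result: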
spark.table("cust").show(10,False)
+-------------+---------------------+
|cust_decision|cust_details         |
+-------------+---------------------+
|buy          |Easy to use          |
|buy          |econoimical          |
|no buy       |Didnâ€™t like Product|
+-------------+---------------------+
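The "â" in that output is what you typically get when the UTF-8 bytes of ’ (U+2019) are decoded with a single-byte code page such as Windows-1252 (cp1252). The sketch below reproduces that mix-up in plain Python 3, without Spark. cp1252 is an assumption here: the question does not say which component re-decodes the text.

```python
# Assumption: somewhere between the data and the display, UTF-8 bytes are
# being read back as Windows-1252 (cp1252).
original = "Didn\u2019t like Product"   # U+2019 RIGHT SINGLE QUOTATION MARK

utf8_bytes = original.encode("utf-8")   # ’ becomes the three bytes E2 80 99
print(utf8_bytes)                       # b'Didn\xe2\x80\x99t like Product'

garbled = utf8_bytes.decode("cp1252")   # each byte is read as its own character
print(garbled)                          # Didnâ€™t like Product

# This can be undone only if exactly this UTF-8 -> cp1252 mix-up happened.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)             # True
```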
But I want each value to keep its original characters. How can I do that? The expected result is:
+-------------+---------------------+
|cust_decision|cust_details         |
+-------------+---------------------+
|buy          |Easy to use          |
|buy          |econoimical          |
|no buy       |Didn’t like Product  |
+-------------+---------------------+
Thanks.
Answer (xdyibdwo):
Try decoding the values in your data dictionary from UTF-8:
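```python
from pyspark.sql import SparkSession, Row
spark = SparkSession.builder.getOrCreate()
data = [{"cust_decision": "buy", "cust_details": "Easy to use"}, {"cust_decision": "buy", "cust_details": "econoimical"}, {"cust_decision": "no buy", "cust_details": "Didn’t like Product"}]
# Decode UTF-8 byte strings (Python 2 str / Python 3 bytes) into Unicode text;
# values that are already text (Python 3 str has no .decode) pass through unchanged.
decode_data = [{k: (v.decode("utf-8") if isinstance(v, bytes) else v) for k, v in i.items()} for i in data]
spark.createDataFrame([Row(**i) for i in decode_data]).createOrReplaceTempView("cust")
spark.table("cust").show(10, False)
```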
+-------------+-------------------+
|cust_decision|cust_details       |
+-------------+-------------------+
|buy          |Easy to use        |
|buy          |econoimical        |
|no buy       |Didn’t like Product|
+-------------+-------------------+
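The decoding step only changes anything when the values are byte strings: in Python 3, `bytes` has `.decode()` but `str` does not. A minimal sketch in Python 3, using a hypothetical helper `to_text` (not part of the answer) and byte-string input standing in for Python 2 `str` literals or data read in binary mode:

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode byte strings, return str values unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Byte-string values, as Python 2 str literals or binary file reads would give.
raw = [{"cust_decision": b"no buy", "cust_details": b"Didn\xe2\x80\x99t like Product"}]
decoded = [{k: to_text(v) for k, v in row.items()} for row in raw]
print(decoded[0]["cust_details"])   # Didn’t like Product

# Python 3 str values are already Unicode, so there is nothing to decode.
print(to_text("Didn\u2019t like Product") == "Didn\u2019t like Product")  # True
print(hasattr("text", "decode"))    # False: str has no decode method in Python 3
```

If the values are already `str` in Python 3, this step leaves them as they are, so the garbling probably comes from a later stage, such as how the output is displayed; the question does not confirm which.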