How can I convert a PySpark DataFrame like the one below into a JSON array structure?
OrderID field fieldValue itemSeqNo
123 Date 01-01-23 1
123 Amount 10.00 1
123 description Pencil 1
123 Date 01-02-23 2
123 Amount 11.00 2
123 description Pen 2
Desired JSON array structure:
{
"orderDetails": {
"orderID": "123"
},
"itemizationDetails": [
{
"Date": "01-01-23",
"Amount": "10.00",
"description": "Pencil"
},
{
"Date": "01-02-23",
"Amount": "11.00",
"description": "Pen"
}
]
}
This is my current code, but the output is not what I expected:
import json

import pandas as pd

test_dataframe = pd.DataFrame(
    {
        "OrderID": ['123', '123', '123', '123', '123', '123'],
        "field": ["Date", "Amount", 'description', 'Date', 'Amount', 'description'],
        "fieldValue": ['01-01-23', '10.00', 'Pencil', '01-02-23 ', '11.00', 'Pen '],
        "itemSeqNo": ['1', '1', '1', '2', '2', '2']
    }
)

# orient='records' emits one flat JSON object per DataFrame row
res = json.loads(test_dataframe.to_json(orient='records'))
print(res)
[{'OrderID': '123', 'field': 'Date', 'fieldValue': '01-01-23', 'itemSeqNo': '1'}, {'OrderID': '123', 'field': 'Amount', 'fieldValue': '10.00', 'itemSeqNo': '1'}, {'OrderID': '123', 'field': 'description', 'fieldValue': 'Pencil', 'itemSeqNo': '1'}, {'OrderID': '123', 'field': 'Date', 'fieldValue': '01-02-23 ', 'itemSeqNo': '2'}, {'OrderID': '123', 'field': 'Amount', 'fieldValue': '11.00', 'itemSeqNo': '2'}, {'OrderID': '123', 'field': 'description', 'fieldValue': 'Pen ', 'itemSeqNo': '2'}]
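The flat output is expected: `orient='records'` serialises each DataFrame row as its own object and never nests anything, so the grouping has to be built explicitly. A minimal plain-Python sketch of that regrouping, assuming the records fit in memory; `to_order_json` is a hypothetical helper name, not a pandas API. It returns a list with one object per order, so for the single order in the sample the first element is the desired structure.

```python
import json
from collections import defaultdict

# The flat records printed above, as pandas returned them
# (note the trailing spaces on two of the values).
records = [
    {"OrderID": "123", "field": "Date", "fieldValue": "01-01-23", "itemSeqNo": "1"},
    {"OrderID": "123", "field": "Amount", "fieldValue": "10.00", "itemSeqNo": "1"},
    {"OrderID": "123", "field": "description", "fieldValue": "Pencil", "itemSeqNo": "1"},
    {"OrderID": "123", "field": "Date", "fieldValue": "01-02-23 ", "itemSeqNo": "2"},
    {"OrderID": "123", "field": "Amount", "fieldValue": "11.00", "itemSeqNo": "2"},
    {"OrderID": "123", "field": "description", "fieldValue": "Pen ", "itemSeqNo": "2"},
]


def to_order_json(records):
    """Regroup flat (field, fieldValue) rows into one nested dict per order.

    Hypothetical helper for illustration only.
    """
    # OrderID -> itemSeqNo -> {field: value}; dicts keep insertion order,
    # so each item's keys come out as Date, Amount, description.
    orders = defaultdict(lambda: defaultdict(dict))
    for rec in records:
        orders[rec["OrderID"]][rec["itemSeqNo"]][rec["field"]] = rec["fieldValue"].strip()

    return [
        {
            "orderDetails": {"orderID": order_id},
            # Sort numerically so that item "10" would not precede item "2".
            "itemizationDetails": [items[seq] for seq in sorted(items, key=int)],
        }
        for order_id, items in orders.items()
    ]


print(json.dumps(to_order_json(records), indent=2))
```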
PySpark solution
Pivot to reshape the DataFrame
Pack the required columns into a struct type
Group the DataFrame by OrderID and collect a list of the structs
Pack OrderID into a struct field
Export the result as JSON strings