I am trying to transpose columns into rows and load the result into a database. My input is a JSON file:
{"09087":{"values": ["76573433","2222322323","768346865"],"values1": ["7686548898","33256768","09864324567"],"values2": ["234523723","64238793333333","75478393333"],"values3": ["87765","46389333","9234689677"]},"090881": {"values": ["76573443433","22276762322323","7683878746865"],"values1": ["768637676548898","3398776256768","0986456834324567"],"values2": ["23877644523723","64238867658793333333","754788776393333"],"values3": ["87765","46389333","9234689677"]}}
PySpark:
df = spark.read.option("multiline", "true").format("json").load("testfile.json")
Schema:
root
|-- 09087: struct (nullable = true)
| |-- values: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- values1: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- values2: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- values3: array (nullable = true)
| | |-- element: string (containsNull = true)
|-- 090881: struct (nullable = true)
| |-- values: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- values1: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- values2: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- values3: array (nullable = true)
| | |-- element: string (containsNull = true)
Data:
df.show()
+--------------------+--------------------+
| 09087| 090881|
+--------------------+--------------------+
|{[76573433, 22223...|{[76573443433, 22...|
+--------------------+--------------------+
Expected output:
Name    values          values1           values2               values3
09087   76573433        7686548898        234523723             87765
09087   2222322323      33256768          64238793333333        9234689677
09087   768346865       09864324567       75478393333           46389333
090881  76573443433     768637676548898   23877644523723        87765
090881  22276762322323  3398776256768     64238867658793333333  46389333
090881  7683878746865   0986456834324567  754788776393333       9234689677
I have only shown two top-level columns in the input, but in reality there are many of them. I have been trying to do this for a while; could someone please help me out here? Thanks in advance.
2 Answers
mum43rcc1#
A PySpark translation of my Scala solution:
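The original Scala code is not included in this answer, so here is a minimal PySpark sketch of the usual approach, assuming the schema shown in the question: for each top-level struct column, zip its four arrays element-wise with arrays_zip, explode the zipped array into rows, and union the per-column results. The field names values..values3 are taken from the schema above.

from functools import reduce
from pyspark.sql import functions as F

df = spark.read.option("multiline", "true").format("json").load("testfile.json")

fields = ["values", "values1", "values2", "values3"]

def transpose_column(name):
    # Pull the arrays out of the struct as named top-level columns so the
    # struct produced by arrays_zip keeps the field names values..values3.
    flat = df.select(*[F.col(f"`{name}`.{f}").alias(f) for f in fields])
    # arrays_zip pairs the i-th elements of each array; explode makes one row per index.
    exploded = flat.select(F.explode(F.arrays_zip(*fields)).alias("z"))
    return exploded.select(F.lit(name).alias("Name"),
                           *[F.col(f"z.{f}").alias(f) for f in fields])

# df.columns is ["09087", "090881", ...], so this works for any number of top-level keys.
result = reduce(lambda a, b: a.unionByName(b),
                [transpose_column(c) for c in df.columns])
result.show(truncate=False)

Building one small DataFrame per key and unioning them keeps each step simple, at the cost of one select/explode per top-level column.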
k7fdbhmy2#
For more information on arrays_zip, see here.
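The code from this answer is not shown, so the following is only an assumed sketch of an arrays_zip-based solution: the SQL stack() function unpivots the top-level struct columns into (Name, struct) rows first, so a single arrays_zip/explode pass handles every key without a Python-side union. Column and field names again follow the schema in the question.

from pyspark.sql import functions as F

df = spark.read.option("multiline", "true").format("json").load("testfile.json")

fields = ["values", "values1", "values2", "values3"]
cols = df.columns  # e.g. ["09087", "090881"]

# Builds e.g. stack(2, '09087', `09087`, '090881', `090881`) as (Name, s)
stack_expr = "stack({n}, {args}) as (Name, s)".format(
    n=len(cols),
    args=", ".join(f"'{c}', `{c}`" for c in cols))

result = (df
          .select(F.expr(stack_expr))                                   # one row per top-level key
          .select("Name", *[F.col(f"s.{f}").alias(f) for f in fields])  # unpack the struct fields
          .select("Name", F.explode(F.arrays_zip(*fields)).alias("z"))  # zip arrays, one row per index
          .select("Name", *[F.col(f"z.{f}").alias(f) for f in fields]))
result.show(truncate=False)

Note that stack() requires every top-level struct to have the same field layout, which holds for the sample schema shown above.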