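I have a DataFrame with 4 columns and want to combine the first 2 columns and the last 2 columns into a new 2-column DataFrame.
Both pairs hold the same kind of data, the order does not matter, and any duplicates must be kept.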
import pyspark.sql.functions as F
df = spark.createDataFrame([
["This is line 1","xxxx12","This is line 5","hhhh29"],
["This is line 2","yyyy23","This is line 6","kkkk47"],
["This is line 3","zzzz64","This is line 7","llll88"],
["This is line 4","gggg37","This is line 8","ssss84"],
]).toDF("col_a", "col_b", "col_c", "col_d")
Desired DataFrame:
+---------------+-------+
|col_1          |col_2  |
+---------------+-------+
|This is line 1 |xxxx12 |
|This is line 5 |hhhh29 |
|This is line 2 |yyyy23 |
|This is line 6 |kkkk47 |
|This is line 3 |zzzz64 |
|This is line 7 |llll88 |
|This is line 4 |gggg37 |
|This is line 8 |ssss84 |
+---------------+-------+
How can I do this?
1 Answer
If the order doesn't matter, you can use unionAll. Alternatively, you can use stack, which preserves the order.