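I have a DataFrame of games with three values per game, each from a different review, and each value is translated through its own lookup DataFrame, as you can see: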
Df_reviews
+------+-------+-------+-------+
| Game | rev_1 | rev_2 | rev_3 |
+------+-------+-------+-------+
| CA   | XX+   | K2    | L1    |
| FT   | Z-    | K1+   | L3    |
+------+-------+-------+-------+
Df_rev1
+----------+-------------+
| review_1 | Equivalence |
+----------+-------------+
| XX+      | 9           |
| Y        | 6           |
| Z-       | 3           |
+----------+-------------+
Df_rev2
+----------+-------------+
| review_2 | Equivalence |
+----------+-------------+
| K2       | 7           |
| K1+      | 6           |
| K3       | 10          |
+----------+-------------+
Df_rev3
+----------+-------------+
| review_3 | Equivalence |
+----------+-------------+
| L3       | 10          |
| L2       | 9           |
| L1       | 8           |
+----------+-------------+
I need to build a new DataFrame with the translated values and add a column holding the second-best value, for example:
Df_output
+------+---------+---------+---------+-------------+
| Game | rev_1_t | rev_2_t | rev_3_t | second_best |
+------+---------+---------+---------+-------------+
| CA   | 9       | 7       | 8       | 8           |
| FT   | 3       | 6       | 10      | 6           |
+------+---------+---------+---------+-------------+
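To do this, I tried using left joins, but I got completely lost. How should I approach it?
####### Part 2 ####################
How can I translate several columns of one DataFrame using the columns of another DataFrame? For example: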
Df_reviews
+------+-------+-------+-------+
| Game | rev_1 | rev_2 | rev_3 |
+------+-------+-------+-------+
| CA   | XX+   | K2    | L1    |
| FT   | Z-    | K1+   | L3    |
+------+-------+-------+-------+
Df_equiv
+---------+-----+
| Valorat | num |
+---------+-----+
| X       | 3   |
| XX+     | 5   |
| Z       | 7   |
| Z-      | 6   |
| K1+     | 6   |
| K2      | 4   |
| L1      | 5   |
| L2      | 6   |
| L3      | 7   |
+---------+-----+
Output
+------+-------+-------+-------+
| Game | rev_1 | rev_2 | rev_3 |
+------+-------+-------+-------+
| CA   | 5     | 4     | 5     |
| FT   | 6     | 6     | 7     |
+------+-------+-------+-------+
This is what I am currently trying:
val joined = df_reviews
  .join(df_equiv,
    df_reviews("rev_1") === df_equiv("num") &&
      df_reviews("rev_2") === df_equiv("num") &&
      df_reviews("rev_3") === df_equiv("num"),
    "left")
  .select(df_reviews("Game"),
    df_equiv("num").as("rev_1_t"),
    df_equiv("num").as("rev_2_t"),
    df_equiv("num").as("rev_3_t")
  )
Thanks in advance!
1 answer
cotxawn71#
You can chain a few left joins and then use sort_array to pick the second-best value.
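A minimal sketch of the left-join plus sort_array idea, assuming Spark with a local SparkSession and the sample data from the question. The object name `SecondBestExample` and the camel-cased value names are made up for illustration. Each lookup DataFrame is renamed so that it joins on its own review column, the three translated values are packed into an array, the array is sorted in descending order, and the element at index 1 is taken as the second-best value.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, sort_array}

// Hypothetical wrapper object, used only to keep the sketch self-contained.
object SecondBestExample {
  // Assumption: a local session, for illustration only.
  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("second-best")
    .getOrCreate()
  import spark.implicits._

  // Sample data copied from the question.
  val dfReviews: DataFrame = Seq(("CA", "XX+", "K2", "L1"), ("FT", "Z-", "K1+", "L3"))
    .toDF("Game", "rev_1", "rev_2", "rev_3")
  val dfRev1: DataFrame = Seq(("XX+", 9), ("Y", 6), ("Z-", 3)).toDF("review_1", "Equivalence")
  val dfRev2: DataFrame = Seq(("K2", 7), ("K1+", 6), ("K3", 10)).toDF("review_2", "Equivalence")
  val dfRev3: DataFrame = Seq(("L3", 10), ("L2", 9), ("L1", 8)).toDF("review_3", "Equivalence")

  // Rename each lookup's columns so the join key matches the review column
  // and the translated value gets a distinct name.
  val joined: DataFrame = dfReviews
    .join(dfRev1.toDF("rev_1", "rev_1_t"), Seq("rev_1"), "left")
    .join(dfRev2.toDF("rev_2", "rev_2_t"), Seq("rev_2"), "left")
    .join(dfRev3.toDF("rev_3", "rev_3_t"), Seq("rev_3"), "left")

  // Sort the three translated values in descending order; index 1 is the second best.
  val output: DataFrame = joined.select(
    col("Game"), col("rev_1_t"), col("rev_2_t"), col("rev_3_t"),
    sort_array(array(col("rev_1_t"), col("rev_2_t"), col("rev_3_t")), false)(1).as("second_best")
  )
}
```

Calling `SecondBestExample.output.show()` should display one row per game with the three translated values and the `second_best` column. The row order after the joins is not guaranteed, so add an `orderBy("Game")` if the order matters.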
Regarding the second question:
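One possible approach for the single lookup table, sketched under the same assumptions (Spark, a local SparkSession, the sample data from the question; `TranslateColumnsExample` and `reviewCols` are illustrative names). The attempt in the question compares the review columns with `num` and joins `df_equiv` only once, so no single row of `df_equiv` can satisfy all three conditions. Here `df_equiv` is instead joined once per review column, matching on `Valorat`, and the original code is replaced by its `num` value.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical wrapper object, used only to keep the sketch self-contained.
object TranslateColumnsExample {
  // Assumption: a local session, for illustration only.
  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("translate-columns")
    .getOrCreate()
  import spark.implicits._

  // Sample data copied from the question.
  val dfReviews: DataFrame = Seq(("CA", "XX+", "K2", "L1"), ("FT", "Z-", "K1+", "L3"))
    .toDF("Game", "rev_1", "rev_2", "rev_3")
  val dfEquiv: DataFrame = Seq(
    ("X", 3), ("XX+", 5), ("Z", 7), ("Z-", 6), ("K1+", 6),
    ("K2", 4), ("L1", 5), ("L2", 6), ("L3", 7)
  ).toDF("Valorat", "num")

  val reviewCols: Seq[String] = Seq("rev_1", "rev_2", "rev_3")

  // For each review column: join a renamed copy of the lookup table on that
  // column, drop the original code, and give the numeric value the old name.
  val translated: DataFrame = reviewCols.foldLeft(dfReviews) { (df, c) =>
    val lookup = dfEquiv.toDF(c, s"${c}_num") // Valorat -> c, num -> c_num
    df.join(lookup, Seq(c), "left")
      .drop(c)
      .withColumnRenamed(s"${c}_num", c)
  }

  // Restore the original column order.
  val output: DataFrame = translated.select("Game", reviewCols: _*)
}
```

Because the joins are left joins, a review code that is missing from `df_equiv` ends up as null in the output instead of dropping the game.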