我试图比较两列,每列来自不同的df。我有这两个:
df1
+----+-------+-------+
|Game|rev_1_t|rev_2_t|
+----+-------+-------+
| CA| AA | AA |
| FT| B | C |
+----+-------+-------+
df_prev
+----+-------+-------+
|Game|rev_1_t|rev_2_t|
+----+-------+-------+
| CA| C | AA |
| FT| B | C |
+----+-------+-------+
我想比较df1中的rev_1_t和df_prev中的rev_1_t,如果有更改,则放入一个名为change的新列,如果没有更改,则使用“y”和“n”。同时,我想添加一个名为prev\u value的新列,在该列中存储df\u prev中rev\u 1\u t的先前值。版本2相同。输出为:
Output:
+----+-------+--------+------------+---------+----------+--------------+
|Game|rev_1_t| change | prev_value | rev_2_t | change_2 | prev_value_2 |
+----+-------+--------+------------+---------+----------+--------------+
| CA| C | Y | C | AA | Y | C |
| FT| B | Y | B | C | Y | B |
+----+-------+--------+------------+---------+----------+--------------+
我试着做你在这里看到的,但我有不同的错误:
val change = df1.withColumn(
"change", when (df1("rev_1_t") === df_prev("rev_1_t"), df1("rev_1_t")).otherwise(df_prev("rev_1_t"))
.withColumn(
"prev_value", when(df1("rev_1_t") === df_prev("rev_1_t"), "N").otherwise("Y"))
1条答案
按热度按时间cigdeys31#
可以执行联接,然后比较相关列: