Spark/Scala DataFrame: not enough arguments for method when

d8tt03nd  posted on 2021-05-27 in Spark

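I have two DataFrames in Spark, df1 and df2. I join them on a common column (id) and then add an extra column, Result, that checks several columns: if any of them match, the new column should contain "Matched"; if none of the conditions match, it should contain "Not Matched". I have written the code below.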

df1.join(df1,df2("id") === df2("id"))
   .withColumn("Result",
   when(
   df1("adhar_no") === df2("adhar_no")" || 
   df1("pan_no") === df2("pan_no") || 
   df1("Voter_id") === df2("Voter_id") || 
   df1("DL_no") === df2("DL_no"),"Matched"
  ).otherwise("Not Matched"))

But I get this error:

  <console>:60: error: not enough arguments for method when: (condition: org.apache.spark.sql.Column, value: Any)org.apache.spark.sql.Column. Unspecified value parameter value.

I have also tried the code below:

    df1.join(df2,df1("id") === df2("id"))
   .withColumn("Result",when(df1("adhar_no") === df2("adhar_no") || 
   when(df1("pan_no") === df2("pan_no") || 
   when(df1("Voter_id") === df2("Voter_id") ||  
   when(df1("DL_no") === df2("DL_no"),"Matched"))))
  .otherwise("Not Matched"))

I get an error in both cases. Can anyone help me with how to do this?

oxcyiej7  #1

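The first case fails because you have an extra " on line 4 (the first condition). The stray quote breaks the expression, so the compiler no longer sees the second argument of when.
This works: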

// Join condition compares df1's id with df2's id
df1.join(df2, df1("id") === df2("id"))
   .withColumn("Result",
     when(
       df1("adhar_no") === df2("adhar_no") ||
       df1("pan_no") === df2("pan_no") ||
       df1("Voter_id") === df2("Voter_id") ||
       df1("DL_no") === df2("DL_no"), "Matched"
     ).otherwise("Not Matched"))

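The second one fails because each when must have an output value, and nesting them inside the conditions like that makes no sense to me. The first one is fine, but you need to remove your extra " (I assume it is a typo).
Also, as a personal preference or suggestion, I prefer to reference columns with the dollar syntax. It is clearer to me and helps me avoid typos like this one.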
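To illustrate the point that every when needs its own output value, here is a minimal sketch of the chained form, where each branch is written as .when(condition, value) instead of being nested inside the previous condition. It is not part of the original answer: the object name ChainedWhenSketch, the helper withResult, the local SparkSession settings, and the sample rows are all made up for the example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.when

// Hypothetical helper object, used only for this illustration.
object ChainedWhenSketch {
  // Each when(...) in the chain gets both a condition and a value;
  // rows that satisfy none of the conditions fall through to otherwise.
  def withResult(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.join(df2, df1("id") === df2("id"))
      .withColumn("Result",
        when(df1("adhar_no") === df2("adhar_no"), "Matched")
          .when(df1("pan_no") === df2("pan_no"), "Matched")
          .when(df1("Voter_id") === df2("Voter_id"), "Matched")
          .when(df1("DL_no") === df2("DL_no"), "Matched")
          .otherwise("Not Matched"))

  def main(args: Array[String]): Unit = {
    // Assumed local session; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("chained-when-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows: id 1 agrees on pan_no only, id 2 agrees on nothing.
    val df1 = List((1, 10, 100, 1000, 10000), (2, 20, 200, 2000, 20000))
      .toDF("id", "adhar_no", "pan_no", "Voter_id", "DL_no")
    val df2 = List((1, 11, 100, 1001, 10001), (2, 21, 201, 2001, 20001))
      .toDF("id", "adhar_no", "pan_no", "Voter_id", "DL_no")

    withResult(df1, df2).select(df1("id"), $"Result").show()
    spark.stop()
  }
}
```

With these rows, id 1 should come out as Matched and id 2 as Not Matched. Since every branch returns the same value, this is equivalent to the single when with || conditions; the chained form mainly pays off if you want a different label per branch.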
Edit, with an example
Some crappy test DataFrames:

// toDF needs spark.implicits._ (already imported in spark-shell)
val df1 = List((1, 10, 100, 1000, 10000), (2, 20, 200, 2000, 20000), (3, 30, 300, 3000, 30000)).toDF("id", "adhar_no", "pan_no", "Voter_id", "DL_no")
val df2 = List((1, 10, 100, 1000, 10000), (2, 20, 200, 2000, 20000), (4, 40, 400, 4000, 40000)).toDF("id", "adhar_no", "pan_no", "Voter_id", "DL_no")

Then, with the ambiguity in the code fixed:

df1.as("df1").join(df2.as("df2"), df1("id") === df2("id"))
  .withColumn("Result",
    when(
      $"df1.adhar_no" === $"df2.adhar_no" ||
        $"df1.pan_no" === $"df2.pan_no" ||
        $"df1.Voter_id" === $"df2.Voter_id" ||
        $"df1.DL_no" === $"df2.DL_no",
      "Matched"
    ).otherwise("Not Matched")
  )
  .show()

+---+--------+------+--------+-----+---+--------+------+--------+-----+-------+
| id|adhar_no|pan_no|Voter_id|DL_no| id|adhar_no|pan_no|Voter_id|DL_no| Result|
+---+--------+------+--------+-----+---+--------+------+--------+-----+-------+
|  1|      10|   100|    1000|10000|  1|      10|   100|    1000|10000|Matched|
|  2|      20|   200|    2000|20000|  2|      20|   200|    2000|20000|Matched|
+---+--------+------+--------+-----+---+--------+------+--------+-----+-------+
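Note that the output above only contains ids 1 and 2: the default inner join drops id 3 (only in df1) and id 4 (only in df2), so they never get a Result at all. If rows without a partner in df2 should also show up as "Not Matched", one option is a left join. This is a sketch under that assumption and not part of the original answer; the object name LeftJoinSketch, the helper withResult, and the local SparkSession settings are made up for the example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

// Hypothetical helper object, used only for this illustration.
object LeftJoinSketch {
  // A left join keeps every df1 row. For rows with no partner, the df2
  // columns are null, the comparisons evaluate to null rather than true,
  // and when(...) falls through to otherwise.
  def withResult(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.as("df1")
      .join(df2.as("df2"), col("df1.id") === col("df2.id"), "left")
      .withColumn("Result",
        when(
          col("df1.adhar_no") === col("df2.adhar_no") ||
            col("df1.pan_no") === col("df2.pan_no") ||
            col("df1.Voter_id") === col("df2.Voter_id") ||
            col("df1.DL_no") === col("df2.DL_no"),
          "Matched"
        ).otherwise("Not Matched"))

  def main(args: Array[String]): Unit = {
    // Assumed local session; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("left-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same test rows as in the answer.
    val df1 = List((1, 10, 100, 1000, 10000), (2, 20, 200, 2000, 20000), (3, 30, 300, 3000, 30000))
      .toDF("id", "adhar_no", "pan_no", "Voter_id", "DL_no")
    val df2 = List((1, 10, 100, 1000, 10000), (2, 20, 200, 2000, 20000), (4, 40, 400, 4000, 40000))
      .toDF("id", "adhar_no", "pan_no", "Voter_id", "DL_no")

    withResult(df1, df2).select(col("df1.id"), col("Result")).show()
    spark.stop()
  }
}
```

With the test rows from the answer, ids 1 and 2 should still be Matched and id 3 should now appear as Not Matched. Id 4 exists only in df2, so a left join from df1 still leaves it out.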
