Having issues joining on items within a struct in Scala

Posted by 5uzkadbs on 2022-11-23 in Scala


I have two datasets that I want to join. The first, df:

 |-- key: struct (nullable = false)
 |    |-- name: string (nullable = true)
 |    |-- subId: struct (nullable = true)
 |    |    |-- x: integer (nullable = false)
 |    |    |-- y: integer (nullable = false)
 |    |    |-- level: integer (nullable = false)
 |-- otherItems: struct (nullable = false)
 |    |-- nameRestaurant: string (nullable = true)
 |    |-- thing: struct (nullable = true)

and another, df2:

 |-- key: struct (nullable = false)
 |    |-- name: string (nullable = true)
 |    |-- subId: struct (nullable = true)
 |    |    |-- x: integer (nullable = false)
 |    |    |-- y: integer (nullable = false)
 |    |    |-- level: integer (nullable = false)
 |-- attribute: struct (nullable = false)
 |    |-- address: string (nullable = true)
 |    |-- someThing: struct (nullable = true)

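I need to join the two datasets on the key column, i.e.
val df3 = df.join(df2, Seq("key"), "left")
However, this join produces no matches, even though I am certain matching rows exist.
When I try to expand the join to the individual fields like this: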

val df3 = df.join(df2, Seq("key.name", "key.subId.x", "key.subId.y", "key.subId.level"), "left")

I get the error:

org.apache.spark.sql.AnalysisException: USING column `key.name` cannot be resolved on the left side of the join.

Is it not possible to join on items nested inside a struct? Can anyone suggest the best way to do this?

scala

Source: https://stackoverflow.com/questions/74536417/having-issues-joining-on-items-within-a-struct-in-scala

1 Answer


z0qdvdin1#

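In Spark 3.3.1 this works fine. In Spark 2.x, however, you can use the following workaround:
1. On each df, create a new column holding key cast to a string, and join on that column. After the join, you can drop it: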

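import org.apache.spark.sql.functions.col

// Cast the struct key to a string on both sides, join on that helper column, then drop it.
// (Scala needs no trailing backslash to continue a method chain.)
val df3 = df.withColumn("castOfKey", col("key").cast("string"))
  .join(
    df2.withColumn("castOfKey", col("key").cast("string")),
    Seq("castOfKey"),
    "left"
  )
  .drop("castOfKey")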
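For anyone who wants to try this end to end, here is a minimal sketch under stated assumptions. It is written script-style, for spark-shell or a Scala script with Spark on the classpath. The toy rows, the names left and right, and the local session settings are all invented for illustration and are not from the question. It runs the cast-to-string workaround and, as a cross-check, a join with an explicit condition on the nested fields. The explicit condition is an alternative to the Seq(...) form, which appears to accept only top-level column names.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct}

// Local session for the demo only (assumption: Spark is available on the classpath).
val spark = SparkSession.builder().master("local[1]").appName("struct-join-sketch").getOrCreate()
import spark.implicits._

// Toy data, invented for illustration; the key struct mirrors the shape in the question.
val left = Seq(("a", 1, 2, 3, "cafe"), ("b", 4, 5, 6, "diner"))
  .toDF("name", "x", "y", "level", "nameRestaurant")
  .select(
    struct(col("name"), struct(col("x"), col("y"), col("level")).as("subId")).as("key"),
    col("nameRestaurant"))

val right = Seq(("a", 1, 2, 3, "1 Main St"))
  .toDF("name", "x", "y", "level", "address")
  .select(
    struct(col("name"), struct(col("x"), col("y"), col("level")).as("subId")).as("key"),
    col("address"))

// Workaround from the answer: join on the struct cast to a string.
// The right-hand key is dropped so the result keeps a single key column.
val viaCast = left.withColumn("castOfKey", col("key").cast("string"))
  .join(
    right.withColumn("castOfKey", col("key").cast("string")).drop("key"),
    Seq("castOfKey"),
    "left")
  .drop("castOfKey")

// Alternative: an explicit join condition on the nested fields.
val viaFields = left.join(
  right,
  left("key.name") === right("key.name") &&
    left("key.subId.x") === right("key.subId.x") &&
    left("key.subId.y") === right("key.subId.y") &&
    left("key.subId.level") === right("key.subId.level"),
  "left")

// Collect the matched address per left row (None where the left join found no match).
val addressesViaCast = viaCast.orderBy("nameRestaurant").collect()
  .map(r => Option(r.getAs[String]("address"))).toSeq
val addressesViaFields = viaFields.orderBy("nameRestaurant").collect()
  .map(r => Option(r.getAs[String]("address"))).toSeq

spark.stop()
```

The two approaches can differ when a key field is null. The `===` comparison never matches a null, whereas the string cast renders a null as text, so two keys with nulls in the same field would compare equal. Which behaviour you want depends on your data.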

Answered 2022-11-23
