Scala — using custom classes with Spark Datasets and DataFrames

dm7nw8vv · posted 2021-07-14 in Spark

I've put together some custom classes that I'd like to use as native data types, i.e. in Spark SQL. I see that UDTs have just been opened up to the public, but I'm having trouble figuring them out. Is there a way for me to do this?
Example:

case class IPv4(ipAddress: String){
  // IPv4 converted to a number
  val addrL: Long = IPv4ToLong(ipAddress)
}

// Will read in a bunch of random IPs in the form {"ipAddress": "60.80.39.27"}
val IPv4DF: DataFrame = spark.read.json(path)
IPv4DF.createOrReplaceTempView("IPv4")

spark.sql(
    """SELECT *
     FROM IPv4
     WHERE ipAddress.addrL > 100000"""
    )

xkftehaa · 1#

You can construct a Dataset and filter on the case class's addrL attribute:

case class IPv4(ipAddress: String){
  // IPv4 converted to a number
  val addrL: Long = IPv4ToLong(ipAddress)
}

// toDF / as[IPv4] need the Spark implicit encoders in scope
import spark.implicits._

val ds = Seq("60.80.39.27").toDF("ipAddress").as[IPv4]

ds.filter(_.addrL > 100000).show
+-----------+
|  ipAddress|
+-----------+
|60.80.39.27|
+-----------+
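
If the SQL-style query from the question is also needed, one option is to register the address conversion as a UDF so it can be called from plain Spark SQL. Below is a minimal sketch; the IPv4ToLong helper is not shown in the question, so the octet-packing implementation here is only an assumption:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the IPv4ToLong helper referenced in the question:
// packs the four octets of a dotted-quad address into a single Long.
def IPv4ToLong(ip: String): Long =
  ip.split('.').foldLeft(0L)((acc, octet) => acc * 256 + octet.toLong)

// Registering the conversion as a UDF makes the computed value usable in SQL.
spark.udf.register("ipv4ToLong", IPv4ToLong _)

val IPv4DF = Seq("60.80.39.27").toDF("ipAddress")
IPv4DF.createOrReplaceTempView("IPv4")

spark.sql(
  """SELECT *
     FROM IPv4
     WHERE ipv4ToLong(ipAddress) > 100000""").show

The Dataset approach above keeps the filter type-safe on the case class, while the UDF route keeps the query in SQL at the cost of recomputing the conversion for each row inside the query.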
