How to transpose a single column into multiple columns

ppcbkaq5 posted on 2021-05-27 in Spark

I need to turn the values of a single column into one row with multiple columns. Here is what I did:

import org.apache.spark.sql.functions.{col, collect_list}
import spark.implicits._

val list = List("a", "b", "c", "d")

val df = list.toDF("id")

df.show()

// One column expression per element of the original list (always 4 here)
val transpose = list.zipWithIndex.map {
  case (_, index) => col("data").getItem(index).as(s"col_${index}")
}

df.select(collect_list($"id").as("data")).select(transpose: _*).show()

Output:

+-----+-----+-----+-----+
|col_0|col_1|col_2|col_3|
+-----+-----+-----+-----+
|    a|    b|    c|    d|
+-----+-----+-----+-----+

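This transposes the data as expected. The problem is that transpose is built from the original data (the list), not from the DataFrame. If I apply any filter to df, the result still has 4 columns, because the original list has 4 elements. How can I make the number of columns follow the filtered data?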

Adding more information:

df.filter($"id" =!= "a").select(collect_list($"id").as("data")).select(transpose: _*).show()

With the filter applied, show() prints:

+-----+-----+-----+-----+
|col_0|col_1|col_2|col_3|
+-----+-----+-----+-----+
|    b|    c|    d| null|
+-----+-----+-----+-----+

This is wrong: it should show 3 columns, not 4.

gab6jxml #1

Trim the column list according to the row count of the DataFrame. Let me know if this helps.

import org.apache.spark.sql.functions._

object TransposeV2 {

  def main(args: Array[String]): Unit = {
    // Constant.getSparkSess is the answerer's own helper; presumably it returns a SparkSession
    val spark = Constant.getSparkSess
    import spark.implicits._

    val list = List("a", "b", "c", "d")
    val df = list.toDF("id")
    df.show()

    // One column expression per element of the original list
    val transpose = list.zipWithIndex.map {
      case (_, index) => col("data").getItem(index).as(s"col_${index}")
    }

    df.select(collect_list($"id").as("data")).select(transpose: _*).show()

    // Keep only as many column expressions as there are rows left after filtering
    val dfInterim = df.filter($"id" =!= "a")
    val finalElements: Int = dfInterim.count().toInt
    dfInterim.select(collect_list($"id").as("data")).select(transpose.take(finalElements): _*).show()
  }

}
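
The trimming idea above can also be written without the original list, by reading the length of the collected array and building that many columns. This is only a minimal sketch: the object name TransposeBySize, the helper transposeColumn, and the local SparkSession settings are assumptions made for illustration, not part of the answer. Note that collect_list does not guarantee element order once the data has been shuffled.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, size}

object TransposeBySize {

  // Hypothetical helper: turns the values of column `name` into a single row
  // with columns col_0, col_1, ... (one per collected value).
  def transposeColumn(df: DataFrame, name: String): DataFrame = {
    // Collect every value of the column into one array column.
    val collected = df.select(collect_list(col(name)).as("data"))
    // Read the array length back to the driver (this runs a Spark job).
    val n = collected.select(size(col("data"))).head().getInt(0)
    // Build exactly n column expressions from the array.
    val columns = (0 until n).map(i => col("data").getItem(i).as(s"col_$i"))
    collected.select(columns: _*)
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session, for the sketch only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("transpose-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = List("a", "b", "c", "d").toDF("id")
    transposeColumn(df.filter($"id" =!= "a"), "id").show()

    spark.stop()
  }
}
```

Compared with transpose.take(n), this keeps the column count tied to the data that is actually collected, at the cost of evaluating the aggregation once more to read the size.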

qcuzuvrc #2

You can use pivot:

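import org.apache.spark.sql.functions.first
import spark.implicits._

val df = List("a", "b", "c", "d").toDF("id")

val dfFiltered = df.filter($"id" =!= "a")

// Pivot each remaining id into its own column, then rename the columns by position
dfFiltered
  .groupBy().pivot($"id").agg(first($"id"))
  .toDF((0 until dfFiltered.count().toInt).map(i => s"col_$i"): _*)
  .show()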

+-----+-----+-----+
|col_0|col_1|col_2|
+-----+-----+-----+
|    b|    c|    d|
+-----+-----+-----+
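
One thing to keep in mind with the pivot approach: pivot creates one column per distinct value, so if id contains duplicates, the row count is larger than the number of pivoted columns and toDF fails on the mismatch. A possible variant, shown here only as a sketch, takes the new names from the pivoted result itself, which also avoids the extra count(). The object name PivotRename, the helper pivotColumn, and the local SparkSession settings are assumptions for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, first}

object PivotRename {

  // Hypothetical helper: pivots the distinct values of `name` into columns
  // and renames them col_0, col_1, ... by position.
  def pivotColumn(df: DataFrame, name: String): DataFrame = {
    val pivoted = df.groupBy().pivot(name).agg(first(col(name)))
    // Derive the names from the pivoted columns, not from the row count.
    val names = pivoted.columns.indices.map(i => s"col_$i")
    pivoted.toDF(names: _*)
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session, for the sketch only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("pivot-sketch")
      .getOrCreate()
    import spark.implicits._

    // "c" appears twice: 4 rows, but only 3 distinct values to pivot.
    val df = List("b", "c", "c", "d").toDF("id")
    pivotColumn(df, "id").show()

    spark.stop()
  }
}
```

Duplicates are collapsed by the pivot, so this fits only when one column per distinct value is the desired result.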
