How to subset a DataFrame from a DataFrame

Posted by qxgroojn on 2021-05-29 in Hadoop


I want to subset a DataFrame from a df that was generated from a Parquet file:

+----+-----+----------+-----+-----------------+-----+-----------+-----+
|year|state|count1    |rowId|count2           |rowId|count3     |rowId|
+----+-----+----------+-----+-----------------+-----+-----------+-----+
|2014|   CT|    343477|    0|           343477|    0|     343477|    0|
|2014|   DE|    123431|    1|           123431|    1|     123431|    1|
|2014|   MD|    558686|    2|           558686|    2|     558686|    2|
|2014|   NJ|    773321|    3|           773321|    3|     773321|    3|
|2015|   CT|    343477|    4|           343477|    4|     343477|    4|
|2015|   DE|    123431|    5|           123431|    5|     123431|    5|
|2015|   MD|    558686|    6|           558686|    6|     558686|    6|

I want to keep one "rowId" column and drop the other "rowId" columns, and I also want the rowId column to be the first column:

+-----+----+-----+------+------+------+
|rowId|year|state|count1|count2|count3|
+-----+----+-----+------+------+------+
|    0|2014|   CT|343477|343477|343477|
|    1|2014|   DE|123431|123431|123431|
|    2|2014|   MD|558686|558686|558686|
|    3|2014|   NJ|773321|773321|773321|
|    4|2015|   CT|343477|343477|343477|
|    5|2015|   DE|123431|123431|123431|
|    6|2015|   MD|558686|558686|558686|
+-----+----+-----+------+------+------+

My attempt:

df.createOrReplaceTempView("test")
val sqlDF = spark.sql("SELECT rowId, year, state, count1, count2, count3 FROM test")

I get the error: org.apache.spark.sql.AnalysisException: Reference 'rowId' is ambiguous, could be: rowId#3356L, rowId#3368L, rowId#3378L, rowId#3388L, rowId#3398L, rowId#3408L. What should I do? Thank you.

hadoop scala apache-spark bigdata

Source: https://stackoverflow.com/questions/51144140/how-to-subset-a-dataframe-from-a-dataframe

1 Answer


h22fl7wq1#

You can map the columns by their positional index, as shown below:

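import spark.implicits._  // provides the encoders that map and toDF need

// Read each value by position, so the duplicate "rowId" name is never referenced
df.map(attributes =>
    (attributes.getLong(3),    // first rowId (the "L" suffix in the error indicates a long)
     attributes.getInt(0),     // year
     attributes.getString(1),  // state
     attributes.getInt(2),     // count1
     attributes.getInt(4),     // count2
     attributes.getInt(6)))    // count3
  .toDF("rowId", "year", "state", "count1", "count2", "count3")
  .show()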

Adjust the getters in the statement above (getInt, getLong, getString) to match your columns' actual data types.
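A possible alternative, sketched here under assumptions rather than taken from the original answer: because the ambiguity comes only from the repeated column name, renaming all columns positionally with toDF makes every name unique, after which a plain select works. The helper name keepFirstRowId and the placeholder names rowId2 and rowId3 are hypothetical, and the column order is assumed to match the table in the question. The small in-memory DataFrame stands in for the Parquet-backed one.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SubsetExample {
  // Hypothetical helper: rename columns by position so the duplicate
  // "rowId" names become unique, then select the wanted columns in order.
  def keepFirstRowId(df: DataFrame): DataFrame =
    df.toDF("year", "state", "count1", "rowId", "count2", "rowId2", "count3", "rowId3")
      .select("rowId", "year", "state", "count1", "count2", "count3")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("subset-example")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the DataFrame from the question (assumed column types)
    val df = Seq(
      (2014, "CT", 343477, 0L, 343477, 0L, 343477, 0L),
      (2014, "DE", 123431, 1L, 123431, 1L, 123431, 1L)
    ).toDF("year", "state", "count1", "rowId", "count2", "rowId", "count3", "rowId")

    keepFirstRowId(df).show()
    spark.stop()
  }
}
```

Unlike the index-based map, this variant does not depend on each column's data type, only on the column order, so it may be easier to maintain if the types change.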

Answered 2021-05-29
