How do I get a DataFrame column into a variable?

Posted by db2dz4w8 on 2021-06-28 in Hive


Environment: Spark 1.6, Scala
I am trying to get a datetime field out of a DataFrame so that I can use it in a comparison in Spark SQL.

val las_max_date_from_hive= hivecontext.sql("select min(SampleTime) max_SampleTime from mytable")

DF2 = hivecontext.sql ("select * from table2 where sampleDate >" + las_max_date_from_hive) // error here as  las_max_date_from_hive is a DF

How can I get the datetime value out of the DataFrame and use it in the SQL query?
Thanks,
Hussain

Hive scala DataFrame apache-spark apache-spark-sql

Source: https://stackoverflow.com/questions/41386439/getting-dataframe-column-in-variable-how-to

1 answer


rmbxnbpk1#

It's simple: sql returns a DataFrame, but you know that it contains exactly one element, so you can do the following:

import org.apache.spark.sql.Row

val last_max_date_from_hive = hivecontext.sql("select min(SampleTime) max_SampleTime from mytable")

// The result has exactly one row and one column, so take that single value
val firstRow = last_max_date_from_hive.map {
    case Row(value) => value.asInstanceOf[java.sql.Timestamp] // cast to Timestamp
}.first()

// Format the Timestamp with SimpleDateFormat so it can be embedded in the SQL string;
// cast as timestamp (not date) so the time of day is kept in the comparison
val df2 = hivecontext.sql("select * from mytable where SampleTime > cast('"
    + new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS").format(firstRow)
    + "' as timestamp)")
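The string-building step can be checked without a Spark cluster. The following is a minimal sketch, not part of the original answer: `TimestampQuery`, `formatForSql` and `buildQuery` are hypothetical names introduced here for illustration. It casts the literal to `timestamp` so the time of day takes part in the comparison.

```scala
import java.sql.Timestamp
import java.text.SimpleDateFormat

// Hypothetical helper object, introduced only for illustration.
object TimestampQuery {
  // Format a Timestamp as a string literal that can be embedded in SQL.
  // SimpleDateFormat is not thread-safe, so a new instance is created per call.
  def formatForSql(ts: Timestamp): String =
    new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS").format(ts)

  // Build the filter query; table and column names are passed in as-is,
  // so they must come from trusted code, not from user input.
  def buildQuery(table: String, column: String, ts: Timestamp): String =
    s"select * from $table where $column > cast('${formatForSql(ts)}' as timestamp)"
}
```

For example, `TimestampQuery.buildQuery("mytable", "SampleTime", firstRow)` would produce the string to pass to `hivecontext.sql`.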

If you don't want to format the Timestamp object as a string, you can use the from_unixtime function together with getTime():

// getTime() returns milliseconds since the epoch; from_unixtime expects seconds
val firstRow = last_max_date_from_hive.map {
    case Row(value) => value.asInstanceOf[java.sql.Timestamp].getTime() / 1000
}.first()

val df2 = hivecontext.sql("select * from mytable where cast(SampleTime as timestamp) > from_unixtime(" + firstRow + ")")
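The epoch-seconds variant can be sketched the same way, again without Spark. `EpochQuery`, `toEpochSeconds` and `buildQuery` are hypothetical names used only for this illustration. One thing the sketch makes visible: the integer division discards the millisecond part, so a row that falls within the same second as the reference value can still satisfy the `>` comparison.

```scala
import java.sql.Timestamp

// Hypothetical helper object, introduced only for illustration.
object EpochQuery {
  // getTime returns milliseconds since the epoch; integer division by 1000
  // gives whole seconds and drops the millisecond part.
  def toEpochSeconds(ts: Timestamp): Long = ts.getTime / 1000

  // Build the filter query around from_unixtime, which takes seconds.
  def buildQuery(table: String, column: String, ts: Timestamp): String =
    s"select * from $table where cast($column as timestamp) > from_unixtime(${toEpochSeconds(ts)})"
}
```

If sub-second precision matters for the comparison, the formatted-string approach keeps the milliseconds, whereas this one does not.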

Answered 2021-06-28
