How to force the Avro writer to write timestamps in UTC in a Spark Scala DataFrame

zu0ti5jz posted on 2021-05-27 in Spark


This question already has answers here:

Apache Spark - How to set timezone to UTC? Currently defaulting to Zulu (4 answers)
Closed 7 months ago.
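I need to write a timestamp field to Avro and make sure the data is saved in UTC. Currently Avro converts it to a long (timestamp-millis) based on the server's local time zone, which causes problems if the server reading the data back is in a different time zone. I looked at DataFrameWriter, and it seems to mention an option called timeZone, but that doesn't appear to help. Is there a way to force Avro to treat all timestamp fields it receives as being in a specific time zone?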
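A minimal sketch of where the zone dependence comes from, assuming a plain Scala/JVM environment with no Spark involved (this is not from the original post): `java.sql.Timestamp.valueOf` interprets the wall-clock string in the JVM's default time zone, so the same string produces different epoch milliseconds, which is the value that ends up in the Avro long, on servers in different zones. The object `ValueOfZoneDemo` and helper `epochMillisIn` are hypothetical names used only for illustration.

```scala
import java.sql.Timestamp
import java.util.TimeZone

object ValueOfZoneDemo {
  // Hypothetical helper: parse a wall-clock string as if the JVM default zone were zoneId
  def epochMillisIn(zoneId: String, wallClock: String): Long = {
    val saved = TimeZone.getDefault
    try {
      TimeZone.setDefault(TimeZone.getTimeZone(zoneId))
      // valueOf reads the string as local time in the current default zone
      Timestamp.valueOf(wallClock).getTime
    } finally {
      TimeZone.setDefault(saved) // restore the original default zone
    }
  }

  val wallClock = "2020-05-11 15:17:57.188"
  val onNewYorkServer: Long = epochMillisIn("America/New_York", wallClock)
  val onUtcServer: Long = epochMillisIn("UTC", wallClock)

  def main(args: Array[String]): Unit = {
    println(onNewYorkServer) // 1589224677188 (EDT, UTC-4 in May)
    println(onUtcServer)     // 1589210277188
  }
}
```

The four-hour difference between the two values is exactly the EDT offset, which matches the long shown in the question for a writer running on US Eastern time.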


**CODE SNIPPET**

// Write to Avro (assumes the Databricks spark-avro package, which provides .avro(...))
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import com.databricks.spark.avro._

val data = Seq(Row("1", java.sql.Timestamp.valueOf("2020-05-11 15:17:57.188")))
val schemaOrig = List(
  StructField("rowkey", StringType, true),
  StructField("txn_ts", TimestampType, true))
val sourceDf = spark.createDataFrame(spark.sparkContext.parallelize(data), StructType(schemaOrig))
sourceDf.write.option("timeZone", "UTC").avro("/test4")

// Now read back from Avro
val avroDf = spark.read.avro("/test4")
avroDf.show(false)

original value in source: 2020-05-11 15:17:57.188
value stored in Avro: 1589224677188
read back from Avro without formatting:
+-------------+-------------+
|rowkey       |txn_ts       |
+-------------+-------------+
|1            |1589224677188|
+-------------+-------------+

The mapping itself is fine, but if the server writing the data is in EST and the server reading it back is in GMT, the value is displayed differently, which causes a problem:

println(new java.sql.Timestamp(1589224677188L))
// prints 2020-05-11 19:17:57.188 on a server whose time zone is GMT
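To illustrate the reader's side, here is a hedged sketch using only `java.time` (not part of the original post): the epoch value 1589224677188 denotes a single instant, and only its rendering changes with the zone used to display it. Formatting with an explicit zone gives the same text on every server regardless of its default zone. The object name `RenderInZones` is hypothetical.

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

object RenderInZones {
  val millis = 1589224677188L
  val instant: Instant = Instant.ofEpochMilli(millis)

  // Explicit pattern, so the output does not depend on the JVM default zone
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS")

  // Render the same instant as wall-clock text in the given zone
  def render(zone: String): String = instant.atZone(ZoneId.of(zone)).format(fmt)

  val inUtc: String = render("UTC")                  // 2020-05-11 19:17:57.188
  val inNewYork: String = render("America/New_York") // 2020-05-11 15:17:57.188

  def main(args: Array[String]): Unit = {
    println(s"UTC:      $inUtc")
    println(s"New York: $inNewYork")
  }
}
```

In other words, the stored long is already zone-independent; what differs between servers is the zone each JVM uses to turn the long back into text.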

avro apache-spark apache-spark-sql spark-avro

Source: https://stackoverflow.com/questions/61977018/how-to-force-avro-writer-to-write-timestamp-in-utc-in-spark-scala-dataframe

1 answer

relj7zay1

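The .option("timeZone","UTC") option does not convert timestamps to the UTC time zone.
Setting the config property with spark.conf.set("spark.sql.session.timeZone", "UTC") makes UTC the default time zone for all timestamps.
If spark.sql.session.timeZone is not set, its default value is the JVM system's local time zone.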
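A minimal sketch of the session-level setting, assuming Spark 2.2 or later (where spark.sql.session.timeZone exists) and a local session; the object name `SessionTimeZoneUtc` and the app name are placeholders. The property can be set when the session is built or changed later with `spark.conf.set`. It controls the zone Spark SQL uses when converting between timestamps and wall-clock text (for example in `from_unixtime`, casts and `show()`); it does not change how `java.sql.Timestamp.valueOf` parses strings on the driver.

```scala
import org.apache.spark.sql.SparkSession

object SessionTimeZoneUtc {
  // Placeholder app name and local master, for illustration only
  val spark: SparkSession = SparkSession.builder()
    .appName("avro-utc-sketch")
    .master("local[*]")
    .config("spark.sql.session.timeZone", "UTC") // set when the session is created
    .getOrCreate()

  // The same property can also be changed on an existing session
  spark.conf.set("spark.sql.session.timeZone", "UTC")

  val sessionTz: String = spark.conf.get("spark.sql.session.timeZone")

  // from_unixtime formats epoch seconds as text in the session time zone
  val rendered: String = spark
    .sql("SELECT from_unixtime(1589224677) AS ts")
    .first()
    .getString(0)

  def main(args: Array[String]): Unit = {
    println(sessionTz) // UTC
    println(rendered)  // 2020-05-11 19:17:57
    spark.stop()
  }
}
```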
If the option above does not work because your Spark version is too old, try the following options instead: --conf "spark.driver.extraJavaOptions=-Duser.timezone=UTC" --conf "spark.executor.extraJavaOptions=-Duser.timezone=UTC"
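Whichever route is taken, one way to confirm that -Duser.timezone=UTC actually reached the JVMs is to read the default zone on the driver and inside tasks. This is a hedged sketch, not part of the original answer; `JvmTimeZoneCheck` and `checkJvmTimeZones` are hypothetical names, and `main` assumes the job is launched with spark-submit using the --conf flags above. In local mode the tasks run inside the driver JVM, so the executor option has no separate effect there; on a cluster, every distinct executor zone shows up in the result.

```scala
import java.util.TimeZone
import org.apache.spark.sql.SparkSession

object JvmTimeZoneCheck {
  // Hypothetical helper: returns the driver's zone and the distinct zones seen by tasks
  def checkJvmTimeZones(spark: SparkSession, partitions: Int = 4): (String, Array[String]) = {
    val driverTz = TimeZone.getDefault.getID
    val taskTzs = spark.sparkContext
      .parallelize(1 to partitions, partitions)
      .map(_ => TimeZone.getDefault.getID) // evaluated inside each task's JVM
      .distinct()
      .collect()
    (driverTz, taskTzs)
  }

  def main(args: Array[String]): Unit = {
    // Master and JVM options are expected to come from spark-submit
    val spark = SparkSession.builder().appName("tz-check").getOrCreate()
    val (driverTz, taskTzs) = checkJvmTimeZones(spark)
    println(s"driver: $driverTz, tasks: ${taskTzs.mkString(", ")}")
    spark.stop()
  }
}
```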

2021-05-27
