Vertica SQL not running from PySpark

Asked by l2osamch on 2021-05-29 in Hadoop

I'm new to running Vertica SQL from PySpark. I've imported the vertica_python module and written the SQL against a Vertica table, and I need to load the query result into a PySpark DataFrame. I run the script with spark-submit, but it fails with a MemoryError. Can anyone help?
I tried increasing and decreasing the driver and executor memory in the spark-submit command, but it made no difference; I got the same error message.

  from pyspark import SparkContext
  from pyspark.sql import SQLContext
  from pyspark.sql import HiveContext
  from datetime import datetime

  sc = SparkContext(appName="build_history_pyspark")
  sqlContext = SQLContext(sc)
  start_load_time_UTC = '2019-06-01 00:00:00'
  start_load_time_UTC = datetime.strptime(start_load_time_UTC, '%Y-%m-%d %H:%M:%S')
  url = "jdbc:vertica://144.60.100.141:5433/SRVVERTICA"  # Vertica database host IP and port
  properties = {"user": "hhhhh", "password": "yyyyy", "driver": "com.vertica.jdbc.Driver"}
  sql_for_kafka = "SELECT STARTTIME, LONGITUDE, LATITUDE FROM my_table WHERE (VERTICA_LOAD_TIME > '%s')" % (start_load_time_UTC)
  sdf_orig = sqlContext.read.format("jdbc").options(url=url, query=sql_for_kafka, **properties).load()
  sdf_orig.show(10)

The error reported was: MemoryError
My command:
  spark-submit --driver-memory 10g --num-executors 10 --executor-cores 4 --executor-memory 10g build_history_pyspark.py
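As an aside (this is not from the original post): when a MemoryError really is caused by pulling a large result set through a single JDBC partition, raising driver and executor memory often doesn't help, while splitting the read across partitions can. A minimal sketch follows; the partition column, bounds, partition count, and fetch size are illustrative assumptions, not values from the question.

  # Hypothetical partitioned JDBC read; column name and bounds are assumptions.
  sdf_part = sqlContext.read.format("jdbc").options(
      url=url,
      dbtable="(SELECT STARTTIME, LONGITUDE, LATITUDE FROM my_table) sub",
      partitionColumn="STARTTIME",          # must be numeric, date, or timestamp (date/timestamp need Spark 2.4+)
      lowerBound="2019-06-01 00:00:00",
      upperBound="2021-06-01 00:00:00",
      numPartitions="10",
      fetchsize="10000",                    # rows fetched per round trip from Vertica
      **properties
  ).load()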
After some further debugging, I believe the error is caused by the variable start_load_time_UTC. I really do need a variable here rather than a hard-coded value, but I don't know the correct syntax.
I also changed the query to:
  sql_for_kafka = "SELECT STARTTIME, LONGITUDE, LATITUDE FROM my_table WHERE VERTICA_LOAD_TIME > $start_load_time_UTC"
and got a different error:

  Py4JJavaError: An error occurred while calling o57.load.
  : java.sql.SQLDataException: [Vertica][VJDBC](3679) ERROR: Invalid input syntax for timestamp: "$start_load_time_UTC"
      at com.vertica.util.ServerErrorData.buildException(Unknown Source)
      at com.vertica.io.ProtocolStream.readExpectedMessage(Unknown Source)
      at com.vertica.dataengine.VDataEngine.prepareImpl(Unknown Source)
      at com.vertica.dataengine.VDataEngine.prepare(Unknown Source)
      at com.vertica.dataengine.VDataEngine.prepare(Unknown Source)
      at com.vertica.jdbc.common.SPreparedStatement.<init>(Unknown Source)
      at com.vertica.jdbc.jdbc4.S4PreparedStatement.<init>(Unknown Source)
      at com.vertica.jdbc.VerticaJdbc4PreparedStatementImpl.<init>(Unknown Source)
      at com.vertica.jdbc.VJDBCObjectFactory.createPreparedStatement(Unknown Source)
      at com.vertica.jdbc.common.SConnection.prepareStatement(Unknown Source)
      at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
      at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:115)
      at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:52)
      at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
      at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
      at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
      at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
      at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
      at py4j.Gateway.invoke(Gateway.java:282)
      at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
      at py4j.commands.CallCommand.execute(CallCommand.java:79)
      at py4j.GatewayConnection.run(GatewayConnection.java:214)
      at java.lang.Thread.run(Thread.java:745)

  Caused by: com.vertica.support.exceptions.DataException: [Vertica][VJDBC] ERROR: Invalid input syntax for timestamp: "$start_load_time_UTC" ... 28 more
Python does not expand $start_load_time_UTC inside the string, so Vertica received that literal text as the timestamp value. I eventually solved the problem by changing the query to:

  sql_for_kafka = "(SELECT STARTTIME, LONGITUDE, LATITUDE FROM my_table WHERE (VERTICA_LOAD_TIME > '{0}')) temp".format(start_load_time_UTC)
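For completeness, a minimal runnable sketch of the interpolation plus the read. The table, column, and connection names come from the post; formatting the datetime explicitly with strftime, and passing the aliased "(…) temp" subquery through the dbtable option (the option that alias pattern is normally used with), are assumptions on my part rather than the poster's exact setup.

  from datetime import datetime

  # Build the timestamp literal explicitly so Vertica sees '2019-06-01 00:00:00',
  # not the Python variable name.
  start_load_time_UTC = datetime.strptime('2019-06-01 00:00:00', '%Y-%m-%d %H:%M:%S')
  ts_literal = start_load_time_UTC.strftime('%Y-%m-%d %H:%M:%S')

  sql_for_kafka = (
      "(SELECT STARTTIME, LONGITUDE, LATITUDE "
      "FROM my_table "
      "WHERE VERTICA_LOAD_TIME > '{0}') temp"
  ).format(ts_literal)

  # Assumption: the aliased subquery goes through dbtable=; the original query=
  # option should also work if the outer parentheses and alias are dropped.
  sdf_orig = sqlContext.read.format("jdbc").options(
      url=url, dbtable=sql_for_kafka, **properties
  ).load()
  sdf_orig.show(10)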

No answers yet.

