我使用的是亚马逊emr上的ipython笔记本,带有toree内核。我想从我的Hive表中读取一些数据。
query_cmd = ("select uuid, collect_list(distinct newsid) as news from \
sog_l1screen.v1_test_dw_l1_display_orc_dt where dt>='20170709' and dt<'20170810'\
and system_setting_area='BR' and action=2 group by uuid")
original_df = spark.sql(query_cmd)
它告诉我
Name: org.apache.toree.interpreter.broker.BrokerException
Message: Traceback (most recent call last):
File "/tmp/kernel-PySpark-d8cc3c23-661f-478e-9cd8-35f54481581c/pyspark_runner.py", line 189, in <module>
eval(compiled_code)
File "<string>", line 3, in <module>
File "/usr/lib/spark/python/pyspark/sql/session.py", line 541, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/usr/lib/spark/python/pyspark/sql/utils.py", line 69, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
AnalysisException: u"Table or view not found: `sog_l1screen`.`v1_test_dw_l1_display_orc_dt`; line 1 pos 57;\n'Aggregate ['uuid], ['uuid, 'collect_list('newsid) AS news#2]\n+- 'Filter ((('dt >= 20170709) && ('dt < 20170810)) && (('system_setting_area = BR) && ('action = 2)))\n +- 'UnresolvedRelation `sog_l1screen`.`v1_test_dw_l1_display_orc_dt`\n"
StackTrace: org.apache.toree.interpreter.broker.BrokerState$$anonfun$markFailure$1.apply(BrokerState.scala:163)
org.apache.toree.interpreter.broker.BrokerState$$anonfun$markFailure$1.apply(BrokerState.scala:163)
scala.Option.foreach(Option.scala:257)
org.apache.toree.interpreter.broker.BrokerState.markFailure(BrokerState.scala:162)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:280)
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
py4j.commands.CallCommand.execute(CallCommand.java:79)
py4j.GatewayConnection.run(GatewayConnection.java:214)
java.lang.Thread.run(Thread.java:748)
``` `sog_l1screen` 是我的数据库和 `v1_test_dw_l1_display_orc_dt` 这是table。我确信它们存在于我的Hive中,我可以使用Hive触摸它们,或者将上面的代码写入一个 `.py` 文件,然后 `spark-submit` 这个文件。那么,如何使用ipython笔记本从配置单元表读取数据呢?
暂无答案!
目前还没有任何答案,快来回答吧!