I am running a simple EMR cluster with Spark 2.4.4, and I want to run the following code with graphframes v0.7:
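from pyspark import SparkContext
from pyspark.sql import SparkSession
from graphframes import GraphFrame
# Reuse the active SparkContext if one exists, and reduce log noise
sc = SparkContext.getOrCreate()
sc.setLogLevel("ERROR")
spark = SparkSession.builder.appName('graphFrames').getOrCreate()
# Ship the GraphFrames Python sources to the executors
spark.sparkContext.addPyFile("/home/hadoop/jars/graphframes.zip")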
vertices = spark.createDataFrame([('1', 'Carter', 'Derrick', 50),
                                  ('2', 'May', 'Derrick', 26),
                                  ('3', 'Mills', 'Jeff', 80),
                                  ('4', 'Hood', 'Robert', 65),
                                  ('5', 'Banks', 'Mike', 93),
                                  ('98', 'Berg', 'Tim', 28),
                                  ('99', 'Page', 'Allan', 16)],
                                 ['id', 'name', 'firstname', 'age'])
edges = spark.createDataFrame([('1', '2', 'friend'),
                               ('2', '1', 'friend'),
                               ('3', '1', 'friend'),
                               ('1', '3', 'friend'),
                               ('2', '3', 'follows'),
                               ('3', '4', 'friend'),
                               ('4', '3', 'friend'),
                               ('5', '3', 'friend'),
                               ('3', '5', 'friend'),
                               ('4', '5', 'follows'),
                               ('98', '99', 'friend'),
                               ('99', '98', 'friend')],
                              ['src', 'dst', 'type'])
g = GraphFrame(vertices, edges)
## Take a look at the DataFrames
g.vertices.show()
g.edges.show()
## Check the number of edges of each vertex
g.degrees.show()
The package is found and loaded as follows:
[root@ip-172-31-13-149 scripts]# $SPARK_HOME/bin/spark-submit --packages graphframes:graphframes:0.7.0-spark2.4-s_2.11 tst.py
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/usr/lib/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
graphframes#graphframes added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-835b0432-a6e7-4b5c-afd6-44e7f6ab2c26;1.0
confs: [default]
found graphframes#graphframes;0.7.0-spark2.4-s_2.11 in spark-packages
found org.slf4j#slf4j-api;1.7.16 in central
:: resolution report :: resolve 116ms :: artifacts dl 3ms
:: modules in use:
graphframes#graphframes;0.7.0-spark2.4-s_2.11 from spark-packages in [default]
org.slf4j#slf4j-api;1.7.16 from central in [default]
When I run a simple GraphFrames example, I get the following error:
Traceback (most recent call last):
File "/home/hadoop/scripts/tst.py", line 32, in <module>
g = GraphFrame(vertices, edges)
File "/root/.ivy2/jars/graphframes_graphframes-0.7.0-spark2.4-s_2.11.jar/graphframes/graphframe.py", line 89, in __init__
File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o100.createGraph.
: java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
at org.graphframes.GraphFrame$.apply(GraphFrame.scala:676)
at org.graphframes.GraphFramePythonAPI.createGraph(GraphFramePythonAPI.scala:10)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
I also added the package in spark-default.sh:
spark.jars.packages graphframes:graphframes:0.7.0-spark2.4-s_2.11
I also tried the steps suggested by hughcristensen here: https://github.com/graphframes/graphframes/issues/172
I would really appreciate any help, because I don't know what else to try.
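One thing that may be worth checking, offered as a sketch rather than a confirmed diagnosis: a NoSuchMethodError on scala.Predef$.refArrayOps is commonly a sign that a jar built for one Scala binary version is running on a Spark build that uses another. The package coordinate above ends in s_2.11, so it targets Scala 2.11. The helper below compares that suffix with a Scala version string. The function names are hypothetical, and the runtime string in the example is made up; it is not output from this cluster. On a live session the real value could probably be read through the py4j gateway as shown in the comment, or from the banner printed by spark-submit --version.

```python
import re


def package_scala_version(coordinate):
    """Return the Scala binary version from a coordinate ending in '-s_<major.minor>'."""
    match = re.search(r"-s_(\d+\.\d+)$", coordinate.strip())
    return match.group(1) if match else None


def runtime_scala_binary_version(version_string):
    """Reduce a string such as 'version 2.12.10' to its binary version, e.g. '2.12'."""
    match = re.search(r"(\d+)\.(\d+)", version_string)
    return "{}.{}".format(match.group(1), match.group(2)) if match else None


def scala_versions_match(coordinate, version_string):
    """True only if the package suffix and the runtime agree on the Scala binary version."""
    package_version = package_scala_version(coordinate)
    runtime_version = runtime_scala_binary_version(version_string)
    return package_version is not None and package_version == runtime_version


# Assumption: in a running PySpark session the runtime string can be read with
#   version_string = spark.sparkContext._jvm.scala.util.Properties.versionString()
# The value used here is a made-up example for illustration only.
example_runtime = "version 2.12.10"
coordinate = "graphframes:graphframes:0.7.0-spark2.4-s_2.11"

if scala_versions_match(coordinate, example_runtime):
    print("Package and runtime use the same Scala binary version")
else:
    print("Mismatch: package targets Scala {}, runtime is Scala {}".format(
        package_scala_version(coordinate),
        runtime_scala_binary_version(example_runtime)))
```

If the two versions differ on the real cluster, the usual remedy is to pick a package build whose s_ suffix matches the Scala version Spark was built with. If they already match, the cause lies elsewhere and this check only rules out one possibility.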