Error when running my first Spark Python program

mctunoxg, posted on 2021-05-29 in Hadoop

I have been working with Spark (built against Hadoop 2.7) on Eclipse using Python, and I am trying to run the "word count" example. This is my code:

    # Imports
    # Take care with unused imports (and also unused variables):
    # comment them all out, otherwise you will get errors at execution time.
    # Note that neither the "@PydevCodeAnalysisIgnore" nor the "@UnusedImport"
    # directive solves this problem.
    # from pyspark.mllib.clustering import KMeans
    from pyspark import SparkConf, SparkContext
    import os

    # Configure the Spark environment
    sparkConf = SparkConf().setAppName("WordCounts").setMaster("local")
    sc = SparkContext(conf = sparkConf)
    # The WordCounts Spark program
    textFile = sc.textFile(os.environ["SPARK_HOME"] + "/README.md")
    wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
    for wc in wordCounts.collect(): print wc

Then I get the following error:

    17/08/07 12:28:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    17/08/07 12:28:16 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
    Traceback (most recent call last):
      File "/home/hduser/eclipse-workspace/PythonSpark/src/WordCounts.py", line 12, in <module>
        sc = SparkContext(conf = sparkConf)
      File "/usr/local/spark/python/pyspark/context.py", line 118, in __init__
        conf, jsc, profiler_cls)
      File "/usr/local/spark/python/pyspark/context.py", line 186, in _do_init
        self._accumulatorServer = accumulators._start_update_server()
      File "/usr/local/spark/python/pyspark/accumulators.py", line 259, in _start_update_server
        server = AccumulatorServer(("localhost", 0), _UpdateRequestHandler)
      File "/usr/lib/python2.7/SocketServer.py", line 417, in __init__
        self.server_bind()
      File "/usr/lib/python2.7/SocketServer.py", line 431, in server_bind
        self.socket.bind(self.server_address)
      File "/usr/lib/python2.7/socket.py", line 228, in meth
        return getattr(self._sock,name)(*args)
    socket.gaierror: [Errno -3] Temporary failure in name resolution

Any help?? I can run any Spark project in Scala with spark-shell, and I can also run any (non-Spark) Python program in Eclipse without errors. So I suppose my problem has something to do with PySpark?
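For what it's worth, the last frames of the traceback show PySpark's accumulator server binding a plain TCP socket to ("localhost", 0), so the failure can be reproduced without Spark at all. A minimal sketch of my own (not from the question) that exercises the same step:

    # Reproduces the failing step from the traceback: binding a socket to "localhost".
    # If "localhost" cannot be resolved (e.g. no "127.0.0.1 localhost" entry in /etc/hosts),
    # this raises the same socket.gaierror: [Errno -3] Temporary failure in name resolution.
    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("localhost", 0))
    print(s.getsockname())
    s.close()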


6tdlim6h 1#

This is enough to run your program, because your shell already sets everything up for you.
Try it first in your shell mode...
line by line...

    textFile = sc.textFile("/home/your/path/Test.txt")  # or: File --> right click, get the path and paste it here
    wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
    for wc in wordCounts.collect():
        print wc
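A side note of my own (an assumption, not part of the answer): collect() pulls the entire result back to the driver, which is fine for a small file like the README but not for large input; take(n) previews a few records instead:

    # Preview a handful of (word, count) pairs without collecting everything to the driver.
    for wc in wordCounts.take(10):
        print(wc)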

0ejtzxu1 2#

Try it this way...
After you start Spark, it shows up at the command prompt as `sc`, the SparkContext.
If it is not available, you can use the following...

    >> sc = new org.apache.spark.SparkContext()
    >> NOW YOU CAN USE... sc
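Since the question is about PySpark rather than Scala, the Python equivalent of creating the context by hand would look like this (a sketch of mine, assuming the stock PySpark API; in the pyspark shell `sc` is normally predefined already):

    # Create a SparkContext by hand if `sc` is not already defined.
    from pyspark import SparkContext

    sc = SparkContext(master="local", appName="WordCounts")
    print(sc.version)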

ecfsfe2w 3#

As far as I understand, the following code should work if Spark is installed correctly.

    from pyspark import SparkConf, SparkContext
    conf = SparkConf().setMaster("local").setAppName("WordCount")
    sc = SparkContext(conf = conf)
    input = sc.textFile("file:///sparkcourse/PATH_NAME")
    words = input.flatMap(lambda x: x.split())
    wordCounts = words.countByValue()
    for word, count in wordCounts.items():
        cleanWord = word.encode('ascii', 'ignore')
        if (cleanWord):
            print(cleanWord.decode() + " " + str(count))
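One remark of my own (not part of the answer): unlike the reduceByKey(...).collect() approach in the question, countByValue() is an action that returns the counts to the driver as a plain dict-like object, which is why the result can be iterated with .items() directly. A minimal sketch:

    # countByValue() returns a collections.defaultdict on the driver, not an RDD.
    from pyspark import SparkContext

    sc = SparkContext(master="local", appName="CountByValueDemo")
    rdd = sc.parallelize(["spark", "hadoop", "spark"])
    print(dict(rdd.countByValue()))  # {'spark': 2, 'hadoop': 1}
    sc.stop()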

qqrboqgw 4#

You can try this: just creating the SparkContext is enough, and it works.

    from pyspark import SparkContext

    sc = SparkContext()
    # The WordCounts Spark program
    textFile = sc.textFile("/home/your/path/Test.txt")  # or: File --> right click, get the path and paste it here
    wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
    for wc in wordCounts.collect():
        print wc
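A follow-up thought of my own (an assumption, not something the answer states): if a context is already active in the session, constructing a second SparkContext() raises an error; SparkContext.getOrCreate() reuses the existing one instead:

    # Reuse an active SparkContext if there is one, otherwise create it.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster("local").setAppName("WordCounts")
    sc = SparkContext.getOrCreate(conf)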
