Error when running my first Spark Python program

mctunoxg, posted on 2021-05-29 in Hadoop

I have been working with Spark (built against Hadoop 2.7) on Eclipse using Python, and I am trying to run the "word count" example. This is my code:

    # Imports
    # Take care with unused imports (and also unused variables):
    # comment them all out, otherwise you will get errors at execution time.
    # Note that neither the "@PydevCodeAnalysisIgnore" nor the "@UnusedImport"
    # directive solves this problem.
    # from pyspark.mllib.clustering import KMeans
    from pyspark import SparkConf, SparkContext
    import os

    # Configure the Spark environment
    sparkConf = SparkConf().setAppName("WordCounts").setMaster("local")
    sc = SparkContext(conf = sparkConf)
    # The WordCounts Spark program
    textFile = sc.textFile(os.environ["SPARK_HOME"] + "/README.md")
    wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
    for wc in wordCounts.collect(): print wc

Then I get the following error:

    17/08/07 12:28:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    17/08/07 12:28:16 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
    Traceback (most recent call last):
      File "/home/hduser/eclipse-workspace/PythonSpark/src/WordCounts.py", line 12, in <module>
        sc = SparkContext(conf = sparkConf)
      File "/usr/local/spark/python/pyspark/context.py", line 118, in __init__
        conf, jsc, profiler_cls)
      File "/usr/local/spark/python/pyspark/context.py", line 186, in _do_init
        self._accumulatorServer = accumulators._start_update_server()
      File "/usr/local/spark/python/pyspark/accumulators.py", line 259, in _start_update_server
        server = AccumulatorServer(("localhost", 0), _UpdateRequestHandler)
      File "/usr/lib/python2.7/SocketServer.py", line 417, in __init__
        self.server_bind()
      File "/usr/lib/python2.7/SocketServer.py", line 431, in server_bind
        self.socket.bind(self.server_address)
      File "/usr/lib/python2.7/socket.py", line 228, in meth
        return getattr(self._sock,name)(*args)
    socket.gaierror: [Errno -3] Temporary failure in name resolution

Any help?? I can run any Spark project in Scala with spark-shell, and I can also run any (non-Spark) Python program in Eclipse without errors. So I suppose my problem has something to do with PySpark?
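For what it's worth, the last frames of the traceback show PySpark's accumulator server binding a plain TCP socket to ("localhost", 0), so the failure can be reproduced without Spark at all. A minimal sketch of my own (not from the question) that exercises the same step:

    # Reproduces the failing step from the traceback: binding a socket to "localhost".
    # If "localhost" cannot be resolved (e.g. no "127.0.0.1 localhost" entry in /etc/hosts),
    # this raises the same socket.gaierror: [Errno -3] Temporary failure in name resolution.
    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("localhost", 0))
    print(s.getsockname())
    s.close()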


6tdlim6h 1#

This is enough to run your program, because your shell already sets everything up for you.
Try it first in your shell mode...
line by line...

    textFile = sc.textFile("/home/your/path/Test.txt")  # or: File --> right click, get the path and paste it here
    wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
    for wc in wordCounts.collect():
        print wc
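A side note of my own (an assumption, not part of the answer): collect() pulls the entire result back to the driver, which is fine for a small file like the README but not for large input; take(n) previews a few records instead:

    # Preview a handful of (word, count) pairs without collecting everything to the driver.
    for wc in wordCounts.take(10):
        print(wc)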

0ejtzxu1 2#

Try it this way...
After you start Spark, it shows up at the command prompt as `sc`, the SparkContext.
If it is not available, you can use the following...

    >> sc = new org.apache.spark.SparkContext()
    >> NOW YOU CAN USE... sc
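Since the question is about PySpark rather than Scala, the Python equivalent of creating the context by hand would look like this (a sketch of mine, assuming the stock PySpark API; in the pyspark shell `sc` is normally predefined already):

    # Create a SparkContext by hand if `sc` is not already defined.
    from pyspark import SparkContext

    sc = SparkContext(master="local", appName="WordCounts")
    print(sc.version)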

ecfsfe2w 3#

As far as I understand, the following code should work if Spark is installed correctly.

    from pyspark import SparkConf, SparkContext
    conf = SparkConf().setMaster("local").setAppName("WordCount")
    sc = SparkContext(conf = conf)
    input = sc.textFile("file:///sparkcourse/PATH_NAME")
    words = input.flatMap(lambda x: x.split())
    wordCounts = words.countByValue()
    for word, count in wordCounts.items():
        cleanWord = word.encode('ascii', 'ignore')
        if (cleanWord):
            print(cleanWord.decode() + " " + str(count))
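One remark of my own (not part of the answer): unlike the reduceByKey(...).collect() approach in the question, countByValue() is an action that returns the counts to the driver as a plain dict-like object, which is why the result can be iterated with .items() directly. A minimal sketch:

    # countByValue() returns a collections.defaultdict on the driver, not an RDD.
    from pyspark import SparkContext

    sc = SparkContext(master="local", appName="CountByValueDemo")
    rdd = sc.parallelize(["spark", "hadoop", "spark"])
    print(dict(rdd.countByValue()))  # {'spark': 2, 'hadoop': 1}
    sc.stop()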

qqrboqgw 4#

You can try this: just creating the SparkContext is enough, and it works.

    from pyspark import SparkContext

    sc = SparkContext()
    # The WordCounts Spark program
    textFile = sc.textFile("/home/your/path/Test.txt")  # or: File --> right click, get the path and paste it here
    wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
    for wc in wordCounts.collect():
        print wc
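A follow-up thought of my own (an assumption, not something the answer states): if a context is already active in the session, constructing a second SparkContext() raises an error; SparkContext.getOrCreate() reuses the existing one instead:

    # Reuse an active SparkContext if there is one, otherwise create it.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster("local").setAppName("WordCounts")
    sc = SparkContext.getOrCreate(conf)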
