How to get the first n lines from sys.stdin, line by line, in Python

Posted by kpbwa7wx on 2021-06-02 in Hadoop


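I ran into a problem while writing a reducer for MapReduce. I want to get the first 10 lines of a very large file, so I used a loop with break. However, the break causes an error on Hadoop, so I am looking for another way: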

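import fileinput

limit = 10    # the first 10 lines
counter = 0

for line in fileinput.input():
    if counter >= limit:    # stop once `limit` lines have been printed
        break

    line = line.strip()
    print(line)
    counter += 1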
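For reference, the "first n lines" part on its own can be written without a manual counter using `itertools.islice` from the standard library. This is a minimal sketch; `first_n_lines` is a hypothetical helper name, and it takes any iterable of lines so it can be tried outside Hadoop. Note that it still stops reading its input after n lines, exactly like `break` does, so by itself it does not change how the process interacts with Hadoop.

```python
from itertools import islice


def first_n_lines(lines, n):
    """Yield the first n lines of `lines`, stripped like in the loop above.

    islice stops requesting items after n, so the rest of the input is
    left unread: this has the same effect on the input stream as `break`.
    """
    for line in islice(lines, n):
        yield line.strip()
```

In a reducer it would be used as `for line in first_n_lines(sys.stdin, 10): print(line)`.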

Error log:

Error: java.io.IOException: subprocess exited successfully
R/W/S=6936/19/0 in:NA [rec/s] out:NA [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 HOST=null
USER=s2132211
HADOOP_USER=null
last tool output: |29670    YOU HAVE AATO|
Broken pipe
    at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:129)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
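One plausible reading of the `Broken pipe` in this log (an interpretation, not something the thread confirms): Hadoop Streaming keeps writing reducer input records to the Python process's stdin, and once the script hits `break` and exits, those writes fail because nothing is reading the pipe any more. Under that assumption, a common workaround is to keep consuming stdin after the first n lines and simply stop printing. `print_first_n` is a hypothetical helper name; `out` defaults to stdout so it can be tested with an in-memory buffer.

```python
import sys
from collections import deque
from itertools import islice


def print_first_n(lines, n, out=None):
    """Print the first n stripped lines to `out`, then drain the rest of `lines`.

    Draining keeps the process reading until the writer on the other end
    of the pipe has sent all of its input, instead of exiting early.
    """
    out = sys.stdout if out is None else out
    it = iter(lines)
    for line in islice(it, n):
        print(line.strip(), file=out)
    # Consume the remaining input without storing it (deque with maxlen=0).
    deque(it, maxlen=0)
```

In the reducer this would be called as `print_first_n(sys.stdin, 10)`. The cost is reading the whole input even though only 10 lines are printed, which is the trade-off for not closing the pipe early.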

Tags: hadoop mapreduce python

Source: https://stackoverflow.com/questions/40168214/how-to-get-first-n-lines-from-sys-stdin-line-by-line-on-python

1 answer


kx7yvsdv (answer #1)

First, either your example is badly formatted or you have a logic error: print(line) and counter += 1 should be inside the for loop.
A simpler way is:

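import fileinput

limit = 10    # number of lines to keep

for counter, line in enumerate(fileinput.input()):
    if counter >= limit:    # counter starts at 0, so stop at index `limit`
        break

    line = line.strip()
    print(line)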

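Now, if this does not solve the problem, there are a couple of questions:
1) Can you see the program's output (is it actually printing anything inside the for loop)?
2) Does the program crash immediately, or only after a while?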
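To help answer questions 1) and 2), one option is to have the reducer report what it did on stderr, which Hadoop Streaming does not treat as job output (it normally ends up in the task attempt's logs; where those live depends on the cluster). This is a hypothetical diagnostic helper, `run`, not part of the original answer: it counts lines read and printed and writes a one-line summary to stderr once the input ends, so the logs show whether the loop ran and whether the process reached the end of its input. Unlike the loops above, it does not `break`, so it also keeps reading all of stdin.

```python
import sys


def run(lines, limit, out=None, err=None):
    """Print the first `limit` stripped lines and report counts on stderr.

    stdout carries the reducer's real output; stderr is only for diagnostics.
    """
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    read = written = 0
    for line in lines:
        read += 1
        if written < limit:
            print(line.strip(), file=out)
            written += 1
    print(f"reducer: read {read} lines, wrote {written}", file=err)
    return read, written
```

In the reducer: `run(sys.stdin, 10)`. If the summary line never appears in the task's stderr log, the process died before reaching the end of its input.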

Answered 2021-06-02
