Hadoop, when run under Spark, merges its stderr into stdout

Posted by bhmjp9jg on 2021-06-02 in Hadoop


When I type

hadoop fs -text /foo/bar/baz.bz2 2>err 1>out

I get two non-empty files: err, containing

2015-05-26 15:33:49,786 INFO  [main] bzip2.Bzip2Factory (Bzip2Factory.java:isNativeBzip2Loaded(70)) - Successfully loaded & initialized native-bzip2 library system-native
2015-05-26 15:33:49,789 INFO  [main] compress.CodecPool (CodecPool.java:getDecompressor(179)) - Got brand-new decompressor [.bz2]

and out, containing the contents of the file (as expected).
When I call the same command from Python (2.6):

from subprocess import Popen
with open("out","w") as out:
    with open("err","w") as err:
        p = Popen(['hadoop','fs','-text',"/foo/bar/baz.bz2"],
                  stdin=None,stdout=out,stderr=err)
print p.wait()

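I get exactly the same (correct) behavior.
However, when I run the same code under pyspark (or with spark-submit), I get an empty err file, and the out file starts with the log messages above (followed by the actual data).
What am I doing wrong?
Note: the purpose of the Python code is to feed the output of hadoop fs -text to another program (i.e., by passing stdout=PIPE to Popen), so please do not suggest hadoop fs -get. Thanks.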
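For context, the intended use mentioned in the note can be sketched roughly as follows. This is only an illustration of the pipeline, not a fix for the behavior under Spark: the helper name pipe_through, its parameters, and the consumer command are hypothetical and do not come from the original code.

```python
import subprocess


def pipe_through(producer_cmd, consumer_cmd, err_path):
    """Feed producer's stdout into consumer's stdin.

    The producer's stderr is written to err_path so that log lines
    stay out of the data stream. (Hypothetical helper for illustration.)
    """
    with open(err_path, "wb") as err:
        producer = subprocess.Popen(producer_cmd, stdin=None,
                                    stdout=subprocess.PIPE, stderr=err)
        consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout,
                                    stdout=subprocess.PIPE)
        # Close our copy of the pipe so the producer sees a broken pipe
        # if the consumer exits early.
        producer.stdout.close()
        output = consumer.communicate()[0]
        producer_status = producer.wait()
    return producer_status, consumer.returncode, output
```

A call might look like pipe_through(['hadoop', 'fs', '-text', '/foo/bar/baz.bz2'], ['wc', '-l'], 'err'), where wc -l merely stands in for the real downstream program. If the log lines end up on the producer's stdout, as they do under Spark, the consumer receives them mixed in with the data, which is why the merging matters here.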
When I run hadoop under time:

from subprocess import Popen
with open("out","w") as out:
    with open("err","w") as err:
        p = Popen(['/usr/bin/time','hadoop','fs','-text',"/foo/bar/baz.bz2"],
                  stdin=None,stdout=out,stderr=err)
print p.wait()

The time output correctly goes to err, but the hadoop logs incorrectly go to out.
I.e., hadoop merges its stderr into its stdout when it runs under Spark.
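One way to narrow this down might be to run the same command with both streams captured separately, once from plain Python and once under pyspark, and compare where the log lines land. The sketch below is a diagnostic aid only. The helper name run_split is hypothetical, and the optional env parameter rests on the unconfirmed assumption that the environment inherited from Spark is what changes hadoop's logging.

```python
import subprocess


def run_split(cmd, env=None):
    """Run cmd and return (exit_status, stdout_bytes, stderr_bytes).

    env=None inherits the current environment. Passing an explicit dict
    lets you test whether inherited variables affect where logs go
    (an assumption to check, not an established cause).
    """
    p = subprocess.Popen(cmd, stdin=None,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                         env=env)
    out, err = p.communicate()
    return p.returncode, out, err
```

For example, run_split(['hadoop', 'fs', '-text', '/foo/bar/baz.bz2']) could be called in both settings and len(err) compared. Since communicate() buffers everything in memory, this is only suitable for a small test file.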

hadoop apache-spark pyspark Process

Source: https://stackoverflow.com/questions/30467502/hadoop-when-run-under-spark-merges-its-stderr-into-stdout
