I am facing a broken pipe issue in Hive while streaming a large number of rows (33,000) from a table to Python. The same script works fine up to 7,656 rows.
0: jdbc:hive2://xxx.xx.xxx.xxxx:10000> insert overwrite table test.transform_inner_temp select * from test.transform_inner_temp_view2 limit 7656;
INFO : Table test.transform_inner_temp stats: [numFiles=1, numRows=7656, totalSize=4447368, rawDataSize=4439712]
No rows affected (19.867 seconds)
0: jdbc:hive2://xxx.xx.xxx.xxxx:10000> insert overwrite table test.transform_inner_temp select * from test.transform_inner_temp_view2;
INFO : Status: Running (Executing on YARN cluster with App id application_xxxxxxxxxxxxx_xxxxx)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20001]: An error occurred while reading or writing to your custom script. It may have crashed with an error.
at org.apache.hadoop.hive.ql.exec.ScriptOperator.process(ScriptOperator.java:456)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555)
... 18 more
Caused by: java.io.IOException: Broken pipe
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.hive.ql.exec.TextRecordWriter.write(TextRecordWriter.java:53)
at org.apache.hadoop.hive.ql.exec.ScriptOperator.process(ScriptOperator.java:425)
... 24 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_xxxxxxxxxxx_xxxxx_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2)
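The Hive-side stack above only shows that Hive could no longer write to the script's stdin; the Python-side traceback, if there is one, ends up in the failed task's stderr log. A sketch of retrieving it, assuming YARN log aggregation is enabled and using the masked application id from the output above:

yarn logs -applicationId application_xxxxxxxxxxxxx_xxxxx | grep -B 5 -A 20 Traceback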
I can see the same issue, though there only with join conditions, in the link below:
https://community.hortonworks.com/questions/11025/hive-query-issue.html
I tried the solution suggested in the link above, setting hive.vectorized.execution.enabled=false, but my issue remains the same.
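A per-session way to apply that setting in beeline before re-running the insert (a sketch; it can also be set in hive-site.xml):

0: jdbc:hive2://xxx.xx.xxx.xxxx:10000> set hive.vectorized.execution.enabled=false;
0: jdbc:hive2://xxx.xx.xxx.xxxx:10000> insert overwrite table test.transform_inner_temp select * from test.transform_inner_temp_view2;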
I can also see the same error in the link below, and I hit it even when selecting columns from the single table test.transform_inner_temp1, into which the join results have already been written:
Hive broken pipe error
If the script itself were wrong, it should not have worked correctly for the smaller row count either. So I hope the script is fine and the problem lies with Hive settings or memory settings. I searched the internet extensively but could not find a similar issue. Please provide your suggestions/solutions.
Hive code:
CREATE VIEW IF NOT EXISTS test.transform_inner_temp_view2
AS
select * from
  (select transform (*)
   USING "scl enable python27 'python TestPython.py'"
   as (Col_1 STRING,
       col_2 STRING,
       ...
       ..
       col_125 STRING
      )
   FROM
   test.transform_inner_temp1 a) b;
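Not shown above: TestPython.py must be reachable on every worker node for the transform to run at all. That is typically done by shipping the script with the job before querying the view; a sketch with a placeholder path:

ADD FILE /local/path/to/TestPython.py;

Since the LIMIT 7656 run succeeds, the script is evidently being found here; the failure only appears at higher row counts.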
I tried the Python script in three different ways, as shown below, but the issue was not resolved.
Script 1:
#!/usr/bin/env python
'''
Created on June 2, 2017
@author: test
'''
import sys
from datetime import datetime
import decimal
import string

D = decimal.Decimal

while True:
    line = sys.stdin.readline()              # read one tab-separated row from Hive
    if not line:
        break
    line = string.strip(line, "\n ")
    outList = []
    TempList = line.strip().split('\t')
    col_1 = TempList[0]
    ...
    ....
    col_125 = TempList[34] + TempList[32]
    outList.extend((col_1,....col_125))
    outValue = "\t".join(map(str, outList))
    print "%s" % (outValue)                  # emit the row back to Hive on stdout
Script 2:
#!/usr/bin/env python
'''
Created on June 2, 2017
@author: test
'''
import sys
from datetime import datetime
import decimal
import string

D = decimal.Decimal

try:
    for line in sys.stdin:
        line = sys.stdin.readline()          # note: reads a second line on top of the loop variable
        TempList = line.strip().split('\t')
        col_1 = TempList[0]
        ...
        ....
        col_125 = TempList[34] + TempList[32]
        outList.extend((col_1,....col_125))
        outValue = "\t".join(map(str, outList))
        print "%s" % (outValue)
except:
    print sys.exc_info()                     # the caught error is printed to stdout
Script 3:
#!/usr/bin/env python
'''
Created on June 2, 2017
@author: test
'''
import sys
from datetime import datetime
import decimal
import string

D = decimal.Decimal

for line in sys.stdin:
    line = sys.stdin.readline()              # note: reads a second line on top of the loop variable
    TempList = line.strip().split('\t')
    col_1 = TempList[0]
    ...
    ....
    col_125 = TempList[34] + TempList[32]
    outList.extend((col_1,....col_125))
    outValue = "\t".join(map(str, outList))
    print "%s" % (outValue)
I am using Apache Hive (version 1.2.1000.2.5.3.0-37) with beeline. Thanks in advance.