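When I call a Python UDF from Pig, I get the following error in the attempt log, plus the errors shown further down in the console. I can't work out what I'm doing wrong, because the same code was running before. I'm on a 64-bit machine with RHEL 6.4, Hadoop 2.7.2, Pig 0.15 and Python 3.5.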
Traceback (most recent call last):
  File "/tmp/controller2772959444531928936.py", line 356, in <module>
    sys.argv[5], sys.argv[6], sys.argv[7], sys.argv[8])
  File "/tmp/controller2772959444531928936.py", line 88, in main
    input_str = self.get_next_input()
  File "/tmp/controller2772959444531928936.py", line 164, in get_next_input
    while input_str.endswith(END_RECORD_DELIM) == False:
TypeError: endswith first arg must be bytes or a tuple of bytes, not str
The console shows the following error:
java.lang.Exception: org.apache.pig.impl.streaming.StreamingUDFException: LINE :
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE :
    at org.apache.pig.impl.builtin.StreamingUDF$ProcessErrorThread.run(StreamingUDF.java:503)
And also this one:
Exception in thread "Thread-35" java.lang.NullPointerException
    at org.apache.pig.impl.builtin.StreamingUDF$ProcessOutputThread.run(StreamingUDF.java:468)
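For reference, the TypeError at the bottom of the traceback is the message Python 3 gives when endswith is called on a bytes object with a str argument. The sketch below reproduces that message in isolation. The delimiter value and the sample record are made-up stand-ins, not taken from Pig's generated controller script, and whether this exact type mismatch is what happens inside the controller is only an assumption based on the message text.

```python
# Hypothetical stand-ins for the names in the traceback; the real values
# live in Pig's generated controller script and are not shown in this post.
END_RECORD_DELIM = "|_\n"         # a text (str) delimiter, assumed value
input_str = b"some record|_\n"    # bytes, e.g. as read from a binary stream


def ends_with_delim(data, delim):
    """Return the endswith() result, or the TypeError message if the types clash."""
    try:
        return data.endswith(delim)
    except TypeError as exc:
        return str(exc)


# bytes checked against str: Python 3 raises TypeError
mixed = ends_with_delim(input_str, END_RECORD_DELIM)

# bytes checked against bytes: the comparison works
matched = ends_with_delim(input_str, END_RECORD_DELIM.encode("utf-8"))

print(mixed)
print(matched)
```

Under Python 2, str and bytes are the same type, so the same comparison would not raise.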
Here is the Python code:
# Assumed imports: the snippet as posted relies on these but does not show them.
import os

from pig_util import outputSchema

# INTERNATIONALGTPATH is defined elsewhere in the script (not shown).


@outputSchema('output_field_name:chararray')
def readfileinlist(filename):
    # Read the whole file and return its lines without trailing newlines
    with open(filename) as inputfile:
        lines = inputfile.read().splitlines()
    return lines


@outputSchema('output_field_name:boolean')
def intlgtinlist(srcgt, destgt, intgtllist):
    # True if either GT starts with any of the prefixes in the list
    if srcgt.startswith(tuple(intgtllist)) or destgt.startswith(tuple(intgtllist)):
        return True
    else:
        return False


@outputSchema('output_field_name:boolean')
def checkintlgtincdrs(aparty, srcgt, destgt):
    intgtllist = []
    try:
        if ((len(srcgt) > 0 or len(destgt) > 0) and (srcgt or destgt) and aparty.isdigit()):
            if os.path.isfile(INTERNATIONALGTPATH) and os.access(INTERNATIONALGTPATH, os.R_OK) and os.stat(INTERNATIONALGTPATH).st_size > 0:
                # FUNCTION FOR READING THE FILE IN ARRAY/TUPLE
                intgtllist = readfileinlist(INTERNATIONALGTPATH)
                # CHECK FOR THE INPUT(ARG0) IN ARRAY/TUPLE
                if intlgtinlist(srcgt, destgt, intgtllist):
                    return True
                else:
                    return False
            else:
                return False
        else:
            return False
    # A tuple catches both types; `OSError or IndexError` would catch only OSError
    except (OSError, IndexError):
        pass
    return True
Here is the Pig script:
record = LOAD '/inreport/cdrs/ZTE_20160301*' USING PigStorage('|','-tagFile');
REGISTER 'udf_smsiuc.py' using streaming_python as smsiucudfs;
internationalcdrsfilter = FILTER record by smsiucudfs.checkintlgtincdrs($1,$26,$27);