Pig: Python UDF streaming error

afdcj2ne  posted on 2021-05-29 in Hadoop

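When calling a Python UDF from Pig, I get the following error in the task attempt log, along with the console errors shown further down. I can't work out what I'm doing wrong, since this code was running before. I'm on RHEL 6.4 (64-bit) with Hadoop 2.7.2, Pig 0.15, and Python 3.5.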

Traceback (most recent call last):
  File "/tmp/controller2772959444531928936.py", line 356, in <module>
    sys.argv[5], sys.argv[6], sys.argv[7], sys.argv[8])
  File "/tmp/controller2772959444531928936.py", line 88, in main
    input_str = self.get_next_input()
  File "/tmp/controller2772959444531928936.py", line 164, in get_next_input
    while input_str.endswith(END_RECORD_DELIM) == False:
TypeError: endswith first arg must be bytes or a tuple of bytes, not str
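
The final TypeError is what Python 3 raises when `endswith` is called on a `bytes` object with a `str` argument. That suggests the controller script is being run by Python 3 while handling its input the way Python 2 code would. Below is a minimal sketch that reproduces the message outside Pig. The delimiter value and the sample input are assumptions for illustration, not copied from Pig's controller script.

```python
# Assumed stand-ins: the real delimiter and input live in Pig's controller script.
END_RECORD_DELIM = "|_\n"        # a str literal
input_str = b"some record|_\n"   # bytes, as read from a binary stream


def ends_with_delim(data, delim):
    """Return the bool result, or the TypeError message if the types differ."""
    try:
        return data.endswith(delim)
    except TypeError as exc:
        return str(exc)


# bytes checked against str: Python 3 refuses and we get the error text back
mismatch = ends_with_delim(input_str, END_RECORD_DELIM)
# bytes checked against bytes: the comparison works
matched = ends_with_delim(input_str, END_RECORD_DELIM.encode())

print(mismatch)
print(matched)
```

Under Python 2 the same comparison would succeed, because there `str` is already a byte string.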

The console shows the following error:

java.lang.Exception: org.apache.pig.impl.streaming.StreamingUDFException: LINE :
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE :
    at org.apache.pig.impl.builtin.StreamingUDF$ProcessErrorThread.run(StreamingUDF.java:503)

And also this error:

Exception in thread "Thread-35" java.lang.NullPointerException
    at org.apache.pig.impl.builtin.StreamingUDF$ProcessOutputThread.run(StreamingUDF.java:468)

Here is the Python code:

# Imports assumed to have been omitted from the snippet as posted.
import os

from pig_util import outputSchema  # helper module Pig provides for streaming_python UDFs

# INTERNATIONALGTPATH is defined elsewhere in the original script (not shown).

@outputSchema('output_field_name:chararray')
def readfileinlist(filename):
    # Read the whole file and return its lines without trailing newlines.
    with open(filename) as inputfile:
        lines = inputfile.read().splitlines()
    return lines

@outputSchema('output_field_name:boolean')
def intlgtinlist(srcgt, destgt, intgtllist):
    # True if either GT starts with any prefix in the list.
    prefixes = tuple(intgtllist)
    return srcgt.startswith(prefixes) or destgt.startswith(prefixes)

@outputSchema('output_field_name:boolean')
def checkintlgtincdrs(aparty, srcgt, destgt):
    intgtllist = []
    try:
        if (len(srcgt) > 0 or len(destgt) > 0) and (srcgt or destgt) and aparty.isdigit():
            if (os.path.isfile(INTERNATIONALGTPATH)
                    and os.access(INTERNATIONALGTPATH, os.R_OK)
                    and os.stat(INTERNATIONALGTPATH).st_size > 0):

                # Read the prefix file into a list.
                intgtllist = readfileinlist(INTERNATIONALGTPATH)

                # Check both GTs against the prefix list.
                if intlgtinlist(srcgt, destgt, intgtllist):
                    return True
                else:
                    return False
            else:
                return False
        else:
            return False
    except (OSError, IndexError):  # was `except OSError or IndexError`, which catches only OSError
        pass

    # Reached only when an exception was swallowed above.
    return True
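
One detail of the exception handling is easy to get wrong. In Python, `except OSError or IndexError` evaluates the expression `OSError or IndexError` first, which yields `OSError`, so an `IndexError` is not caught. A tuple of exception classes catches both. Here is a minimal sketch with two hypothetical helpers that are not part of the UDF:

```python
def caught_by_or_form(exc):
    """Return True if `except OSError or IndexError` catches exc."""
    try:
        raise exc
    except OSError or IndexError:  # evaluates to plain `except OSError`
        return True
    except Exception:
        return False


def caught_by_tuple_form(exc):
    """Return True if `except (OSError, IndexError)` catches exc."""
    try:
        raise exc
    except (OSError, IndexError):
        return True
    except Exception:
        return False


print(caught_by_or_form(IndexError("boom")))
print(caught_by_tuple_form(IndexError("boom")))
```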

Here is the Pig script:

record = LOAD '/inreport/cdrs/ZTE_20160301*' USING PigStorage('|','-tagFile');
REGISTER 'udf_smsiuc.py' using streaming_python as smsiucudfs;
internationalcdrsfilter = FILTER record by smsiucudfs.checkintlgtincdrs($1,$26,$27);
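
To check the filter logic independently of Pig's streaming layer, the prefix test can be run on a single pipe-delimited line in plain Python. This is a rough sketch under stated assumptions: `-tagFile` puts the source file name at the front of each tuple as `$0`, so `$1`, `$26`, and `$27` correspond to list indices 1, 26, and 27 once the file name is prepended. The helper names, the sample record, and the prefix lists are all hypothetical.

```python
def has_international_prefix(srcgt, destgt, prefixes):
    """True if either GT starts with any of the given prefixes."""
    prefixes = tuple(prefixes)
    return srcgt.startswith(prefixes) or destgt.startswith(prefixes)


def keep_record(file_name, line, prefixes):
    """Mimic the FILTER on one raw line, using fields 1, 26 and 27."""
    fields = [file_name] + line.rstrip("\n").split("|")
    if len(fields) < 28:
        return False  # record too short to hold $27
    aparty, srcgt, destgt = fields[1], fields[26], fields[27]
    if not aparty.isdigit() or not (srcgt or destgt):
        return False
    return has_international_prefix(srcgt, destgt, prefixes)


# Hypothetical record with 27 data fields; the file name becomes index 0.
sample_fields = ["x"] * 27
sample_fields[0] = "919800000000"  # ends up as $1 (aparty)
sample_fields[25] = "4479000"      # ends up as $26 (srcgt)
sample_fields[26] = "9198000"      # ends up as $27 (destgt)
sample_line = "|".join(sample_fields)

kept = keep_record("ZTE_20160301_a", sample_line, ["44", "1"])
dropped = keep_record("ZTE_20160301_a", sample_line, ["33"])
print(kept, dropped)
```

If the logic behaves as expected here but the job still fails, the problem is more likely in how Pig launches and talks to the Python process than in the UDF itself.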
