无法打开别名的迭代器< alias\u name>

j9per5c4 于 2021-06-24 发布在 Pig

关注(0)|答案(1)|浏览(334)

我知道这是重复最多的问题之一。我几乎找遍了所有地方，没有任何资源可以解决我面临的问题。下面是我的问题陈述的简化版本。但实际数据有点复杂，所以我不得不使用自定义项
我的输入文件：（input.txt）

NotNeeded1,NotNeeded11;Needed1
NotNeeded2,NotNeeded22;Needed2

我希望输出是

Needed1
Needed2

因此，我正在编写以下udf（java代码）：

package com.company.pig;
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
public class myudf extends EvalFunc<String>{
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        String s = (String)input.get(0);
        String str = s.split("\\,")[1];
        String str1 = str.split("\\;")[1];
        return str1;
    }
}

把它 Package 成

rollupreg_extract-jar-with-dependencies.jar

下面是我的Pig壳代码

grunt> REGISTER /pig/rollupreg_extract-jar-with-dependencies.jar;
grunt> DEFINE myudf com.company.pig.myudf;
grunt> data = LOAD 'hdfs://sandbox.hortonworks.com:8020/pig_hdfs/input.txt' USING PigStorage(',');
grunt> extract = FOREACH data GENERATE myudf($1);
grunt> DUMP extract;

我得到以下错误：

2017-05-15 15:58:15,493 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2017-05-15 15:58:15,577 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2017-05-15 15:58:15,659 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2017-05-15 15:58:15,774 [main] INFO  org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (PS Old Gen) of size 699400192 to monitor. collectionUsageThreshold = 489580128, usageThreshold = 489580128
2017-05-15 15:58:15,865 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2017-05-15 15:58:15,923 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2017-05-15 15:58:15,923 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2017-05-15 15:58:16,184 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2017-05-15 15:58:16,196 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/172.17.0.2:8050
2017-05-15 15:58:16,396 [main] INFO  org.apache.hadoop.yarn.client.AHSProxy - Connecting to Application History server at sandbox.hortonworks.com/172.17.0.2:10200
2017-05-15 15:58:16,576 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2017-05-15 15:58:16,580 [main] WARN  org.apache.pig.tools.pigstats.ScriptState - unable to read pigs manifest file
2017-05-15 15:58:16,584 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2017-05-15 15:58:16,588 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2017-05-15 15:58:17,258 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/pig/rollupreg_extract-jar-with-dependencies.jar to DistributedCache through /tmp/temp-1119775568/tmp-858482998/rollupreg_extract-jar-with-dependencies.jar
2017-05-15 15:58:17,276 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2017-05-15 15:58:17,294 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2017-05-15 15:58:17,295 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2017-05-15 15:58:17,295 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2017-05-15 15:58:17,354 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2017-05-15 15:58:17,510 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2017-05-15 15:58:17,511 [JobControl] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/172.17.0.2:8050
2017-05-15 15:58:17,511 [JobControl] INFO  org.apache.hadoop.yarn.client.AHSProxy - Connecting to Application History server at sandbox.hortonworks.com/172.17.0.2:10200
2017-05-15 15:58:17,753 [JobControl] WARN  org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2017-05-15 15:58:17,820 [JobControl] INFO  org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2017-05-15 15:58:17,830 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2017-05-15 15:58:17,830 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2017-05-15 15:58:17,884 [JobControl] INFO  com.hadoop.compression.lzo.GPLNativeCodeLoader - Loaded native gpl library
2017-05-15 15:58:17,889 [JobControl] INFO  com.hadoop.compression.lzo.LzoCodec - Successfully loaded & initialized native-lzo library [hadoop-lzo rev 7a4b57bedce694048432dd5bf5b90a6c8ccdba80]
2017-05-15 15:58:17,922 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2017-05-15 15:58:18,525 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2017-05-15 15:58:18,692 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1494853652295_0023
2017-05-15 15:58:18,879 [JobControl] INFO  org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2017-05-15 15:58:18,973 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1494853652295_0023
2017-05-15 15:58:19,029 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1494853652295_0023/
2017-05-15 15:58:19,030 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1494853652295_0023
2017-05-15 15:58:19,030 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases data,extract
2017-05-15 15:58:19,030 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: data[2,7],extract[3,10] C:  R:
2017-05-15 15:58:19,044 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2017-05-15 15:58:19,044 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1494853652295_0023]
2017-05-15 15:58:29,156 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2017-05-15 15:58:29,156 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1494853652295_0023 has failed! Stop running all dependent jobs
2017-05-15 15:58:29,157 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2017-05-15 15:58:29,790 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2017-05-15 15:58:29,791 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/172.17.0.2:8050
2017-05-15 15:58:29,793 [main] INFO  org.apache.hadoop.yarn.client.AHSProxy - Connecting to Application History server at sandbox.hortonworks.com/172.17.0.2:10200
2017-05-15 15:58:30,311 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2017-05-15 15:58:30,312 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/172.17.0.2:8050
2017-05-15 15:58:30,313 [main] INFO  org.apache.hadoop.yarn.client.AHSProxy - Connecting to Application History server at sandbox.hortonworks.com/172.17.0.2:10200
2017-05-15 15:58:30,465 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2017-05-15 15:58:30,467 [main] WARN  org.apache.pig.tools.pigstats.ScriptState - unable to read pigs manifest file
2017-05-15 15:58:30,472 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
2.7.3.2.5.0.0-1245              root    2017-05-15 15:58:16     2017-05-15 15:58:30     UNKNOWN
Failed!
Failed Jobs:
JobId   Alias   Feature Message Outputs
job_1494853652295_0023  data,extract    MAP_ONLY        Message: Job failed!    hdfs://sandbox.hortonworks.com:8020/tmp/temp-1119775568/tmp-1619300225,
Input(s):
Failed to read data from "/pig_hdfs/input.txt"
Output(s):
Failed to produce result in "hdfs://sandbox.hortonworks.com:8020/tmp/temp-1119775568/tmp-1619300225"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1494853652295_0023
2017-05-15 15:58:30,472 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2017-05-15 15:58:30,499 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias extract
Details at logfile: /pig/pig_1494863836458.log

我知道有人抱怨

Failed to read data from "/pig_hdfs/input.txt"

但我相信这不是真正的问题。如果我不使用udf直接转储数据，我就得到了输出。所以，这不是问题所在。

apache-pig pig-udf

来源：https://stackoverflow.com/questions/43984175/unable-to-open-iterator-for-alias-alias-name

1条答案

按热度按时间

ioekq8ef1#

首先，您不需要自定义项来获得所需的输出。您可以在load语句中使用分号作为分隔符并获得所需的列。

data = LOAD 'hdfs://sandbox.hortonworks.com:8020/pig_hdfs/input.txt' USING PigStorage(';');
extract = FOREACH data GENERATE $1;
DUMP extract;

如果坚持使用自定义项，则必须将记录加载到单个字段中，然后使用自定义项。此外，自定义项不正确。应使用“；”拆分字符串s作为分隔符，它是从pig脚本传递的。

String s = (String)input.get(0);
String str1 = s.split("\\;")[1];

在pig脚本中，需要将整个记录加载到1个字段中，并在字段$0上使用自定义项。

REGISTER /pig/rollupreg_extract-jar-with-dependencies.jar;
DEFINE myudf com.company.pig.myudf;
data = LOAD 'hdfs://sandbox.hortonworks.com:8020/pig_hdfs/input.txt' AS (f1:chararray);
extract = FOREACH data GENERATE myudf($0);
DUMP extract;

展开查看全部

赞(0）回复(0）举报 2021-06-24

我来回答

无法打开别名的迭代器< alias\u name>

1条答案

相关问题

热门标签

最新问答