Error in the Cascading tutorial word count example

cpjpxq1n  posted on 2021-06-04  in Hadoop

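I am learning Cascading. I am working through the second tutorial on its official website, the word count example. I copied the code from it and tried to run it, but it always fails with the following error: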

Exception in thread "main" cascading.flow.planner.PlannerException: could not build flow from assembly: [[token][com.starscriber.cascadingtest.Main.main(Main.java:44)] 
unable to resolve argument selector: [{1}:'text'], with incoming: [{1}:'doc01        A rain shadow is a dry area on the lee back side of a mountainous area.']] at cascading.flow.planner.FlowPlanner.handleExceptionDuringPlanning(FlowPlanner.java:576)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:263)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:80)
at cascading.flow.FlowConnector.connect(FlowConnector.java:459)
at com.starscriber.cascadingtest.Main.main(Main.java:58)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Caused by: cascading.pipe.OperatorException: [token][com.starscriber.cascadingtest.Main.main(Main.java:44)] 
unable to resolve argument selector: [{1}:'text'], with incoming: [{1}:'doc01        A rain shadow is a dry area on the lee back side of a mountainous area.']
at cascading.pipe.Operator.resolveArgumentSelector(Operator.java:345)
at cascading.pipe.Each.outgoingScopeFor(Each.java:368)
at cascading.flow.planner.ElementGraph.resolveFields(ElementGraph.java:628)
at cascading.flow.planner.ElementGraph.resolveFields(ElementGraph.java:610)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:248)
... 8 more

Caused by: cascading.tuple.FieldsResolverException: 
could not select fields: [{1}:'text'], from: [{1}:'doc01        A rain shadow is a dry area on the lee back side of a mountainous area.']
at cascading.tuple.Fields.indexOf(Fields.java:1008)
at cascading.tuple.Fields.select(Fields.java:1064)
at cascading.pipe.Operator.resolveArgumentSelector(Operator.java:341)
... 12 more

How can this be? I copied exactly the same code from the official GitHub repository and did not change anything:

String docPath = args[0];
String wcPath = args[1];

Properties properties = new Properties();          
AppProps.setApplicationJarClass(properties, Main.class);
HadoopFlowConnector flowConnector = new HadoopFlowConnector(properties);

// create source and sink taps
Tap docTap = new Hfs(new TextDelimited(true, "\t"), docPath);
Tap wcTap = new Hfs(new TextDelimited(true, "\t"), wcPath);

// specify a regex operation to split the "document" text lines into a token stream
Fields token = new Fields("token");
Fields text = new Fields("text");
RegexSplitGenerator splitter = new RegexSplitGenerator(token, "[ \\[\\]\\(\\),.]");
// only returns "token"
Pipe docPipe = new Each("token", text, splitter, Fields.RESULTS);

// determine the word counts
Pipe wcPipe = new Pipe("wc", docPipe);
wcPipe = new GroupBy(wcPipe, token);
wcPipe = new Every(wcPipe, Fields.ALL, new Count(), Fields.ALL);

// connect the taps, pipes, etc., into a flow
FlowDef flowDef = FlowDef.flowDef()
            .setName("wc")
            .addSource(docPipe, docTap)
            .addTailSink(wcPipe, wcTap);

// write a DOT file and run the flow
Flow wcFlow = flowConnector.connect(flowDef);
wcFlow.writeDOT("dot/wc.dot");
wcFlow.complete();

Where is the problem?
Here is the input file:

doc01        A rain shadow is a dry area on the lee back side of a mountainous area.
doc02        This sinking, dry air produces a rain shadow, or area in the lee of a mountain with less rain and cloudcover.
doc03        A rain shadow is an area of dry land that lies on the leeward (or downwind) side of a mountain.
doc04        This is known as the rain shadow effect and is the primary cause of leeward deserts of mountain ranges, such as California's Death Valley.
doc05        Two Women. Secrets. A Broken Land. [DVD Australia]
a0zr77ik  #1

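As others have already mentioned, your input file needs the same header line that the example expects. Instead of copying the code, try cloning the repository; that way you also get the tutorial's own data file and avoid errors related to file format.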

at0kjp5o  #2

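Check whether there is a tab between the docid and text fields in your input file. The program expects the two fields to be separated by a tab character, but in your case it reads the whole line into a single field, which is why the planner cannot find a field named 'text'.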
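
Both suggestions can be checked mechanically before running the flow. The following is a minimal sketch in plain Java (no Cascading dependency). It assumes the file is meant to have a header line containing a `text` column and exactly two tab-separated fields per line, as `new TextDelimited(true, "\t")` and `new Fields("text")` in the question imply. The class name `TabCheck` and its helper methods are hypothetical, introduced only for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class TabCheck {
    // Returns the 1-based number of the first line that does not split into
    // exactly expectedFields tab-separated fields, or -1 if every line does.
    public static int firstBadLine(List<String> lines, int expectedFields) {
        for (int i = 0; i < lines.size(); i++) {
            // A limit of -1 keeps trailing empty fields so they are counted too.
            if (lines.get(i).split("\t", -1).length != expectedFields) {
                return i + 1;
            }
        }
        return -1;
    }

    // True if the first line, split on tabs, contains the given field name.
    public static boolean headerHasField(List<String> lines, String field) {
        return !lines.isEmpty()
                && Arrays.asList(lines.get(0).split("\t", -1)).contains(field);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("usage: java TabCheck <input-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        // Assumption: the flow selects a field named "text" from a 2-column file.
        System.out.println("header has 'text': " + headerHasField(lines, "text"));
        System.out.println("first malformed line: " + firstBadLine(lines, 2));
    }
}
```

Run against the input file shown in the question, this would be expected to report line 1 as malformed and no `text` header, since the first line is a data row whose fields are separated by spaces rather than a tab.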
