Lateral view functionality in Cascading

Posted by kpbwa7wx on 2021-05-30 in Hadoop

My table looks like this:
Table name: mytab

+----+---------------------+
| ID |        Codes        |
+----+---------------------+
| 1  | ABC,DEF,GHI,JLK,MNO |
+----+---------------------+

I am developing a Cascading application that should transform the table above into the following:

+----+---------------------+------+
| ID |        Codes        | code |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | ABC  |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | DEF  |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | GHI  |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | JLK  |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | MNO  |
+----+---------------------+------+

If I were using Hive, this would be easy to do with a LATERAL VIEW:

-- explode() expects an array, so the comma-separated string is split first
SELECT
    ID, Codes, code
FROM
    mytab LATERAL VIEW explode(split(Codes, ',')) codesTab AS code;

But I want to do the same thing in Cascading. Is there a way?

Answer #1 (sc4hvdpw)

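This can be done with a custom Function (there may be other ways too). You only need to add a new tuple to the output collector for each token.
For example: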

import static com.google.common.base.Preconditions.checkArgument;
import cascading.flow.FlowProcess;
import cascading.operation.BaseOperation;
import cascading.operation.Function;
import cascading.operation.FunctionCall;
import cascading.tuple.Fields;
import cascading.tuple.Tuple;

// Emits one output tuple per comma-separated token found in the argument fields.
public class TestLateralView extends BaseOperation<Void> implements Function<Void> {
    private static final long serialVersionUID = 1L;

    // fields declares the single output field that holds each token.
    public TestLateralView(Fields fields) {
        super(fields);
        checkArgument(fields.size() == 1);
    }

    @Override
    public void operate(@SuppressWarnings("rawtypes") FlowProcess flowProcess, FunctionCall<Void> functionCall) {
        Tuple tuple = functionCall.getArguments().getTuple();

        // Join all argument values with commas so they can be split in one pass.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tuple.size(); i++) {
            sb.append(tuple.getString(i));
            sb.append(",");
        }

        // String.split drops trailing empty strings, so the final "," adds no empty token.
        String[] tokens = sb.toString().split(",");

        // One result tuple per token; the Each pipe's output selector decides
        // how each result is combined with the incoming tuple.
        for (String token : tokens) {
            functionCall.getOutputCollector().add(new Tuple(token));
        }
    }
}

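With the function above I got the expected output.
In the pipe assembly, the function can be invoked as follows (CODES and CODE are presumably Fields constants naming the Codes input column and the new code output column):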

pipe = new Each(pipe, CODES, new TestLateralView(CODE), Fields.ALL);
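
To illustrate why the Fields.ALL output selector gives the ID, Codes, code rows from the question, here is a minimal plain-Java sketch of the same row expansion. It deliberately does not use Cascading; the class name LateralViewSketch, the explode helper, and the list-of-strings row representation are all hypothetical stand-ins, not part of the Cascading API. It only mimics the behavior: each token the function emits is appended to a copy of the incoming row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: a row is modeled as a list of strings.
final class LateralViewSketch {

    // For each comma-separated token in row[codesIndex], return a copy of the
    // row with that token appended, similar to what Fields.ALL does with
    // the function's result tuples.
    static List<List<String>> explode(List<String> row, int codesIndex) {
        List<List<String>> out = new ArrayList<>();
        for (String token : row.get(codesIndex).split(",")) {
            List<String> expanded = new ArrayList<>(row);
            expanded.add(token);
            out.add(expanded);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("1", "ABC,DEF,GHI,JLK,MNO");
        for (List<String> r : explode(row, 1)) {
            System.out.println(String.join(" | ", r));
        }
    }
}
```

Running main on the sample row prints five lines, one per code, each starting with the original ID and Codes values. If only the new column were wanted, the Cascading equivalent would be to pass Fields.RESULTS instead of Fields.ALL as the output selector.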
