cz.seznam.euphoria.core.client.operator.Union类的使用及代码示例

x33g5p2x  于2022-02-01 转载在 其他  
字(10.1k)|赞(0)|评价(0)|浏览(152)

本文整理了Java中cz.seznam.euphoria.core.client.operator.Union类的一些代码示例,展示了Union类的具体用法。这些代码示例主要来源于Github/Stackoverflow/Maven等平台,是从一些精选项目中提取出来的代码,具有较强的参考意义,能在一定程度帮忙到你。Union类的具体详情如下:
包路径:cz.seznam.euphoria.core.client.operator.Union
类名称:Union

Union介绍

[英]The union of at least two datasets of the same type.

In the context of Euphoria, a union is merely a logical view on two datasets as one. One can think of a union as a plain concatenation of at least two dataset without any guarantees about the order of the datasets' elements. Unlike in set theory, such a union has no notion of uniqueness, i.e. if the two input dataset contain the same elements, these will all appear in the output dataset (as duplicates) untouched.

Example:

Dataset xs = ...;

The "both" dataset from the above example can now be processed with an operator expecting only a single input dataset, e.g. FlatMap, which will then effectively process both "xs" and "ys".

Note: the order of the dataset does not matter. Indeed, the order of the elements themselves in the union is intentionally not specified at all.

Builders:

  1. [named] .................. give name to the operator [optional]
  2. of ....................... input datasets
  3. output ................... build output dataset
    [中]至少两个相同类型的数据集的并集。
    在欣快感的语境中,联合仅仅是将两个数据集视为一个逻辑视图。可以将联合视为至少两个数据集的简单连接,而不保证数据集元素的顺序。与集合论不同,这种并集没有唯一性的概念,也就是说,如果两个输入数据集包含相同的元素,这些元素都会在输出数据集中(作为重复项)原封不动地出现。
    例子:
Dataset xs = ...;

现在,可以使用只需要一个输入数据集(例如FlatMap)的操作员来处理上述示例中的“两者”数据集,然后该操作员将有效地处理“xs”和“ys”。
注意:数据集的顺序无关紧要。事实上,联盟中元素本身的顺序根本没有被有意指定。
####建筑商:
1.【命名】。。。。。。。。。。。。。。。。。。给操作员起名[可选]
1.关于。。。。。。。。。。。。。。。。。。。。。。。输入数据集
1.产出。。。。。。。。。。。。。。。。。。。构建输出数据集

代码示例

代码示例来源:origin: seznam/euphoria

/**
 * Starts building a nameless Union operator to view at least two datasets as one.
 *
 * @param <IN> the type of elements in the data sets
 *
 * @param dataSets at least the two data sets
 *
 * @return the next builder to complete the setup of the {@link Union} operator
 *
 * @see #named(String)
 * @see OfBuilder#of(List)
 */
@SafeVarargs
public static <IN> OutputBuilder<IN> of(Dataset<IN>... dataSets) {
 return of(Arrays.asList(dataSets));
}

代码示例来源:origin: seznam/euphoria

@Override
 public Dataset<IN> output(OutputHint... outputHints) {
  final Flow flow = dataSets.get(0).getFlow();
  final Union<IN> union = new Union<>(name, flow, dataSets, Sets.newHashSet(outputHints));
  flow.add(union);
  return union.output();
 }
}

代码示例来源:origin: seznam/euphoria

@SuppressWarnings("unchecked")
Union(String name, Flow flow, List<Dataset<IN>> dataSets, Set<OutputHint> outputHints) {
 super(name, flow);
 Preconditions.checkArgument(
   dataSets.size() > 1,
   "Union needs at least two data sets.");
 Preconditions.checkArgument(
   dataSets.stream().map(Dataset::getFlow).distinct().count() == 1,
   "Only data sets from the same flow can be passed to Union.");
 this.dataSets = dataSets;
 this.output = createOutput(dataSets.get(0), outputHints);
}

代码示例来源:origin: seznam/euphoria

@Test
public void testBuild() {
 final Flow flow = Flow.create("TEST");
 final Dataset<String> left = Util.createMockDataset(flow, 2);
 final Dataset<String> right = Util.createMockDataset(flow, 3);
 final Dataset<String> unioned = Union.named("Union1")
   .of(left, right)
   .output();
 assertEquals(flow, unioned.getFlow());
 assertEquals(1, flow.size());
 final Union union = (Union) flow.operators().iterator().next();
 assertEquals(flow, union.getFlow());
 assertEquals("Union1", union.getName());
 assertEquals(unioned, union.output());
 assertEquals(2, union.listInputs().size());
}

代码示例来源:origin: seznam/euphoria

@Test
 public void testBuild_ImplicitName() {
  final Flow flow = Flow.create("TEST");
  final Dataset<String> left = Util.createMockDataset(flow, 2);
  final Dataset<String> right = Util.createMockDataset(flow, 3);

  Union.of(left, right).output();

  final Union union = (Union) flow.operators().iterator().next();
  assertEquals("Union", union.getName());
 }
}

代码示例来源:origin: seznam/euphoria

@Override
 @SuppressWarnings("unchecked")
 public JavaRDD<?> translate(Union operator, SparkExecutorContext context) {
  final List<JavaRDD<?>> inputs = context.getInputs(operator);
  if (inputs.size() < 2) {
   throw new IllegalStateException("Union operator needs at least 2 inputs");
  }
  return inputs
    .stream()
    .reduce(
      (l, r) ->
        ((JavaRDD<Object>) l)
          .union((JavaRDD<Object>) r)
          .setName(operator.getName()))
    .orElseThrow(() -> new IllegalArgumentException("Unable to reduce inputs."));
 }
}

代码示例来源:origin: seznam/euphoria

@Test
public void testBuild_ThreeDataSet() {
 final Flow flow = Flow.create("TEST");
 final Dataset<String> first = Util.createMockDataset(flow, 1);
 final Dataset<String> second = Util.createMockDataset(flow, 2);
 final Dataset<String> third = Util.createMockDataset(flow, 3);
 final Dataset<String> unioned = Union.named("Union1")
   .of(first, second, third)
   .output();
 assertEquals(flow, unioned.getFlow());
 assertEquals(1, flow.size());
 final Union union = (Union) flow.operators().iterator().next();
 assertEquals(flow, union.getFlow());
 assertEquals("Union1", union.getName());
 assertEquals(unioned, union.output());
 assertEquals(3, union.listInputs().size());
}

代码示例来源:origin: seznam/euphoria

/**
 * Starts building a nameless Union operator to view at least two datasets as one.
 *
 * @param <IN> the type of elements in the data sets
 *
 * @param dataSets at least the two data sets
 *
 * @return the next builder to complete the setup of the {@link Union} operator
 *
 * @see #named(String)
 * @see OfBuilder#of(List)
 */
@SafeVarargs
public static <IN> OutputBuilder<IN> of(Dataset<IN>... dataSets) {
 return of(Arrays.asList(dataSets));
}

代码示例来源:origin: seznam/euphoria

@Override
 public Dataset<IN> output(OutputHint... outputHints) {
  final Flow flow = dataSets.get(0).getFlow();
  final Union<IN> union = new Union<>(name, flow, dataSets, Sets.newHashSet(outputHints));
  flow.add(union);
  return union.output();
 }
}

代码示例来源:origin: seznam/euphoria

@SuppressWarnings("unchecked")
Union(String name, Flow flow, List<Dataset<IN>> dataSets, Set<OutputHint> outputHints) {
 super(name, flow);
 Preconditions.checkArgument(
   dataSets.size() > 1,
   "Union needs at least two data sets.");
 Preconditions.checkArgument(
   dataSets.stream().map(Dataset::getFlow).distinct().count() == 1,
   "Only data sets from the same flow can be passed to Union.");
 this.dataSets = dataSets;
 this.output = createOutput(dataSets.get(0), outputHints);
}

代码示例来源:origin: seznam/euphoria

public Dataset<T> union(Dataset<T> other) {
 return new Dataset<>(Union.of(this.wrap, other.wrap).output());
}

代码示例来源:origin: seznam/euphoria

new Union<>(getName() + "::Union", flow,
  Arrays.asList(leftMap.output(), rightMap.output()));
getName() + "::ReduceStateByKey",
flow,
union.output(),
keyExtractor,
e -> e,

代码示例来源:origin: seznam/euphoria

@Override
public Dataset<Integer> getOutput(Flow flow, boolean bounded) {
 final Dataset<Integer> first = flow.createInput(ListDataSource.of(bounded,
   Arrays.asList(1, 2, 3),
   Arrays.asList(4, 5, 6)));
 final Dataset<Integer> second = flow.createInput(ListDataSource.of(bounded,
   Arrays.asList(7, 8, 9),
   Arrays.asList(10, 11, 12)));
 return Union.of(first, second).output();
}

代码示例来源:origin: seznam/euphoria

new Union<>(getName() + "::Union", flow,
  Arrays.asList(leftMap.output(), rightMap.output()));
getName() + "::ReduceStateByKey",
flow,
union.output(),
keyExtractor,
e -> e,

代码示例来源:origin: seznam/euphoria

@Override
public Dataset<Integer> getOutput(Flow flow, boolean bounded) {
 final Dataset<Integer> first = flow.createInput(ListDataSource.of(bounded,
   Arrays.asList(1, 2, 3),
   Arrays.asList(4, 5, 6)));
 final Dataset<Integer> second = flow.createInput(ListDataSource.of(bounded,
   Arrays.asList(7, 8, 9),
   Arrays.asList(10, 11, 12)));
 return Union.of(first, second).output();
}

代码示例来源:origin: seznam/euphoria

@Override
public Dataset<Integer> getOutput(Flow flow, boolean bounded) {
 final Dataset<Integer> first = flow.createInput(ListDataSource.of(bounded,
   Arrays.asList(1, 2, 3)));
 final Dataset<Integer> second = flow.createInput(ListDataSource.of(bounded,
   Arrays.asList(4, 5, 6),
   Arrays.asList(7, 8, 9)));
 final Dataset<Integer> third = flow.createInput(ListDataSource.of(bounded,
   Arrays.asList(10, 11, 12),
   Arrays.asList(13, 14, 15),
   Arrays.asList(16, 17, 18)));
 return Union.of(first, second, third).output();
}

代码示例来源:origin: seznam/euphoria

@Override
public Dataset<Integer> getOutput(Flow flow, boolean bounded) {
 final Dataset<Integer> first = flow.createInput(ListDataSource.of(bounded,
   Arrays.asList(1, 2, 3),
   Arrays.asList(4, 5, 6)));
 final Dataset<Integer> second = flow.createInput(ListDataSource.of(bounded,
   Arrays.asList(7, 8, 9),
   Arrays.asList(10, 11, 12)));
 final Dataset<Integer> third = flow.createInput(ListDataSource.of(bounded,
   Arrays.asList(13, 14, 15),
   Arrays.asList(16, 17, 18)));
 return Union.of(first, second, third).output();
}

代码示例来源:origin: seznam/euphoria

@Override
public Dataset<Integer> getOutput(Flow flow, boolean bounded) {
 final Dataset<Integer> first = flow.createInput(ListDataSource.of(bounded,
   Arrays.asList(1, 2, 3)));
 final Dataset<Integer> second = flow.createInput(ListDataSource.of(bounded,
   Arrays.asList(4, 5, 6),
   Arrays.asList(7, 8, 9)));
 final Dataset<Integer> third = flow.createInput(ListDataSource.of(bounded,
   Arrays.asList(10, 11, 12),
   Arrays.asList(13, 14, 15),
   Arrays.asList(16, 17, 18)));
 return Union.of(first, second, third).output();
}

代码示例来源:origin: seznam/euphoria

@Override
public Dataset<Integer> getOutput(Flow flow, boolean bounded) {
 final Dataset<Integer> first = flow.createInput(ListDataSource.of(bounded,
   Arrays.asList(1, 2, 3),
   Arrays.asList(4, 5, 6)));
 final Dataset<Integer> second = flow.createInput(ListDataSource.of(bounded,
   Arrays.asList(7, 8, 9),
   Arrays.asList(10, 11, 12)));
 final Dataset<Integer> third = flow.createInput(ListDataSource.of(bounded,
   Arrays.asList(13, 14, 15),
   Arrays.asList(16, 17, 18)));
 return Union.of(first, second, third).output();
}

代码示例来源:origin: seznam/euphoria

@Override
public Dataset<Integer> getOutput(Flow flow, boolean bounded) {
 final Dataset<Integer> first = flow.createInput(ListDataSource.of(bounded,
   Arrays.asList(1, 2, 3)));
 final Dataset<Integer> second = flow.createInput(ListDataSource.of(bounded,
   Arrays.asList(4, 5, 6)));
 final Dataset<Integer> third = flow.createInput(ListDataSource.of(bounded,
   Arrays.asList(7, 8, 9)));
 final Dataset<Integer> fourth = flow.createInput(ListDataSource.of(bounded,
   Arrays.asList(10, 11, 12)));
 final Dataset<Integer> fifth = flow.createInput(ListDataSource.of(bounded,
   Arrays.asList(13, 14, 15)));
 return Union.of(first, second, third, fourth, fifth).output();
}

相关文章