本文整理了Java中org.apache.beam.sdk.transforms.Filter
类的一些代码示例,展示了Filter
类的具体用法。这些代码示例主要来源于Github
/Stackoverflow
/Maven
等平台,是从一些精选项目中提取出来的代码,具有较强的参考意义,能在一定程度帮忙到你。Filter
类的具体详情如下:
包路径:org.apache.beam.sdk.transforms.Filter
类名称:Filter
[英]PTransforms for filtering from a PCollection the elements satisfying a predicate, or satisfying an inequality with a given value based on the elements' natural ordering.
[中]p用于从PCollection筛选满足谓词的元素,或基于元素的自然顺序满足具有给定值的不等式的元素的转换。
代码示例来源:origin: jbonofre/beam-samples
public static PCollection<String> filterByCountry(PCollection<String> data, final String country) {
return data.apply("FilterByCountry", Filter.by(new SerializableFunction<String, Boolean>() {
public Boolean apply(String row) {
return getCountry(row).equals(country);
}
}));
}
代码示例来源:origin: org.apache.beam/beam-sdks-java-core
@Test
public void testDisplayData() {
assertThat(DisplayData.from(Filter.lessThan(123)), hasDisplayItem("predicate", "x < 123"));
assertThat(DisplayData.from(Filter.lessThanEq(234)), hasDisplayItem("predicate", "x ≤ 234"));
assertThat(DisplayData.from(Filter.greaterThan(345)), hasDisplayItem("predicate", "x > 345"));
assertThat(DisplayData.from(Filter.greaterThanEq(456)), hasDisplayItem("predicate", "x ≥ 456"));
assertThat(DisplayData.from(Filter.equal(567)), hasDisplayItem("predicate", "x == 567"));
}
代码示例来源:origin: org.apache.beam/beam-sdks-java-core
/**
* Returns a new {@link Filter} {@link PTransform} that's like this {@link PTransform} but with
* the specified description for {@link DisplayData}. Does not modify this {@link PTransform}.
*/
Filter<T> described(String description) {
return new Filter<>(predicate, description);
}
代码示例来源:origin: org.apache.beam/beam-sdks-java-core
/**
* Returns a {@code PTransform} that takes an input {@code PCollection<T>} and returns a {@code
* PCollection<T>} with elements that equals to a given value. Elements must be {@code
* Comparable}.
*
* <p>Example of use:
*
* <pre>{@code
* PCollection<Integer> listOfNumbers = ...;
* PCollection<Integer> equalNumbers = listOfNumbers.apply(Filter.equal(1000));
* }</pre>
*
* <p>See also {@link #greaterThan}, {@link #lessThan}, {@link #lessThanEq} and {@link
* #greaterThanEq}, which return elements satisfying various inequalities with the specified value
* based on the elements' natural ordering.
*
* <p>See also {@link #by}, which returns elements that satisfy the given predicate.
*/
public static <T extends Comparable<T>> Filter<T> equal(final T value) {
return by((SerializableFunction<T, Boolean>) input -> input.compareTo(value) == 0)
.described(String.format("x == %s", value));
}
代码示例来源:origin: org.apache.beam/beam-sdks-java-core
@Test
@Category(NeedsRunner.class)
public void testFilterGreaterThan() {
PCollection<Integer> output =
p.apply(Create.of(1, 2, 3, 4, 5, 6, 7)).apply(Filter.greaterThan(4));
PAssert.that(output).containsInAnyOrder(5, 6, 7);
p.run();
}
代码示例来源:origin: org.apache.beam/beam-sdks-java-core
/**
* Returns a {@code PTransform} that takes an input {@code PCollection<T>} and returns a {@code
* PCollection<T>} with elements that are less than or equal to a given value, based on the
* elements' natural ordering. Elements must be {@code Comparable}.
*
* <p>Example of use:
*
* <pre>{@code
* PCollection<Integer> listOfNumbers = ...;
* PCollection<Integer> smallOrEqualNumbers =
* listOfNumbers.apply(Filter.lessThanEq(10));
* }</pre>
*
* <p>See also {@link #lessThan}, {@link #greaterThanEq}, {@link #equal} and {@link #greaterThan},
* which return elements satisfying various inequalities with the specified value based on the
* elements' natural ordering.
*
* <p>See also {@link #by}, which returns elements that satisfy the given predicate.
*/
public static <T extends Comparable<T>> Filter<T> lessThanEq(final T value) {
return by((SerializableFunction<T, Boolean>) input -> input.compareTo(value) <= 0)
.described(String.format("x ≤ %s", value));
}
代码示例来源:origin: org.apache.beam/beam-runners-direct-java
@Before
public void setup() {
createdInts = p.apply("createdInts", Create.of(1, 2, 3));
filtered = createdInts.apply("filtered", Filter.greaterThan(1));
filteredTimesTwo =
filtered.apply(
"timesTwo",
ParDo.of(
new DoFn<Integer, Integer>() {
@ProcessElement
public void processElement(ProcessContext c) throws Exception {
c.output(c.element() * 2);
}
}));
keyed = createdInts.apply("keyed", WithKeys.of("MyKey"));
intsToFlatten = p.apply("intsToFlatten", Create.of(-1, 256, 65535));
PCollectionList<Integer> preFlatten = PCollectionList.of(createdInts).and(intsToFlatten);
flattened = preFlatten.apply("flattened", Flatten.pCollections());
clock = MockClock.fromInstant(new Instant(1000));
DirectGraphs.performDirectOverrides(p);
graph = DirectGraphs.getGraph(p);
manager = WatermarkManager.create(clock, graph, AppliedPTransform::getFullName);
bundleFactory = ImmutableListBundleFactory.create();
}
代码示例来源:origin: com.google.cloud.genomics/google-genomics-dataflow
/**
* Compute a PCollection of reference allele frequencies for SNPs of interest.
* The SNPs all have only a single alternate allele, and neither the
* reference nor the alternate allele have a population frequency < minFreq.
* The results are returned in a PCollection indexed by Position.
*
* @param variants a set of variant calls for a reference population
* @param minFreq the minimum allele frequency for the set
* @return a PCollection mapping Position to AlleleCounts
*/
static PCollection<KV<Position, AlleleFreq>> getFreq(
PCollection<Variant> variants, double minFreq) {
return variants.apply("PassingFilter", Filter.by(VariantFunctions.IS_PASSING))
.apply("OnChromosomeFilter", Filter.by(VariantFunctions.IS_ON_CHROMOSOME))
.apply("NotLowQualityFilter", Filter.by(VariantFunctions.IS_NOT_LOW_QUALITY))
.apply("SNPFilter", Filter.by(VariantFunctions.IS_SINGLE_ALTERNATE_SNP))
.apply(ParDo.of(new GetAlleleFreq()))
.apply(Filter.by(new FilterFreq(minFreq)));
}
代码示例来源:origin: org.apache.beam/beam-sdks-java-core
/**
* Returns a {@code PTransform} that takes an input {@link PCollection} and returns a {@link
* PCollection} with elements that are less than a given value, based on the elements' natural
* ordering. Elements must be {@code Comparable}.
*
* <p>Example of use:
*
* <pre>{@code
* PCollection<Integer> listOfNumbers = ...;
* PCollection<Integer> smallNumbers =
* listOfNumbers.apply(Filter.lessThan(10));
* }</pre>
*
* <p>See also {@link #lessThanEq}, {@link #greaterThanEq}, {@link #equal} and {@link
* #greaterThan}, which return elements satisfying various inequalities with the specified value
* based on the elements' natural ordering.
*
* <p>See also {@link #by}, which returns elements that satisfy the given predicate.
*/
public static <T extends Comparable<T>> Filter<T> lessThan(final T value) {
return by((SerializableFunction<T, Boolean>) input -> input.compareTo(value) < 0)
.described(String.format("x < %s", value));
}
代码示例来源:origin: org.apache.beam/beam-sdks-java-core
/**
* Returns a {@code PTransform} that takes an input {@code PCollection<T>} and returns a {@code
* PCollection<T>} with elements that satisfy the given predicate. The predicate must be a {@code
* SerializableFunction<T, Boolean>}.
*
* <p>Example of use:
*
* <pre>{@code
* PCollection<String> wordList = ...;
* PCollection<String> longWords =
* wordList.apply(Filter.by(new MatchIfWordLengthGT(6)));
* }</pre>
*
* <p>See also {@link #lessThan}, {@link #lessThanEq}, {@link #greaterThan}, {@link
* #greaterThanEq}, which return elements satisfying various inequalities with the specified value
* based on the elements' natural ordering.
*/
public static <T, PredicateT extends SerializableFunction<T, Boolean>> Filter<T> by(
PredicateT predicate) {
return new Filter<>(predicate);
}
代码示例来源:origin: googlegenomics/dataflow-java
/**
* Compute a PCollection of reference allele frequencies for SNPs of interest.
* The SNPs all have only a single alternate allele, and neither the
* reference nor the alternate allele have a population frequency < minFreq.
* The results are returned in a PCollection indexed by Position.
*
* @param variants a set of variant calls for a reference population
* @param minFreq the minimum allele frequency for the set
* @return a PCollection mapping Position to AlleleCounts
*/
static PCollection<KV<Position, AlleleFreq>> getFreq(
PCollection<Variant> variants, double minFreq) {
return variants.apply("PassingFilter", Filter.by(VariantFunctions.IS_PASSING))
.apply("OnChromosomeFilter", Filter.by(VariantFunctions.IS_ON_CHROMOSOME))
.apply("NotLowQualityFilter", Filter.by(VariantFunctions.IS_NOT_LOW_QUALITY))
.apply("SNPFilter", Filter.by(VariantFunctions.IS_SINGLE_ALTERNATE_SNP))
.apply(ParDo.of(new GetAlleleFreq()))
.apply(Filter.by(new FilterFreq(minFreq)));
}
代码示例来源:origin: org.apache.beam/beam-sdks-java-core
/**
* Returns a {@code PTransform} that takes an input {@code PCollection<T>} and returns a {@code
* PCollection<T>} with elements that are greater than a given value, based on the elements'
* natural ordering. Elements must be {@code Comparable}.
*
* <p>Example of use:
*
* <pre>{@code
* PCollection<Integer> listOfNumbers = ...;
* PCollection<Integer> largeNumbers =
* listOfNumbers.apply(Filter.greaterThan(1000));
* }</pre>
*
* <p>See also {@link #greaterThanEq}, {@link #lessThan}, {@link #equal} and {@link #lessThanEq},
* which return elements satisfying various inequalities with the specified value based on the
* elements' natural ordering.
*
* <p>See also {@link #by}, which returns elements that satisfy the given predicate.
*/
public static <T extends Comparable<T>> Filter<T> greaterThan(final T value) {
return by((SerializableFunction<T, Boolean>) input -> input.compareTo(value) > 0)
.described(String.format("x > %s", value));
}
代码示例来源:origin: com.google.cloud.genomics/google-genomics-dataflow
reads.apply("IsOnChromosome", Filter.by(ReadFunctions.IS_ON_CHROMOSOME))
.apply("IsNotQCFailure", Filter.by(ReadFunctions.IS_NOT_QC_FAILURE))
.apply("IsNotDuplicate", Filter.by(ReadFunctions.IS_NOT_DUPLICATE))
.apply("IsProperPlacement", Filter.by(ReadFunctions.IS_PROPER_PLACEMENT))
.apply(ParDo.of(new SplitReads()))
.apply(Filter.by(new SampleReads(samplingFraction, samplingPrefix)));
代码示例来源:origin: org.apache.beam/beam-sdks-java-core
/**
* Returns a {@code PTransform} that takes an input {@code PCollection<T>} and returns a {@code
* PCollection<T>} with elements that are greater than or equal to a given value, based on the
* elements' natural ordering. Elements must be {@code Comparable}.
*
* <p>Example of use:
*
* <pre>{@code
* PCollection<Integer> listOfNumbers = ...;
* PCollection<Integer> largeOrEqualNumbers =
* listOfNumbers.apply(Filter.greaterThanEq(1000));
* }</pre>
*
* <p>See also {@link #greaterThan}, {@link #lessThan}, {@link #equal} and {@link #lessThanEq},
* which return elements satisfying various inequalities with the specified value based on the
* elements' natural ordering.
*
* <p>See also {@link #by}, which returns elements that satisfy the given predicate.
*/
public static <T extends Comparable<T>> Filter<T> greaterThanEq(final T value) {
return by((SerializableFunction<T, Boolean>) input -> input.compareTo(value) >= 0)
.described(String.format("x ≥ %s", value));
}
代码示例来源:origin: googlegenomics/dataflow-java
reads.apply("IsOnChromosome", Filter.by(ReadFunctions.IS_ON_CHROMOSOME))
.apply("IsNotQCFailure", Filter.by(ReadFunctions.IS_NOT_QC_FAILURE))
.apply("IsNotDuplicate", Filter.by(ReadFunctions.IS_NOT_DUPLICATE))
.apply("IsProperPlacement", Filter.by(ReadFunctions.IS_PROPER_PLACEMENT))
.apply(ParDo.of(new SplitReads()))
.apply(Filter.by(new SampleReads(samplingFraction, samplingPrefix)));
代码示例来源:origin: jbonofre/beam-samples
@Override
public PCollection<String> expand(PCollection<String> inputCollection) {
PCollection<String> compositeKeys =
inputCollection.apply("ExtractCompositeKey", ParDo.of(new DoFn<String, String>() {
@ProcessElement
public void processElement(ProcessContext c) {
c.output(getCompositeKey(c.element()));
}
})).apply("FilterValidCompositeKeys", Filter.by(new SerializableFunction<String, Boolean>() {
public Boolean apply(String input) {
return (!input.equals("NA"));
}
}));
PCollection<KV<String, Long>> compositesEventsPairs =
compositeKeys.apply("GetEventsByCompositeKey", Count.<String>perElement());
PCollection<String> result = compositesEventsPairs.apply("FormatOutput", MapElements.via(
new SimpleFunction<KV<String, Long>, String>() {
@Override
public String apply(KV<String, Long> kv) {
StringBuilder str = new StringBuilder();
String[] split = kv.getKey().split("_");
String country = split[0];
String subject = split[1];
Long eventsNb = kv.getValue();
str.append(country).append(" ").append(subject).append(" ").append(eventsNb);
return str.toString();
}
}));
return result;
}
代码示例来源:origin: org.apache.beam/beam-sdks-java-io-redis
@Override
public PCollection<KV<String, String>> expand(PCollection<KV<String, String>> input) {
// reparallelize mimics the same behavior as in JdbcIO
// breaking fusion
PCollectionView<Iterable<KV<String, String>>> empty =
input
.apply("Consume", Filter.by(SerializableFunctions.constant(false)))
.apply(View.asIterable());
PCollection<KV<String, String>> materialized =
input.apply(
"Identity",
ParDo.of(
new DoFn<KV<String, String>, KV<String, String>>() {
@ProcessElement
public void processElement(ProcessContext context) {
context.output(context.element());
}
})
.withSideInputs(empty));
return materialized.apply(Reshuffle.viaRandomKey());
}
}
代码示例来源:origin: org.apache.beam/beam-sdks-java-io-jdbc
@Override
public PCollection<T> expand(PCollection<T> input) {
// See https://issues.apache.org/jira/browse/BEAM-2803
// We use a combined approach to "break fusion" here:
// (see https://cloud.google.com/dataflow/service/dataflow-service-desc#preventing-fusion)
// 1) force the data to be materialized by passing it as a side input to an identity fn,
// then 2) reshuffle it with a random key. Initial materialization provides some parallelism
// and ensures that data to be shuffled can be generated in parallel, while reshuffling
// provides perfect parallelism.
// In most cases where a "fusion break" is needed, a simple reshuffle would be sufficient.
// The current approach is necessary only to support the particular case of JdbcIO where
// a single query may produce many gigabytes of query results.
PCollectionView<Iterable<T>> empty =
input
.apply("Consume", Filter.by(SerializableFunctions.constant(false)))
.apply(View.asIterable());
PCollection<T> materialized =
input.apply(
"Identity",
ParDo.of(
new DoFn<T, T>() {
@ProcessElement
public void process(ProcessContext c) {
c.output(c.element());
}
})
.withSideInputs(empty));
return materialized.apply(Reshuffle.viaRandomKey());
}
}
代码示例来源:origin: GoogleCloudPlatform/DataflowTemplates
@Override
public PCollection<T> expand(PCollection<T> input) {
// See https://issues.apache.org/jira/browse/BEAM-2803
// We use a combined approach to "break fusion" here:
// (see https://cloud.google.com/dataflow/service/dataflow-service-desc#preventing-fusion)
// 1) force the data to be materialized by passing it as a side input to an identity fn,
// then 2) reshuffle it with a random key. Initial materialization provides some parallelism
// and ensures that data to be shuffled can be generated in parallel, while reshuffling
// provides perfect parallelism.
// In most cases where a "fusion break" is needed, a simple reshuffle would be sufficient.
// The current approach is necessary only to support the particular case of JdbcIO where
// a single query may produce many gigabytes of query results.
PCollectionView<Iterable<T>> empty =
input
.apply("Consume", Filter.by(SerializableFunctions.constant(false)))
.apply(View.asIterable());
PCollection<T> materialized =
input.apply(
"Identity",
ParDo.of(
new DoFn<T, T>() {
@ProcessElement
public void process(ProcessContext c) {
c.output(c.element());
}
})
.withSideInputs(empty));
return materialized.apply(Reshuffle.viaRandomKey());
}
}
代码示例来源:origin: org.talend.components/processing-runtime
PCollection<IndexedRecord> output = mainPCollection.apply(ctx.getPTransformName(), Filter.by(predicate));
ctx.putPCollectionByLinkName(hasFlow ? flowLink : rejectLink, output);
} else {
内容来源于网络,如有侵权,请联系作者删除!