KeyedProcessFunction ordering

Posted by olhwl3o2 on 2021-07-15 in Flink


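I'm new to Flink, and I'd like to understand how Flink orders calls to processElement() in its KeyedProcessFunction abstraction when running in parallel. Consider the following example, which produces a stream of partial sums: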

package sample
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment, createTypeInformation}
import org.apache.flink.util.Collector
object Playground {
  case class Record(groupId: String, score: Int) {}
  def main(args: Array[String]): Unit = {
    // 1. Create the environment
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.createLocalEnvironment()
    env.setParallelism(10)
    // 2. Source
    val record1 = Record("groupX", 1)
    val record2 = Record("groupX", 2)
    val record3 = Record("groupX", 3)
    val records: DataStream[Record] = env.fromElements(record1, record2, record3, record1, record2, record3)
    // 3. Application Logic
    val partialSums: DataStream[Int] = records
      .keyBy(record => record.groupId)
      .process(new KeyedProcessFunction[String, Record, Int] {
        // Store partial sum of score for Records seen
        lazy val partialSum: ValueState[Int] = getRuntimeContext.getState(
          new ValueStateDescriptor[Int]("partialSum", classOf[Int]))
        // Ingest new record
        override
        def processElement(value: Record,
                           ctx: KeyedProcessFunction[String, Record, Int]#Context,
                           out: Collector[Int]): Unit =
        {
          val currentSum: Int = partialSum.value()
          partialSum.update(currentSum + value.score)
          out.collect(partialSum.value())
        }
      })
    // 4. Sink
    partialSums.print()
    // 5. Build JobGraph and execute
    env.execute("sample-job")
  }
}

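I expect the output to be the stream 1, 3, 6, 7, 9, 12, and that is indeed what I get.
Is it safe to assume this will always be the case, especially when reading from sources with a high degree of parallelism?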

apache-flink flink-streaming

Source: https://stackoverflow.com/questions/67083849/flink-keyedprocessfunction-ordering

1 answer


ryoqjall1#

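In your example, ordering is guaranteed within each key. Because there is only one key, you will always get 1, 3, 6, 7, 9, 12.
When you read from a source with parallelism greater than 1, the individual source instances race against each other. When streams from two or more sources are combined (for example, via keyBy, union, rebalance, etc.), the resulting order is nondeterministic, although the events from each source instance keep their relative order.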
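To make the per-key guarantee concrete, here is a minimal plain-Scala model (no Flink involved) of what the KeyedProcessFunction above computes, assuming elements reach the operator in the order the single source emits them. A mutable Map stands in for keyed ValueState; the object and method names are hypothetical and only illustrate the idea.

```scala
// Plain-Scala model (NOT Flink code) of the question's KeyedProcessFunction:
// one running sum per key, updated in the order elements arrive.
object PerKeyRunningSum {
  final case class Record(groupId: String, score: Int)

  /** Emits the updated running sum for each record's key, in input order. */
  def runningSums(input: Seq[Record]): Seq[Int] = {
    // Stands in for keyed ValueState: one Int per key, treated as 0 when absent.
    val state = scala.collection.mutable.Map.empty[String, Int]
    input.map { r =>
      val updated = state.getOrElse(r.groupId, 0) + r.score
      state(r.groupId) = updated
      updated
    }
  }

  def main(args: Array[String]): Unit = {
    // Same six records as the question, all with key "groupX".
    val rs = Seq(1, 2, 3, 1, 2, 3).map(Record("groupX", _))
    println(runningSums(rs).mkString(", ")) // 1, 3, 6, 7, 9, 12
  }
}
```

With a single key and a fixed input order, the model can only produce one output sequence, which matches the answer's claim.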
For example, if you have

stream X: 1, 2, 3, 4
stream Y: a, b, c, d

then merging these two streams could produce 1, 2, 3, 4, a, b, c, d, or a, b, 1, 2, 3, c, 4, d, and so on.
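The set of possible merges can be enumerated with a short plain-Scala sketch (a model, not Flink code): every interleaving that preserves the order within each input is a possible result. As a hypothetical illustration tied to the question, the sketch also assumes the six records were split across two parallel source instances, each emitting scores 1, 2, 3, and shows that the partial sums then vary between runs while the final total stays 12.

```scala
// Plain-Scala model (NOT Flink code): all ways two streams can be merged when
// each source keeps its own order but the sources race against each other.
object Interleavings {
  /** All merges of xs and ys that preserve the relative order within each input. */
  def interleavings[A](xs: List[A], ys: List[A]): List[List[A]] = (xs, ys) match {
    case (Nil, _) => List(ys)
    case (_, Nil) => List(xs)
    case (x :: xt, y :: yt) =>
      interleavings(xt, ys).map(x :: _) ++ interleavings(xs, yt).map(y :: _)
  }

  /** Running sums of scores, as a single key's state would produce them. */
  def runningSums(scores: List[Int]): List[Int] = scores.scanLeft(0)(_ + _).tail

  def main(args: Array[String]): Unit = {
    val x = List("1", "2", "3", "4")
    val y = List("a", "b", "c", "d")
    val merged = interleavings(x, y)
    println(merged.size) // 70, i.e. C(8, 4)
    println(merged.contains(List("1", "2", "3", "4", "a", "b", "c", "d"))) // true
    println(merged.contains(List("a", "b", "1", "2", "3", "c", "4", "d"))) // true

    // Hypothetical: the question's input split across two parallel source instances.
    val outcomes = interleavings(List(1, 2, 3), List(1, 2, 3)).map(runningSums).distinct
    outcomes.foreach(o => println(o.mkString(", ")))
  }
}
```

Among the printed outcomes are both 1, 3, 6, 7, 9, 12 and 1, 2, 4, 6, 9, 12, so with a parallel source the intermediate results are not guaranteed, even though each source instance's own order is preserved.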

Answered 2021-07-15
