RocketMQ Streams Core Concept
Domain model
StreamBuilder
- An instance of StreamBuilder has 1 to N pipelines, where a pipeline represents a data processing path.
- A pipeline can contain 1 to N processing nodes, called GroupNodes.
- An instance of StreamBuilder also has a TopologyBuilder, which can construct data processors.
- Each JobId corresponds to one instance of StreamBuilder.
RocketMQStream
- An instance of RocketMQStream has a TopologyBuilder for building topologies
- An instance of RocketMQStream can instantiate 1 to N worker threads
- Each thread, represented by a WorkerThread instance, contains an engine
- An engine contains all the logic for executing data processing and includes a consumer instance, a producer instance, and a StateStore instance.
Stream Processing Instance
A Stream Processing Instance represents a process running RocketMQ Streams;
- An instance of Stream Processing contains one StreamBuilder, one RocketMQStream, one topology, and one or multiple pipelines.
StreamBuilder
StreamBuilder(jobId)
build instance;<OUT> RStream<OUT> source(topicName, deserializer)
define source topic and deserialization method;
RStream
<K> GroupedStream<K, T> keyBy(selectAction)
group the data by specific field;<O> RStream<O> map(mapperAction)
transform data one-to-one;RStream<T> filter(predictor)
filter the data<VR> RStream<T> flatMap(mapper)
transform data one-to-many;<T2> JoinedStream<T, T2> join(rightStream)
Perform a two-stream join;sink(topicName, serializer)
output the results to a specific topic;
GroupedStream
Operations on data that has the same key
<OUT> GroupedStream<K, Integer> count(selectAction)
counts the number of data entries that contain a certain field.GroupedStream<K, V> min(selectAction)
calculates the minimum value of a certain field.GroupedStream<K, V> max(selectAction)
calculates the maximum value of a certain field.GroupedStream<K, ? extends Number> sum(selectAction)
calculates the sum of a certain field.GroupedStream<K, V> filter(predictor)
filters a certain field.<OUT> GroupedStream<K, OUT> map(valueMapperAction)
performs one-to-one data transformation.<OUT> GroupedStream<K, OUT> aggregate(accumulator)
performs aggregate operations on the data, and supports second-order aggregation, such as adding data before a window triggers and calculating results when the window triggers.WindowStream<K, V> window(windowInfo)
defines a window for the stream.GroupedStream<K, V> addGraphNode(name, supplier)
adds a custom operator to the stream processing topology at a low-level interface.RStream<V> toRStream()
converts to RStream, only converting in terms of interface and not affecting the data.sink(topicName, serializer)
writes the results to a topic in a custom serialization format.
WindowStream
Operations on data that has been divided into windows
WindowStream<K, Integer> count()
counts the number of data entries in the window.WindowStream<K, V> filter(predictor)
filters the data in the window.<OUT> WindowStream<K, OUT> map(mapperAction)
performs one-to-one data transformation on the data in the window.<OUT> WindowStream<K, OUT> aggregate(aggregateAction)
performs many-to-one data transformation on the data in the window.<OUT> WindowStream<K, OUT> aggregate(accumulator)
performs aggregate operations on the data in the window, and supports second-order aggregation, such as adding data before a window triggers and calculating results when the window triggers.void sink(topicName, serializer)
writes the results to a topic in a custom serialization format.