Detailed Stream Engine Architecture
MatrixOne incorporates a built-in stream engine designed for real-time querying, processing, and enriching data stored in a series of incoming data points, known as data streams. Developers can now use SQL to define and create stream processing pipelines as a real-time data backend service. Furthermore, developers can utilize SQL to query data within streams and establish connections with non-streaming datasets, thereby further simplifying the data stack.
Technical Architecture
The technical architecture of the MatrixOne stream engine is illustrated as follows:
MatrixOne introduced the ability to create streaming tables and implemented a Kafka connector to fulfill the streaming data ingestion requirements of numerous time-series scenarios.
Connectors
Connectors facilitate connecting with external data sources, such as the Kafka connector introduced in MatrixOne 1.0.
MatrixOne supports the use of the following statement to establish a connection between connectors and external data sources:
CREATE SOURCE | SINK CONNECTOR [IF NOT EXISTS] connector_name CONNECTOR_TYPE WITH (property_name = expression [, ...]);
Here, the parameter CONNECTOR_TYPE
is used to specify the target.
Streams
A stream represents an append-only data flow akin to an unbounded table with infinite events. Each stream maps to an event group in the storage layer, such as Kafka topics or MatrixOne tables.
- External stream: A stream using an external storage layer via connectors.
- Internal stream: A stream that utilizes MatrixOne tables as the event storage.
MatrixOne supports the use of the following statement to create streams:
CREATE [OR REPLACE] [EXTERNAL] STREAM [IF NOT EXISTS] stream_name
({ column_name data_type [KEY | HEADERS | HEADER(key)] } [, ...])
WITH (property_name = expression [, ...]);
For example, you can refer to the following examples:
CREATE EXTERNAL STREAM STUDENTS (ID STRING KEY, SCORE INT)
WITH (kafka_topic = 'students_topic', value_format = 'JSON', partitions = 4);
Or:
CREATE STREAM STUDENTS (ID STRING KEY, SCORE INT)
You can also query streams and connect them with other tables and materialized views, as shown below:
SELECT * FROM STUDENTS WHERE rank > 5;
Additionally, you can insert new events, as demonstrated below:
INSERT INTO foo (ROWTIME, KEY_COL, COL_A) VALUES (1510923225000, 'key', 'A');