Overview
Stream Processing is the ability to query continuous data streams while they are still in motion. Fluent Bit implements a Streaming SQL Engine that can be used for such process.
In order to understand how Stream Processing works in Fluent Bit, we will go through a quick overview of Fluent Bit architecture and how the data goes through the pipeline.
Fluent Bit Data Pipeline
Fluent Bit collects and process logs (records) from different input sources and allows to parse and filter these records before they hit the Storage interface. Once data is processed and it’s in a safe state (either in memory or the file system), the records are routed through the proper output destinations.
Most of the phases in the pipeline are implemented through plugins: Input, Filter and Output.
The Filtering interface is good to perform specific record modifications like append or remove a key, enrich with specific metadata (e.g: Kubernetes Filter) or discard records based on specific conditions. Just after the data will not have any further modification and hits the Storage, optionally, will be redirected to the Stream Processor.
Stream Processor
The Stream Processor is an independent subsystem that check for new records hitting the Storage interface. By configuration the Stream Processor will attach to records coming from a specific Input plugin (stream) or by applying Tag and Matching rules.
Every Input instance is considered a Stream, that stream collects data and ingest records into the pipeline.
By configuring specific SQL queries (Structured Query Language), the user can perform specific tasks like key selections, filtering and data aggregation within others. Note that there is no database concept here, everything is schema-less and happens in-memory, for hence the concept of Tables as in common relational databases don’t exists.
One of the powerful features of Fluent Bit Stream Processor is that allows to create new streams of data using the results from a previous SQL query, these results are re-ingested back into the pipeline to be consumed again for the Stream Processor (if desired) or routed to output destinations such any common record by using Tag/Matching rules (tip: stream processor results can be Tagged!)