Streaming Connectors
Predefined Sources and Sinks
A few basic data sources and sinks are built into Flink and are always available. The predefined data sources include reading from files, directories, and sockets, and ingesting data from collections and iterators. The predefined data sinks support writing to files, to stdout and stderr, and to sockets.
Bundled Connectors
Connectors provide code for interfacing with various third-party systems. Currently these systems are supported:
- Apache Kafka (source/sink)
- Apache Cassandra (sink)
- Amazon Kinesis Streams (source/sink)
- Elasticsearch (sink)
- Hadoop FileSystem (sink)
- RabbitMQ (source/sink)
- Apache NiFi (source/sink)
- Twitter Streaming API (source)
- Google PubSub (source/sink)
Keep in mind that to use one of these connectors in an application, additional third-party components are usually required, e.g. servers for the data stores or message queues. Note also that while the streaming connectors listed in this section are part of the Flink project and are included in source releases, they are not included in the binary distributions. Further instructions can be found in the corresponding subsections.
Connectors in Apache Bahir
Additional streaming connectors for Flink are being released through Apache Bahir, including:
- Apache ActiveMQ (source/sink)
- Apache Flume (sink)
- Redis (sink)
- Akka (sink)
- Netty (source)
Other Ways to Connect to Flink
Data Enrichment via Async I/O
Using a connector isn’t the only way to get data in and out of Flink. One common pattern is to query an external database or web service in a Map or FlatMap in order to enrich the primary datastream. Flink offers an API for Asynchronous I/O to make it easier to do this kind of enrichment efficiently and robustly.
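The following is a minimal sketch of the enrichment pattern in plain Java, not Flink's Asynchronous I/O API. It only illustrates why issuing lookups concurrently helps: the waits on the external service overlap instead of adding up, as they would with a blocking call per record inside a Map. The class `AsyncEnrichmentSketch`, the `USER_COUNTRY` table, and the `lookupCountry` helper are hypothetical stand-ins for an external database or web service.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class AsyncEnrichmentSketch {
    // Hypothetical stand-in for an external database or web service.
    static final Map<String, String> USER_COUNTRY = Map.of("alice", "DE", "bob", "US");

    // Simulates a slow remote lookup; a real client would do network I/O here.
    static String lookupCountry(String user) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return USER_COUNTRY.getOrDefault(user, "unknown");
    }

    // Starts every lookup before waiting on any of them, then collects
    // the enriched records in the same order as the input.
    static List<String> enrichAll(List<String> users, ExecutorService pool) {
        List<CompletableFuture<String>> pending = users.stream()
            .map(u -> CompletableFuture.supplyAsync(() -> u + "," + lookupCountry(u), pool))
            .collect(Collectors.toList());
        return pending.stream()
            .map(CompletableFuture::join)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(enrichAll(List.of("alice", "bob", "carol"), pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real job the same idea would be expressed through Flink's own asynchronous operator, which additionally handles concerns this sketch ignores, such as timeouts and limits on the number of in-flight requests.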
Queryable State
When a Flink application pushes a lot of data to an external data store, this can become an I/O bottleneck. If the data involved has many fewer reads than writes, a better approach can be for an external application to pull from Flink the data it needs. The Queryable State interface enables this by allowing the state being managed by Flink to be queried on demand.
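The trade-off between pushing and pulling can be illustrated with a small plain-Java sketch. It does not use the Queryable State interface. The class `PullStateSketch` and its methods are hypothetical and only show the idea: the job updates its state locally on every event, and a reader asks for a value only when it needs one, so the number of external interactions follows the (small) number of reads rather than the (large) number of writes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PullStateSketch {
    // Per-key state owned by the simulated streaming job.
    private final Map<String, Long> counts = new ConcurrentHashMap<>();

    // Called once per incoming event; updates local state only,
    // with no write to an external store.
    void onEvent(String key) {
        counts.merge(key, 1L, Long::sum);
    }

    // Called by an external reader on demand; returns 0 for unseen keys.
    long query(String key) {
        return counts.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        PullStateSketch job = new PullStateSketch();
        for (int i = 0; i < 1000; i++) {
            job.onEvent(i % 2 == 0 ? "even" : "odd");
        }
        // A single read after 1000 local updates.
        System.out.println(job.query("even"));
    }
}
```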