Amazon Web Services (AWS)
Amazon Web Services offers cloud computing services on which you can run Flink.
EMR: Elastic MapReduce
Amazon Elastic MapReduce (Amazon EMR) is a web service that makes it easy to quickly set up a Hadoop cluster. This is the recommended way to run Flink on AWS, as it takes care of setting up everything.
Standard EMR Installation
Flink is a supported application on Amazon EMR. Amazon’s documentation describes configuring Flink, creating and monitoring a cluster, and working with jobs.
Custom EMR Installation
Amazon EMR is regularly updated with new releases, but a version of Flink that is not available in an EMR release can be installed manually on a stock EMR cluster.
Create EMR Cluster
The EMR documentation contains examples showing how to start an EMR cluster. You can follow that guide and install any EMR release. You don’t need to install the All Applications part of the EMR release; Core Hadoop is sufficient.
Note: Access to S3 buckets requires configuration of IAM roles when creating an EMR cluster.
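As an illustration of the cluster creation described above, the following is a minimal sketch using the AWS CLI rather than the console. The cluster name, release label, key pair name, instance type, and instance count are all hypothetical values chosen for the example, and `--applications Name=Hadoop` is assumed here as a rough stand-in for a Hadoop-only cluster. `--use-default-roles` relies on the default EMR IAM roles already existing in your account (they can be created with `aws emr create-default-roles`); whether those roles grant the S3 access your jobs need depends on your account setup. The script only prints the command unless `DRY_RUN=0` is set.

```shell
# Hypothetical values; replace them with your own.
CLUSTER_NAME="flink-custom"
RELEASE_LABEL="emr-4.5.0"
KEY_NAME="my-key-pair"

# Print the command when DRY_RUN=1 (the default here); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Assemble the cluster creation call for a Hadoop-only EMR cluster.
create_cluster() {
  run aws emr create-cluster \
    --name "$CLUSTER_NAME" \
    --release-label "$RELEASE_LABEL" \
    --applications Name=Hadoop \
    --instance-type m3.xlarge \
    --instance-count 3 \
    --use-default-roles \
    --ec2-attributes "KeyName=$KEY_NAME"
}

create_cluster
```

Running the script with `DRY_RUN=0` would submit the request to AWS, so review the printed command first.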
Install Flink on EMR Cluster
After creating your cluster, you can connect to the master node and install Flink:
- Go to the Downloads page and download a binary version of Flink matching the Hadoop version of your EMR cluster, e.g. Hadoop 2.7 for EMR releases 4.3.0, 4.4.0, or 4.5.0.
- Make sure all the Hadoop dependencies are in the classpath before you submit any jobs to EMR:
export HADOOP_CLASSPATH=`hadoop classpath`
- Extract the Flink distribution and you are ready to deploy Flink jobs via YARN after setting the Hadoop config directory:
HADOOP_CONF_DIR=/etc/hadoop/conf ./bin/flink run -m yarn-cluster -yn 1 examples/streaming/WordCount.jar
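The steps above assume that the `hadoop` command, the Hadoop config directory, and the extracted Flink distribution are all in place. A small preflight check, sketched below, can catch a missing piece before a job is submitted. The function name `check_flink_env` and the `FLINK_HOME` variable are hypothetical helpers introduced for this example, not part of Flink or EMR.

```shell
# Hypothetical helper: verify the prerequisites for submitting to YARN.
# $1: directory of the extracted Flink distribution
# $2: Hadoop config directory (defaults to /etc/hadoop/conf, as used above)
check_flink_env() {
  chk_flink_home="$1"
  chk_conf_dir="${2:-/etc/hadoop/conf}"
  chk_status=0
  if ! command -v hadoop >/dev/null 2>&1; then
    echo "hadoop command not found on PATH" >&2
    chk_status=1
  fi
  if [ ! -d "$chk_conf_dir" ]; then
    echo "Hadoop config directory missing: $chk_conf_dir" >&2
    chk_status=1
  fi
  if [ ! -x "$chk_flink_home/bin/flink" ]; then
    echo "Flink launcher not found: $chk_flink_home/bin/flink" >&2
    chk_status=1
  fi
  return "$chk_status"
}

# Submit the example job only if every prerequisite is present.
FLINK_HOME="${FLINK_HOME:-$PWD}"
if check_flink_env "$FLINK_HOME"; then
  HADOOP_CLASSPATH=$(hadoop classpath)
  export HADOOP_CLASSPATH
  HADOOP_CONF_DIR=/etc/hadoop/conf "$FLINK_HOME/bin/flink" run -m yarn-cluster -yn 1 \
    "$FLINK_HOME/examples/streaming/WordCount.jar"
fi
```

The check reports every missing prerequisite rather than stopping at the first one, so a fresh master node can be fixed in a single pass.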