Azure Blob Storage

Azure Blob Storage is a Microsoft-managed service providing cloud storage for a variety of use cases. You can use Azure Blob Storage with Flink for reading and writing data, as well as in conjunction with the streaming state backends.

Flink supports accessing Azure Blob Storage using both the wasb:// and abfs:// schemes.

Azure recommends using abfs:// for accessing ADLS Gen2 storage accounts, even though wasb:// continues to work for backward compatibility.

abfs:// can only be used to access ADLS Gen2 storage accounts. Please refer to the Azure documentation on how to identify an ADLS Gen2 storage account.

You can use Azure Blob Storage objects like regular files by specifying paths in the following format:

    // WASB unencrypted access
    wasb://<your-container>@<your-azure-account>.blob.core.windows.net/<object-path>
    // WASB SSL encrypted access
    wasbs://<your-container>@<your-azure-account>.blob.core.windows.net/<object-path>
    // ABFS unencrypted access
    abfs://<your-container>@<your-azure-account>.dfs.core.windows.net/<object-path>
    // ABFS SSL encrypted access
    abfss://<your-container>@<your-azure-account>.dfs.core.windows.net/<object-path>
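As an illustration of how these formats fit together, the following standalone helper (hypothetical, not part of Flink) assembles a URI from its container, account, and object-path components. Note the different endpoints: WASB URIs use blob.core.windows.net, while ABFS URIs use dfs.core.windows.net.

```java
// Illustrative helper (hypothetical, not part of Flink) that assembles
// the four URI formats shown above from their components.
public class AzurePathFormats {

    // Build a WASB URI; ssl == true yields the wasbs:// scheme.
    public static String wasb(String container, String account, String path, boolean ssl) {
        return (ssl ? "wasbs" : "wasb") + "://" + container + "@"
                + account + ".blob.core.windows.net/" + path;
    }

    // Build an ABFS URI; ssl == true yields the abfss:// scheme.
    public static String abfs(String container, String account, String path, boolean ssl) {
        return (ssl ? "abfss" : "abfs") + "://" + container + "@"
                + account + ".dfs.core.windows.net/" + path;
    }

    public static void main(String[] args) {
        System.out.println(wasb("my-container", "my-account", "data/input.txt", true));
        // -> wasbs://my-container@my-account.blob.core.windows.net/data/input.txt
        System.out.println(abfs("my-container", "my-account", "data/input.txt", true));
        // -> abfss://my-container@my-account.dfs.core.windows.net/data/input.txt
    }
}
```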

See below for how to use Azure Blob Storage in a Flink job:

    // Read from Azure Blob Storage
    env.readTextFile("wasb://<your-container>@<your-azure-account>.blob.core.windows.net/<object-path>");
    // Write to Azure Blob Storage
    stream.writeAsText("wasb://<your-container>@<your-azure-account>.blob.core.windows.net/<object-path>");
    // Use Azure Blob Storage as checkpoint storage
    Configuration config = new Configuration();
    config.set(CheckpointingOptions.CHECKPOINT_STORAGE, "filesystem");
    config.set(CheckpointingOptions.CHECKPOINTS_DIRECTORY, "wasb://<your-container>@<your-azure-account>.blob.core.windows.net/<object-path>");
    env.configure(config);

Shaded Hadoop Azure Blob Storage file system

To use flink-azure-fs-hadoop, copy the respective JAR file from the opt directory to the plugins directory of your Flink distribution before starting Flink, e.g.

    mkdir ./plugins/azure-fs-hadoop
    cp ./opt/flink-azure-fs-hadoop-1.20.0.jar ./plugins/azure-fs-hadoop/

flink-azure-fs-hadoop registers default FileSystem wrappers for URIs with the wasb:// and wasbs:// (SSL encrypted access) schemes.

Credentials Configuration

WASB

Hadoop’s WASB Azure filesystem supports configuration of credentials via the Hadoop configuration, as outlined in the Hadoop Azure Blob Storage documentation. For convenience, Flink forwards all Flink configuration options with a key prefix of fs.azure to the Hadoop configuration of the filesystem. Consequently, the Azure Blob Storage key can be configured in the Flink configuration file via:

    fs.azure.account.key.<account_name>.blob.core.windows.net: <azure_storage_key>

Alternatively, the filesystem can be configured to read the Azure Blob Storage key from the environment variable AZURE_STORAGE_KEY by setting the following configuration key in the Flink configuration file:

    fs.azure.account.keyprovider.<account_name>.blob.core.windows.net: org.apache.flink.fs.azurefs.EnvironmentVariableKeyProvider
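Conceptually, such a key provider simply looks the storage key up in the process environment. The following standalone sketch is an assumption for illustration only: Flink's actual EnvironmentVariableKeyProvider implements Hadoop's KeyProvider interface, while this class reproduces just the lookup behavior.

```java
import java.util.Map;

// Conceptual sketch (assumption, for illustration only) of an
// environment-variable-based key provider. Flink's real
// EnvironmentVariableKeyProvider implements Hadoop's KeyProvider
// interface; this standalone class shows only the lookup it performs.
public class EnvKeyProviderSketch {

    static final String ENV_VAR = "AZURE_STORAGE_KEY";

    // In production the map would be System.getenv(); it is a parameter
    // here so the lookup can be exercised without a real environment.
    public static String resolveKey(Map<String, String> env) {
        String key = env.get(ENV_VAR);
        if (key == null || key.isEmpty()) {
            throw new IllegalStateException(
                    "Environment variable " + ENV_VAR + " is not set");
        }
        return key;
    }

    public static void main(String[] args) {
        // Hypothetical environment value, for illustration.
        System.out.println(resolveKey(Map.of(ENV_VAR, "<azure_storage_key>")));
        // -> <azure_storage_key>
    }
}
```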

ABFS

Hadoop’s ABFS Azure filesystem supports several authentication mechanisms. Please refer to the Hadoop ABFS documentation on how to configure them.

Azure recommends using Azure managed identities to access ADLS Gen2 storage accounts via abfs. Please refer to the Azure managed identities documentation for more details.

Please visit the Azure documentation for the list of services that support managed identities. Flink clusters deployed in those Azure services can take advantage of managed identities.

Accessing ABFS using Storage Keys (Discouraged)

The Azure Blob Storage key can be configured in the Flink configuration file via:

    fs.azure.account.key.<account_name>.dfs.core.windows.net: <azure_storage_key>