Upgrading from 1.10 to 2

Apache Airflow 2 is a major release, and the purpose of this document is to assist users in migrating from Airflow 1.10.x to Airflow 2.

Step 1: Switch to Python 3

Airflow 1.10 was the last release series to support Python 2. Airflow 2.0.0 requires Python 3.6+ and has been tested with Python versions 3.6, 3.7 and 3.8. Python 3.9 support was added from Airflow 2.1.2.

Airflow 2.3.0 dropped support for Python 3.6. It’s tested with Python 3.7, 3.8, 3.9 and 3.10.

If you have a specific task that still requires Python 2 then you can use the @task.virtualenv, @task.docker or @task.kubernetes decorators for this.
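For illustration, a minimal sketch of isolating such a task with @task.virtualenv (the package name legacy-lib and the interpreter version are assumptions for the example; @task.docker and @task.kubernetes work analogously with an image that ships the interpreter you need):

    from airflow.decorators import task


    @task.virtualenv(
        task_id="legacy_python2_task",
        python_version="2.7",  # the interpreter must be available on the worker
        requirements=["legacy-lib==1.0"],  # hypothetical Python 2-only dependency
        system_site_packages=False,
    )
    def run_legacy_code():
        # The callable runs inside the isolated virtualenv, so it must be self-contained.
        import legacy_lib  # hypothetical import

        legacy_lib.do_work()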

For a list of breaking changes between Python 2 and Python 3, please refer to this handy blog from the CouchBaseDB team.

Step 2: Upgrade to 1.10.15

To minimize friction for users upgrading from Airflow 1.10 to Airflow 2.0 and beyond, Airflow 1.10.15, a.k.a. the “bridge release”, has been created. This is the final 1.10 feature release. Airflow 1.10.15 includes support for various features that have been backported from Airflow 2.0 to make it easy for users to test their Airflow environment before upgrading to Airflow 2.0.

We strongly recommend that all users upgrading to Airflow 2.0 first upgrade to Airflow 1.10.15, test their Airflow deployment, and only then upgrade to Airflow 2.0. Airflow 1.10.x reached end of life on 17 June 2021. No new Airflow 1.x versions will be released.

Features in 1.10.15 include:

1. Most breaking DAG and architecture changes of Airflow 2.0 have been backported to Airflow 1.10.15. This backward compatibility does not mean that 1.10.15 will process these DAGs the same way as Airflow 2.0. Instead, it means that most Airflow 2.0-compatible DAGs will work in Airflow 1.10.15. This backport gives users time to modify their DAGs gradually, without any service disruption.

2. We have also backported the updated Airflow 2.0 CLI commands to Airflow 1.10.15, so that users can modify their scripts to be compatible with Airflow 2.0 before the upgrade.

3. For users of the KubernetesExecutor, we have backported the pod_template_file capability as well as a script that will generate a pod_template_file based on your airflow.cfg settings. To generate this file, simply run the following command:

    airflow generate_pod_template -o <output file path>

Once you have performed this step, simply point the pod_template_file option in the kubernetes_executor section of your airflow.cfg at this file (see the example below the note).

Note

Prior to airflow version 2.4.2, the kubernetes_executor section was called kubernetes.
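For illustration, a minimal sketch of the resulting airflow.cfg entry, assuming the generated file was written to the hypothetical path /opt/airflow/pod_template.yaml (per the note above, the section is named kubernetes on these versions and kubernetes_executor from Airflow 2.4.2 onwards):

    [kubernetes]
    pod_template_file = /opt/airflow/pod_template.yaml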

Step 3: Run the Upgrade check scripts

After upgrading to Airflow 1.10.15, we recommend that you install the “upgrade check” scripts. These scripts will read through your airflow.cfg and all of your DAGs and will give a detailed report of all changes required before upgrading. We are testing this script diligently, and our goal is that any Airflow setup that can pass these tests will be able to upgrade to 2.0 without any issues.

    pip install apache-airflow-upgrade-check

Once this is installed, please run the upgrade check script.

    airflow upgrade_check

More details about this process can be found in Upgrade Check Scripts.

Step 4: Switch to Backport Providers

Now that you are set up in Airflow 1.10.15 with a Python 3.6+ environment, you are ready to start porting your DAGs to Airflow 2.0 compliance!

The most important step in this transition is also the easiest step to do in pieces. All Airflow 2.0 operators are backwards compatible with Airflow 1.10 using the backport provider packages. In your own time, you can transition to using these backport-providers by pip installing the provider via PyPI and changing the import path.

For example, while historically you might have imported the DockerOperator in this fashion:

    from airflow.operators.docker_operator import DockerOperator

You would now run this command to install the provider:

    pip install apache-airflow-backport-providers-docker

and then import the operator with this path:

    from airflow.providers.docker.operators.docker import DockerOperator

Please note that the backport provider packages are just backports of the provider packages compatible with Airflow 2.0. For example:

    pip install 'apache-airflow[docker]'

automatically installs the apache-airflow-providers-docker package. But you can manage/upgrade/remove provider packages separately from the Airflow core.

After you upgrade to Apache Airflow 2.0, those provider packages are installed automatically when you install Airflow with extras. Several of the providers (http, ftp, sqlite, imap) will also be installed automatically when you install Airflow even without extras. You can read more about providers at Provider packages.

Step 5: Upgrade Airflow DAGs

Change to undefined variable handling in templates

Prior to Airflow 2.0, Jinja templates permitted the use of undefined variables: they rendered as an empty string, with no indication to the user that an undefined variable was used. With this release, any template rendering involving undefined variables will fail the task and display an error in the UI when rendering.

The behavior can be reverted when instantiating a DAG.

    import jinja2

    from airflow import DAG

    dag = DAG("simple_dag", template_undefined=jinja2.Undefined)

Alternatively, it is also possible to override each Jinja Template variable on an individual basis by using the | default Jinja filter as shown below.

    {{ a | default(1) }}

Changes to the KubernetesPodOperator

Much like the KubernetesExecutor, the KubernetesPodOperator will no longer take Airflow custom classes and will instead expect either a pod_template yaml file, or kubernetes.client.models objects.

The one notable exception is that we will continue to support the airflow.kubernetes.secret.Secret class.

Whereas previously a user would import each individual class to build the pod like so:

    from airflow.kubernetes.pod import Port
    from airflow.kubernetes.volume import Volume
    from airflow.kubernetes.secret import Secret
    from airflow.kubernetes.volume_mount import VolumeMount

    volume_config = {"persistentVolumeClaim": {"claimName": "test-volume"}}
    volume = Volume(name="test-volume", configs=volume_config)
    volume_mount = VolumeMount("test-volume", mount_path="/root/mount_file", sub_path=None, read_only=True)
    port = Port("http", 80)
    secret_file = Secret("volume", "/etc/sql_conn", "airflow-secrets", "sql_alchemy_conn")
    secret_env = Secret("env", "SQL_CONN", "airflow-secrets", "sql_alchemy_conn")

    k = KubernetesPodOperator(
        namespace="default",
        image="ubuntu:16.04",
        cmds=["bash", "-cx"],
        arguments=["echo", "10"],
        labels={"foo": "bar"},
        secrets=[secret_file, secret_env],
        ports=[port],
        volumes=[volume],
        volume_mounts=[volume_mount],
        name="airflow-test-pod",
        task_id="task",
        affinity=affinity,
        is_delete_operator_pod=True,
        hostnetwork=False,
        tolerations=tolerations,
        configmaps=configmaps,
        init_containers=[init_container],
        priority_class_name="medium",
    )

Now the user can use the kubernetes.client.models classes as a single point of entry for creating all Kubernetes objects.

    from kubernetes.client import models as k8s

    from airflow.kubernetes.secret import Secret

    configmaps = ["test-configmap-1", "test-configmap-2"]

    volume = k8s.V1Volume(
        name="test-volume",
        persistent_volume_claim=k8s.V1PersistentVolumeClaimVolumeSource(claim_name="test-volume"),
    )
    port = k8s.V1ContainerPort(name="http", container_port=80)
    secret_file = Secret("volume", "/etc/sql_conn", "airflow-secrets", "sql_alchemy_conn")
    secret_env = Secret("env", "SQL_CONN", "airflow-secrets", "sql_alchemy_conn")
    secret_all_keys = Secret("env", None, "airflow-secrets-2")
    volume_mount = k8s.V1VolumeMount(
        name="test-volume", mount_path="/root/mount_file", sub_path=None, read_only=True
    )

    k = KubernetesPodOperator(
        namespace="default",
        image="ubuntu:16.04",
        cmds=["bash", "-cx"],
        arguments=["echo", "10"],
        labels={"foo": "bar"},
        secrets=[secret_file, secret_env],
        ports=[port],
        volumes=[volume],
        volume_mounts=[volume_mount],
        name="airflow-test-pod",
        task_id="task",
        is_delete_operator_pod=True,
        hostnetwork=False,
    )

We decided to keep the Secret class, as users seem to really like the way it simplifies mounting Kubernetes secrets into workers.

For a more detailed list of changes to the KubernetesPodOperator API, please read the section in the Appendix titled “Changed Parameters for the KubernetesPodOperator”

Change default value for dag_run_conf_overrides_params

The DagRun configuration dictionary will now, by default, overwrite the params dictionary. If you pass some key-value pairs through airflow dags backfill -c or airflow dags trigger -c, the key-value pairs will override the existing ones in params. You can revert this behavior by setting dag_run_conf_overrides_params to False in your airflow.cfg.
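For illustration, a minimal sketch with a hypothetical DAG and parameter name; with the new default, running airflow dags trigger -c '{"greeting": "hi"}' param_demo makes the template render "hi" instead of the DAG-level default:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        "param_demo",
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,
        params={"greeting": "hello"},  # DAG-level default
    ) as dag:
        BashOperator(
            task_id="say_it",
            # the dag_run conf value overrides params.greeting when dag_run_conf_overrides_params is enabled
            bash_command="echo {{ params.greeting }}",
        )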

DAG discovery safe mode is now case insensitive

When DAG_DISCOVERY_SAFE_MODE is active, Airflow will now only consider files that contain the strings airflow and dag, matched case-insensitively. This change was made to better support the new @dag decorator.
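For illustration, a minimal sketch of a @dag-decorated file (DAG and task names are hypothetical). Note that the uppercase DAG class never appears in the file; with the case-insensitive check, the lowercase "airflow" and "dag" in the import line are enough for the file to pass safe-mode filtering:

    from datetime import datetime

    from airflow.decorators import dag, task


    @dag(start_date=datetime(2021, 1, 1), schedule_interval=None, catchup=False)
    def decorated_example():
        @task
        def hello():
            print("hello")

        hello()


    decorated_example_dag = decorated_example()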

Change to Permissions

The DAG-level permission actions can_dag_read and can_dag_edit are deprecated as part of Airflow 2.0. They are being replaced with can_read and can_edit. When a role is given DAG-level access, the resource name (or “view menu”, in Flask App-Builder parlance) will now be prefixed with DAG:. So the action can_dag_read on example_dag_id is now represented as can_read on DAG:example_dag_id. There is a special view called DAGs (it was called all_dags in versions 1.10.x) which allows a role to access all DAGs. The default Admin, Viewer, User and Op roles can all access the DAGs view.

As part of running ``airflow db upgrade``, existing permissions will be migrated for you.

When DAGs are initialized with the access_control variable set, any usage of the old permission names will automatically be updated in the database, so this won’t be a breaking change. A DeprecationWarning will be raised.
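For illustration, a minimal sketch of DAG-level access declared with the new action names (the role name is hypothetical; using can_dag_read/can_dag_edit here still works but raises a DeprecationWarning):

    from datetime import datetime

    from airflow import DAG

    dag = DAG(
        "example_dag_id",
        start_date=datetime(2021, 1, 1),
        access_control={
            "analyst": {"can_read", "can_edit"},  # new-style action names
        },
    )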

Drop legacy UI in favor of FAB RBAC UI

Warning

Breaking change

Previously we were using two versions of the UI:

  • non-RBAC UI

  • Flask App Builder RBAC UI

This was difficult to maintain, because it meant we had to implement and update features in two places. With this release, we have removed the older UI in favor of the Flask App Builder RBAC UI, reducing a huge maintenance burden. There is no longer a need to set the RBAC UI explicitly in the configuration, as it is now the only UI.

If you previously used the non-RBAC UI, you have to switch to the new RBAC UI and create users to be able to access Airflow's webserver. For more details on the CLI commands used to create users, see Command Line Interface and Environment Variables Reference.

Please note that custom auth backends will need to be rewritten to target the new FAB-based UI.

As part of this change, a few configuration items in [webserver] section are removed and no longer applicable, including authenticate, filter_by_owner, owner_mode, and rbac.

Before upgrading to this release, we recommend activating the new FAB RBAC UI. For that, you should set the rbac option in the [webserver] section of the airflow.cfg file to True:

    [webserver]
    rbac = True

In order to login to the interface, you need to create an administrator account.

Assuming you have already installed Airflow 1.10.15, you can create a user with Airflow 2.0 CLI command syntax airflow users create. You don’t need to make changes to the configuration file as the FAB RBAC UI is the only supported UI.

    airflow users create \
        --role Admin \
        --username admin \
        --firstname FIRST_NAME \
        --lastname LAST_NAME \
        --email EMAIL@example.org

Breaking Change in OAuth

Note

When multiple replicas of the Airflow webserver are running, they need to share the same secret_key to access the same user session. Inject this via any configuration mechanism. The 1.10.15 bridge release modifies this feature to use randomly generated secret keys instead of an insecure default, which may break existing deployments that rely on the default. The webserver key is also used to authorize requests to Celery workers when logs are retrieved. The token generated using the secret key has a short expiry time, so make sure that the time on ALL machines that run Airflow components is synchronized (for example using ntpd); otherwise you might get “forbidden” errors when the logs are accessed.
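For illustration, a minimal sketch of pinning a shared key in airflow.cfg (the value is a placeholder you generate yourself; the same setting can also be injected via the AIRFLOW__WEBSERVER__SECRET_KEY environment variable):

    [webserver]
    # must be identical on every webserver and worker replica
    secret_key = <your generated key>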

flask-oauthlib has been replaced with authlib, because flask-oauthlib has been deprecated in favor of authlib. The old and new provider configuration keys that have changed are as follows:

Old Keys             | New keys
---------------------|--------------
consumer_key         | client_id
consumer_secret      | client_secret
base_url             | api_base_url
request_token_params | client_kwargs

For more information, visit https://flask-appbuilder.readthedocs.io/en/latest/security.html#authentication-oauth
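For illustration, a sketch of an authlib-style provider entry in webserver_config.py using the new key names (the provider, URLs and credentials are placeholders; see the FAB documentation linked above for the authoritative structure):

    OAUTH_PROVIDERS = [
        {
            "name": "google",
            "icon": "fa-google",
            "token_key": "access_token",
            "remote_app": {
                "client_id": "<CLIENT_ID>",          # previously consumer_key
                "client_secret": "<CLIENT_SECRET>",  # previously consumer_secret
                "api_base_url": "https://www.googleapis.com/oauth2/v2/",  # previously base_url
                "client_kwargs": {"scope": "email profile"},  # previously request_token_params
                "access_token_url": "https://accounts.google.com/o/oauth2/token",
                "authorize_url": "https://accounts.google.com/o/oauth2/auth",
            },
        }
    ]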

Breaking Change in Pendulum Support

Airflow has upgraded from Pendulum 1.x to Pendulum 2.x. This comes with a few breaking changes as certain methods and their definitions in Pendulum 2.x have changed or have been removed.

For instance, the following snippet will now throw errors:

    execution_date.format("YYYY-MM-DD HH:mm:ss", formatter="alternative")

as the formatter option is not supported in Pendulum 2.x and alternative is used by default.
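The equivalent call under Pendulum 2.x simply drops the keyword argument, since the token-based ("alternative") formatter is now the default:

    execution_date.format("YYYY-MM-DD HH:mm:ss")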

For more information, visit https://pendulum.eustace.io/blog/pendulum-2.0.0-is-out.html

Step 6: Upgrade Configuration settings

Airflow 2.0 is stricter with respect to expectations on configuration data and requires explicit specifications of configuration values in more cases rather than defaulting to a generic value.

Some of these are detailed in the Upgrade Check guide, but a significant area of change is with respect to the Kubernetes Executor. This is called out below for users of the Kubernetes Executor.

Upgrade KubernetesExecutor settings

The KubernetesExecutor Will No Longer Read from the airflow.cfg for Base Pod Configurations.

In Airflow 2.0, the KubernetesExecutor will require a base pod template written in yaml. This file can exist anywhere on the host machine and will be linked using the pod_template_file configuration in the airflow.cfg file. You can create a pod_template_file by running the following command: airflow generate_pod_template

The airflow.cfg will still accept values for the worker_container_repository, the worker_container_tag, and the default namespace.

The following airflow.cfg values will be deprecated:

    worker_container_image_pull_policy
    airflow_configmap
    airflow_local_settings_configmap
    dags_in_image
    dags_volume_subpath
    dags_volume_mount_point
    dags_volume_claim
    logs_volume_subpath
    logs_volume_claim
    dags_volume_host
    logs_volume_host
    env_from_configmap_ref
    env_from_secret_ref
    git_repo
    git_branch
    git_sync_depth
    git_subpath
    git_sync_rev
    git_user
    git_password
    git_sync_root
    git_sync_dest
    git_dags_folder_mount_point
    git_ssh_key_secret_name
    git_ssh_known_hosts_configmap_name
    git_sync_credentials_secret
    git_sync_container_repository
    git_sync_container_tag
    git_sync_init_container_name
    git_sync_run_as_user
    worker_service_account_name
    image_pull_secrets
    gcp_service_account_keys
    affinity
    tolerations
    run_as_user
    fs_group
    [kubernetes_node_selectors]
    [kubernetes_annotations]
    [kubernetes_environment_variables]
    [kubernetes_secrets]
    [kubernetes_labels]

The ``executor_config`` Will Now Expect a ``kubernetes.client.models.V1Pod`` Class When Launching Tasks

In Airflow 1.10.x, users could modify task pods at runtime by passing a dictionary to the executor_config variable. Users now have full access to the Kubernetes API via kubernetes.client.models.V1Pod.

While in the deprecated version a user would mount a volume using the following dictionary:

    second_task = PythonOperator(
        task_id="four_task",
        python_callable=test_volume_mount,
        executor_config={
            "KubernetesExecutor": {
                "volumes": [
                    {
                        "name": "example-kubernetes-test-volume",
                        "hostPath": {"path": "/tmp/"},
                    },
                ],
                "volume_mounts": [
                    {
                        "mountPath": "/foo/",
                        "name": "example-kubernetes-test-volume",
                    },
                ],
            }
        },
    )

In the new model a user can accomplish the same thing using the following code under the pod_override key:

    from kubernetes.client import models as k8s


    @task(
        task_id="four_task",
        executor_config={
            "pod_override": k8s.V1Pod(
                spec=k8s.V1PodSpec(
                    containers=[
                        k8s.V1Container(
                            name="base",
                            volume_mounts=[
                                k8s.V1VolumeMount(
                                    mount_path="/foo/",
                                    name="example-kubernetes-test-volume",
                                )
                            ],
                        )
                    ],
                    volumes=[
                        k8s.V1Volume(
                            name="example-kubernetes-test-volume",
                            host_path=k8s.V1HostPathVolumeSource(path="/tmp/"),
                        )
                    ],
                )
            )
        },
    )
    def test_volume_mount():
        pass


    second_task = test_volume_mount()

For Airflow 2.0, the traditional executor_config will continue to work with a deprecation warning, but it will be removed in a future version.

Step 7: Upgrade to Airflow 2

After running the upgrade checks as described above, installing the backported providers, modifying the DAGs to be compatible, and updating the configuration settings, you should be ready to upgrade to Airflow 2.0.

A final run of the upgrade checks is always a good idea to make sure you have not missed anything. At this stage, the number of problems detected should be zero, or limited to minor issues that you plan to fix after upgrading the Airflow version.

At this point, just follow the standard Airflow version upgrade process:

  • Make sure your Airflow meta database is backed up

  • Pause all the DAGs and make sure there is nothing actively running

    • The reason to pause DAGs is to make sure that nothing is actively being written to the database during the database upgrade which will follow in a later step.

    • To be extra careful, it is best to have a database backup after the DAGs have been paused.

  • Install / upgrade the Airflow version to the 2.0 version of choice

  • Make sure to install the right providers

    • This can be done by using the “extras” option as part of the Airflow installation, or by individually installing the providers.

    • Please note that you may have to uninstall the backport providers before installing the new provider packages if you are installing using pip. This does not apply if you are installing using an Airflow Docker image with a set of specified requirements, where each change results in a fresh set of installed modules.

    • You can read more about providers at Provider packages.

  • Upgrade the Airflow meta database using airflow db upgrade.

    • The above command may be unfamiliar, since it is shown using the Airflow 2.0 CLI syntax.

    • The database upgrade may modify the database schema as needed and also map the existing data to be compliant with the updated database schema.

    Note

    The database upgrade may take a while depending on the number of DAGs in the database and the volume of history stored in the database for task history, XCom data, etc. In our testing, we saw that performing the Airflow database upgrade from Airflow 1.10.15 to Airflow 2.0 took between two and three minutes on an Airflow database on PostgreSQL with around 35,000 task instances and 500 DAGs. For a faster database upgrade and for better overall performance, it is recommended that you periodically archive old historical records which are no longer of value.

  • Restart Airflow Scheduler, Webserver, and Workers

Appendix

Changed Parameters for the KubernetesPodOperator

Port has migrated from a list[Port] to a list[V1ContainerPort]

Before:

    from airflow.kubernetes.pod import Port

    port = Port("http", 80)
    k = KubernetesPodOperator(
        namespace="default",
        image="ubuntu:16.04",
        cmds=["bash", "-cx"],
        arguments=["echo 10"],
        ports=[port],
        task_id="task",
    )

After:

    from kubernetes.client import models as k8s

    port = k8s.V1ContainerPort(name="http", container_port=80)
    k = KubernetesPodOperator(
        namespace="default",
        image="ubuntu:16.04",
        cmds=["bash", "-cx"],
        arguments=["echo 10"],
        ports=[port],
        task_id="task",
    )

Volume_mounts have migrated from a list[VolumeMount] to a list[V1VolumeMount]

Before:

    from airflow.kubernetes.volume_mount import VolumeMount

    volume_mount = VolumeMount("test-volume", mount_path="/root/mount_file", sub_path=None, read_only=True)
    k = KubernetesPodOperator(
        namespace="default",
        image="ubuntu:16.04",
        cmds=["bash", "-cx"],
        arguments=["echo 10"],
        volume_mounts=[volume_mount],
        task_id="task",
    )

After:

    from kubernetes.client import models as k8s

    volume_mount = k8s.V1VolumeMount(
        name="test-volume", mount_path="/root/mount_file", sub_path=None, read_only=True
    )
    k = KubernetesPodOperator(
        namespace="default",
        image="ubuntu:16.04",
        cmds=["bash", "-cx"],
        arguments=["echo 10"],
        volume_mounts=[volume_mount],
        task_id="task",
    )

Volume has migrated from a list[Volume] to a list[V1Volume]

Before:

    from airflow.kubernetes.volume import Volume

    volume_config = {"persistentVolumeClaim": {"claimName": "test-volume"}}
    volume = Volume(name="test-volume", configs=volume_config)
    k = KubernetesPodOperator(
        namespace="default",
        image="ubuntu:16.04",
        cmds=["bash", "-cx"],
        arguments=["echo 10"],
        volumes=[volume],
        task_id="task",
    )

After:

    from kubernetes.client import models as k8s

    volume = k8s.V1Volume(
        name="test-volume",
        persistent_volume_claim=k8s.V1PersistentVolumeClaimVolumeSource(claim_name="test-volume"),
    )
    k = KubernetesPodOperator(
        namespace="default",
        image="ubuntu:16.04",
        cmds=["bash", "-cx"],
        arguments=["echo 10"],
        volumes=[volume],
        task_id="task",
    )

env_vars has migrated from a dict to a list[V1EnvVar]

Before:

    k = KubernetesPodOperator(
        namespace="default",
        image="ubuntu:16.04",
        cmds=["bash", "-cx"],
        arguments=["echo 10"],
        env_vars={"ENV1": "val1", "ENV2": "val2"},
        task_id="task",
    )

After:

    from kubernetes.client import models as k8s

    env_vars = [
        k8s.V1EnvVar(name="ENV1", value="val1"),
        k8s.V1EnvVar(name="ENV2", value="val2"),
    ]
    k = KubernetesPodOperator(
        namespace="default",
        image="ubuntu:16.04",
        cmds=["bash", "-cx"],
        arguments=["echo 10"],
        env_vars=env_vars,
        task_id="task",
    )

PodRuntimeInfoEnv has been removed

PodRuntimeInfoEnv can now be added to the env_vars variable as a V1EnvVarSource

Before:

    from airflow.kubernetes.pod_runtime_info_env import PodRuntimeInfoEnv

    k = KubernetesPodOperator(
        namespace="default",
        image="ubuntu:16.04",
        cmds=["bash", "-cx"],
        arguments=["echo 10"],
        pod_runtime_info_envs=[PodRuntimeInfoEnv("ENV3", "status.podIP")],
        task_id="task",
    )

After:

    from kubernetes.client import models as k8s

    env_vars = [
        k8s.V1EnvVar(
            name="ENV3",
            value_from=k8s.V1EnvVarSource(field_ref=k8s.V1ObjectFieldSelector(field_path="status.podIP")),
        )
    ]
    k = KubernetesPodOperator(
        namespace="default",
        image="ubuntu:16.04",
        cmds=["bash", "-cx"],
        arguments=["echo 10"],
        env_vars=env_vars,
        task_id="task",
    )

configmaps has been removed

Configmaps can now be added to the env_from variable as a V1EnvFromSource

Before:

    k = KubernetesPodOperator(
        namespace="default",
        image="ubuntu:16.04",
        cmds=["bash", "-cx"],
        arguments=["echo 10"],
        configmaps=["test-configmap"],
        task_id="task",
    )

After:

    from kubernetes.client import models as k8s

    configmap = "test-configmap"
    env_from = [k8s.V1EnvFromSource(config_map_ref=k8s.V1ConfigMapEnvSource(name=configmap))]
    k = KubernetesPodOperator(
        namespace="default",
        image="ubuntu:16.04",
        cmds=["bash", "-cx"],
        arguments=["echo 10"],
        env_from=env_from,
        task_id="task",
    )

Resources has migrated from a Dict to a V1ResourceRequirements

Before:

    resources = {
        "limit_cpu": 0.25,
        "limit_memory": "64Mi",
        "limit_ephemeral_storage": "2Gi",
        "request_cpu": "250m",
        "request_memory": "64Mi",
        "request_ephemeral_storage": "1Gi",
    }
    k = KubernetesPodOperator(
        namespace="default",
        image="ubuntu:16.04",
        cmds=["bash", "-cx"],
        arguments=["echo 10"],
        labels={"foo": "bar"},
        name="test",
        task_id="task" + self.get_current_task_name(),
        in_cluster=False,
        do_xcom_push=False,
        resources=resources,
    )

After:

    from kubernetes.client import models as k8s

    resources = k8s.V1ResourceRequirements(
        requests={"memory": "64Mi", "cpu": "250m", "ephemeral-storage": "1Gi"},
        limits={
            "memory": "64Mi",
            "cpu": 0.25,
            "nvidia.com/gpu": None,
            "ephemeral-storage": "2Gi",
        },
    )
    k = KubernetesPodOperator(
        namespace="default",
        image="ubuntu:16.04",
        cmds=["bash", "-cx"],
        arguments=["echo 10"],
        labels={"foo": "bar"},
        name="test-" + str(random.randint(0, 1000000)),
        task_id="task" + self.get_current_task_name(),
        in_cluster=False,
        do_xcom_push=False,
        resources=resources,
    )

image_pull_secrets has migrated from a String to a list[k8s.V1LocalObjectReference]

Before:

    k = KubernetesPodOperator(
        namespace="default",
        image="ubuntu:16.04",
        cmds=["bash", "-cx"],
        arguments=["echo 10"],
        name="test",
        task_id="task",
        image_pull_secrets="fake-secret",
        cluster_context="default",
    )

After:

    from kubernetes.client import models as k8s

    quay_k8s = KubernetesPodOperator(
        namespace="default",
        image="quay.io/apache/bash",
        image_pull_secrets=[k8s.V1LocalObjectReference("testquay")],
        cmds=["bash", "-cx"],
        name="airflow-private-image-pod",
        task_id="task-two",
    )

Migration Guide from Experimental API to Stable API v1

In Airflow 2.0, we added a new REST API. The experimental API still works, but support may be dropped in the future.

However, since the experimental API does not require authentication, it is disabled by default. You need to explicitly enable the experimental API if you want to use it. If your application is still using the experimental API, you should seriously consider migrating to the stable API.

The stable API exposes many endpoints available through the webserver. Here are the differences between the two APIs that will help you migrate from the experimental REST API to the stable REST API.

Base Endpoint

The base endpoint for the stable API v1 is /api/v1/. You must change the experimental base endpoint from /api/experimental/ to /api/v1/. The table below shows the differences:

Purpose | Experimental REST API Endpoint | Stable REST API Endpoint
--------|--------------------------------|--------------------------
Create a DAGRun (POST) | /api/experimental/dags/<DAG_ID>/dag_runs | /api/v1/dags/{dag_id}/dagRuns
List DAGRuns (GET) | /api/experimental/dags/<DAG_ID>/dag_runs | /api/v1/dags/{dag_id}/dagRuns
Check health status (GET) | /api/experimental/test | /api/v1/health
Task information (GET) | /api/experimental/dags/<DAG_ID>/tasks/<TASK_ID> | /api/v1/dags/{dag_id}/tasks/{task_id}
TaskInstance public variables (GET) | /api/experimental/dags/<DAG_ID>/dag_runs/<string:execution_date>/tasks/<TASK_ID> | /api/v1/dags/{dag_id}/dagRuns/{dag_run_id}/taskInstances/{task_id}
Pause a DAG (PATCH) | /api/experimental/dags/<DAG_ID>/paused/<string:paused> | /api/v1/dags/{dag_id}
Information about a paused DAG (GET) | /api/experimental/dags/<DAG_ID>/paused | /api/v1/dags/{dag_id}
Latest DAG runs (GET) | /api/experimental/latest_runs | /api/v1/dags/{dag_id}/dagRuns
Get all pools (GET) | /api/experimental/pools | /api/v1/pools
Create a pool (POST) | /api/experimental/pools | /api/v1/pools
Delete a pool (DELETE) | /api/experimental/pools/<string:name> | /api/v1/pools/{pool_name}
DAG Lineage (GET) | /api/experimental/lineage/<DAG_ID>/<string:execution_date>/ | /api/v1/dags/{dag_id}/dagRuns/{dag_run_id}/taskInstances/{task_id}/xcomEntries

The /api/v1/dags/{dag_id}/dagRuns endpoint also allows you to filter DAG runs with parameters such as start_date, end_date, execution_date, etc. in the query string. Therefore the operation previously performed by this endpoint:

    /api/experimental/dags/<string:dag_id>/dag_runs/<string:execution_date>

can now be handled with filter parameters in the query string. Getting information about the latest runs can likewise be accomplished with filters in the query string of /api/v1/dags/{dag_id}/dagRuns. Please check the Stable API reference documentation for more information.
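For illustration, a minimal sketch of querying the stable API with Python's requests library (host, credentials and dag_id are hypothetical, and basic auth assumes the basic_auth backend is enabled in the api section of your configuration):

    import requests

    # List DAG runs, filtering in the query string rather than encoding the
    # execution date in the path as the experimental API did.
    resp = requests.get(
        "http://localhost:8080/api/v1/dags/example_dag/dagRuns",
        params={"execution_date_gte": "2021-01-01T00:00:00Z"},
        auth=("admin", "admin"),
    )
    resp.raise_for_status()
    for dag_run in resp.json()["dag_runs"]:
        print(dag_run["dag_run_id"], dag_run["state"])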

Changes to Exception handling from DAG callbacks

Exceptions from DAG callbacks used to crash the Airflow Scheduler. As part of our efforts to make the Scheduler more performant and reliable, we have changed this behavior to log the exception instead. On top of that, a new dag.callback_exceptions counter metric has been added to help better monitor callback exceptions.

Migrating to TaskFlow API

Airflow 2.0 introduced the TaskFlow API to simplify the declaration of Python callable tasks. Users are encouraged to replace classic operators with their TaskFlow decorator alternatives. For details, see Working with TaskFlow.

Classic Operator         | TaskFlow Decorator
-------------------------|-------------------------------
PythonOperator           | @task (short for @task.python)
PythonVirtualenvOperator | @task.virtualenv
BranchPythonOperator     | @task.branch
DockerOperator           | @task.docker
KubernetesPodOperator    | @task.kubernetes
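For example, a classic PythonOperator task can be rewritten with the @task decorator roughly as follows (a minimal sketch; DAG and task names are hypothetical):

    from datetime import datetime

    from airflow.decorators import dag, task


    @dag(start_date=datetime(2021, 1, 1), schedule_interval=None, catchup=False)
    def taskflow_example():
        # Before: extract = PythonOperator(task_id="extract", python_callable=extract_fn)
        @task
        def extract():
            return {"a": 1}

        # Return values are passed between tasks via XCom automatically.
        @task
        def load(data: dict):
            print(data)

        load(extract())


    taskflow_example_dag = taskflow_example()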

Airflow CLI changes in 2.0

The Airflow CLI has been organized so that related commands are grouped together as subcommands, which means that if you use these commands in your scripts, you have to make changes to them.

This section describes the changes that have been made, and what you need to do to update your scripts. The ability to manipulate users from the command line has been changed: airflow create_user, airflow delete_user and airflow list_users have been grouped into a single command, airflow users, with the subcommands create, list and delete. The airflow list_dags command is now airflow dags list, airflow pause is airflow dags pause, and so on.

In Airflow 1.10 and 2.0 there is an airflow config command, but there is a difference in behavior. In Airflow 1.10 it prints all config options, while in Airflow 2.0 it is a command group; airflow config is now airflow config list. You can check the other options by running airflow config --help.

For a complete list of updated CLI commands, see https://airflow.apache.org/cli.html.

You can learn about the commands by running airflow --help. For example to get help about the celery group command, you have to run the help command: airflow celery --help.

Old command | New command | Group
------------|-------------|------
airflow worker | airflow celery worker | celery
airflow flower | airflow celery flower | celery
airflow trigger_dag | airflow dags trigger | dags
airflow delete_dag | airflow dags delete | dags
airflow show_dag | airflow dags show | dags
airflow list_dags | airflow dags list | dags
airflow dag_state | airflow dags state | dags
airflow backfill | airflow dags backfill | dags
airflow list_dag_runs | airflow dags list-runs | dags
airflow pause | airflow dags pause | dags
airflow unpause | airflow dags unpause | dags
airflow next_execution | airflow dags next-execution | dags
airflow test | airflow tasks test | tasks
airflow clear | airflow tasks clear | tasks
airflow list_tasks | airflow tasks list | tasks
airflow task_failed_deps | airflow tasks failed-deps | tasks
airflow task_state | airflow tasks state | tasks
airflow run | airflow tasks run | tasks
airflow render | airflow tasks render | tasks
airflow initdb | airflow db init | db
airflow resetdb | airflow db reset | db
airflow upgradedb | airflow db upgrade | db
airflow checkdb | airflow db check | db
airflow shell | airflow db shell | db
airflow pool | airflow pools | pools
airflow create_user | airflow users create | users
airflow delete_user | airflow users delete | users
airflow list_users | airflow users list | users
airflow rotate_fernet_key | airflow rotate-fernet-key |
airflow sync_perm | airflow sync-perm |

Example Usage for the ``users`` group

To create a new user:

    airflow users create --username jondoe --lastname doe --firstname jon --email jdoe@apache.org --role Viewer --password test

To list users:

    airflow users list

To delete a user:

    airflow users delete --username jondoe

To add a user to a role:

    airflow users add-role --username jondoe --role Public

To remove a user from a role:

    airflow users remove-role --username jondoe --role Public

Use exactly single character for short option style change in CLI

For Airflow short options, use exactly one single character. The new commands are shown in the following table:

Old command | New command
------------|------------
airflow (dags|tasks|scheduler) [-sd, --subdir] | airflow (dags|tasks|scheduler) [-S, --subdir]
airflow test [-dr, --dry_run] | airflow tasks test [-n, --dry-run]
airflow test [-tp, --task_params] | airflow tasks test [-t, --task-params]
airflow test [-pm, --post_mortem] | airflow tasks test [-m, --post-mortem]
airflow run [-int, --interactive] | airflow tasks run [-N, --interactive]
airflow backfill [-dr, --dry_run] | airflow dags backfill [-n, --dry-run]
airflow clear [-dx, --dag_regex] | airflow tasks clear [-R, --dag-regex]
airflow kerberos [-kt, --keytab] | airflow kerberos [-k, --keytab]
airflow webserver [-hn, --hostname] | airflow webserver [-H, --hostname]
airflow worker [-cn, --celery_hostname] | airflow celery worker [-H, --celery-hostname]
airflow flower [-hn, --hostname] | airflow celery flower [-H, --hostname]
airflow flower [-fc, --flower_conf] | airflow celery flower [-c, --flower-conf]
airflow flower [-ba, --basic_auth] | airflow celery flower [-A, --basic-auth]

For Airflow long options, use kebab-case (https://en.wikipedia.org/wiki/Letter_case) instead of snake_case (https://en.wikipedia.org/wiki/Snake_case):

Old option | New option
-----------|-----------
--task_regex | --task-regex
--start_date | --start-date
--end_date | --end-date
--dry_run | --dry-run
--no_backfill | --no-backfill
--mark_success | --mark-success
--donot_pickle | --donot-pickle
--ignore_dependencies | --ignore-dependencies
--ignore_first_depends_on_past | --ignore-first-depends-on-past
--delay_on_limit | --delay-on-limit
--reset_dagruns | --reset-dagruns
--rerun_failed_tasks | --rerun-failed-tasks
--run_backwards | --run-backwards
--only_failed | --only-failed
--only_running | --only-running
--exclude_subdags | --exclude-subdags
--exclude_parentdag | --exclude-parentdag
--dag_regex | --dag-regex
--run_id | --run-id
--exec_date | --exec-date
--ignore_all_dependencies | --ignore-all-dependencies
--ignore_depends_on_past | --ignore-depends-on-past
--ship_dag | --ship-dag
--job_id | --job-id
--cfg_path | --cfg-path
--ssl_cert | --ssl-cert
--ssl_key | --ssl-key
--worker_timeout | --worker-timeout
--access_logfile | --access-logfile
--error_logfile | --error-logfile
--dag_id | --dag-id
--num_runs | --num-runs
--do_pickle | --do-pickle
--celery_hostname | --celery-hostname
--broker_api | --broker-api
--flower_conf | --flower-conf
--url_prefix | --url-prefix
--basic_auth | --basic-auth
--task_params | --task-params
--post_mortem | --post-mortem
--conn_uri | --conn-uri
--conn_type | --conn-type
--conn_host | --conn-host
--conn_login | --conn-login
--conn_password | --conn-password
--conn_schema | --conn-schema
--conn_port | --conn-port
--conn_extra | --conn-extra
--use_random_password | --use-random-password
--skip_serve_logs | --skip-serve-logs

Remove serve_logs command from CLI

The serve_logs command has been deleted. This command should be run only by internal application mechanisms and there is no need for it to be accessible from the CLI interface.

dag_state CLI command

If the DagRun was triggered with conf key/values passed in, they will also be printed in the dag_state CLI response, i.e. running, {"name": "bob"}, whereas in prior releases it just printed the state, i.e. running.

Deprecating ignore_first_depends_on_past on backfill command and default it to True

When backfilling DAGs that use depends_on_past, users previously needed to pass --ignore-first-depends-on-past. This flag is now deprecated and defaults to true to avoid confusion.

Changes to Airflow Plugins

If you are using Airflow Plugins and were passing admin_views and menu_links, which were used by the non-RBAC UI (the flask-admin based UI), update them to use appbuilder_views and appbuilder_menu_items, as shown below.

Old:

    from airflow.plugins_manager import AirflowPlugin
    from flask_admin import BaseView, expose
    from flask_admin.base import MenuLink


    class TestView(BaseView):
        @expose("/")
        def test(self):
            # in this example, put your test_plugin/test.html template at airflow/plugins/templates/test_plugin/test.html
            return self.render("test_plugin/test.html", content="Hello galaxy!")


    v = TestView(category="Test Plugin", name="Test View")

    ml = MenuLink(category="Test Plugin", name="Test Menu Link", url="https://airflow.apache.org/")


    class AirflowTestPlugin(AirflowPlugin):
        admin_views = [v]
        menu_links = [ml]

Change it to:

    from airflow.plugins_manager import AirflowPlugin
    from flask_appbuilder import expose, BaseView as AppBuilderBaseView


    class TestAppBuilderBaseView(AppBuilderBaseView):
        default_view = "test"

        @expose("/")
        def test(self):
            return self.render_template("test_plugin/test.html", content="Hello galaxy!")


    v_appbuilder_view = TestAppBuilderBaseView()
    v_appbuilder_package = {
        "name": "Test View",
        "category": "Test Plugin",
        "view": v_appbuilder_view,
    }

    # Creating a flask appbuilder Menu Item
    appbuilder_mitem = {
        "name": "Google",
        "category": "Search",
        "category_icon": "fa-th",
        "href": "https://www.google.com",
    }


    # Defining the plugin class
    class AirflowTestPlugin(AirflowPlugin):
        name = "test_plugin"
        appbuilder_views = [v_appbuilder_package]
        appbuilder_menu_items = [appbuilder_mitem]

Changes to extras names

The all extra was reduced to include only user-facing dependencies. This means that this extra does not contain development dependencies. If you were using it and depending on the development packages, you should use devel_all instead.

Support for Airflow 1.10.x releases

Airflow 1.10.x reached end of life on 17 June 2021. No new Airflow 1.x versions will be released.

Support of Backport providers ended on 17 March 2021. No new versions of backport providers will be released.

We plan to take a strict Semantic Versioning approach to our versioning and release process. This means that we do not plan to make any backwards-incompatible changes in the 2.* releases. Any breaking changes, including the removal of features deprecated in Airflow 2.0 will happen as part of the Airflow 3.0 release.