Operators

An Operator is conceptually a template for a predefined Task that you can define declaratively inside your DAG:

  with DAG("my-dag") as dag:
      ping = SimpleHttpOperator(endpoint="http://example.com/update/")
      email = EmailOperator(to="admin@example.com", subject="Update complete")
      ping >> email

Airflow has a very extensive set of operators available, with some built into the core or pre-installed providers. Some popular operators from core include:

  • BashOperator - executes a bash command

  • PythonOperator - calls an arbitrary Python function

  • EmailOperator - sends an email

  • Use the @task decorator to execute an arbitrary Python function. It doesn’t support rendering Jinja templates passed as arguments.

Note

The @task decorator is recommended over the classic PythonOperator for executing Python callables that need no template rendering in their arguments.

For a list of all core operators, see: Core Operators and Hooks Reference.

If the operator you need isn’t installed with Airflow by default, you can probably find it as part of our huge set of community provider packages. But there are many, many more - you can see the full list of all community-managed operators, hooks, sensors and transfers in our providers packages documentation.

Note

Inside Airflow’s code, we often mix the concepts of Tasks and Operators, and they are mostly interchangeable. However, when we talk about a Task, we mean the generic “unit of execution” of a DAG; when we talk about an Operator, we mean a reusable, pre-made Task template whose logic is all done for you and that just needs some arguments.

Jinja Templating

Airflow leverages the power of Jinja Templating, which can be a powerful tool to use in combination with macros.

For example, say you want to pass the start of the data interval as an environment variable to a Bash script using the BashOperator:

  # The start of the data interval as YYYY-MM-DD
  date = "{{ ds }}"
  t = BashOperator(
      task_id="test_env",
      bash_command="/tmp/test.sh ",
      dag=dag,
      env={"DATA_INTERVAL_START": date},
  )

Here, {{ ds }} is a templated variable, and because the env parameter of the BashOperator is templated with Jinja, the data interval’s start date will be available as an environment variable named DATA_INTERVAL_START in your Bash script.
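Under the hood, this amounts to rendering each templated field through Jinja against the task’s runtime context. A minimal sketch using jinja2 directly (the hard-coded context dict with a fixed "ds" value is a stand-in for the real context Airflow injects at runtime):

```python
from jinja2 import Environment

# A templated field as it appears in the DAG file
env_field = {"DATA_INTERVAL_START": "{{ ds }}"}

# Stand-in for the context Airflow builds for each task run
context = {"ds": "2024-01-01"}

jinja_env = Environment()
rendered = {
    key: jinja_env.from_string(value).render(context)
    for key, value in env_field.items()
}
print(rendered)  # {'DATA_INTERVAL_START': '2024-01-01'}
```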

You can use Jinja templating with every parameter that is marked as “templated” in the documentation. Template substitution occurs just before the pre_execute function of your operator is called.

You can also use Jinja templating with nested fields, as long as these nested fields are marked as templated in the structure they belong to: fields registered in the template_fields property will be submitted to template substitution, like the path field in the example below:

  class MyDataReader:
      template_fields: Sequence[str] = ("path",)

      def __init__(self, my_path):
          self.path = my_path

      # [additional code here...]


  t = PythonOperator(
      task_id="transform_data",
      python_callable=transform_data,
      op_args=[MyDataReader("/tmp/{{ ds }}/my_file")],
      dag=dag,
  )

Note

The template_fields property is a class variable and guaranteed to be of a Sequence[str] type (i.e. a list or tuple of strings).

Deep nested fields can also be substituted, as long as all intermediate fields are marked as template fields:

  class MyDataTransformer:
      template_fields: Sequence[str] = ("reader",)

      def __init__(self, my_reader):
          self.reader = my_reader

      # [additional code here...]


  class MyDataReader:
      template_fields: Sequence[str] = ("path",)

      def __init__(self, my_path):
          self.path = my_path

      # [additional code here...]


  t = PythonOperator(
      task_id="transform_data",
      python_callable=transform_data,
      op_args=[MyDataTransformer(MyDataReader("/tmp/{{ ds }}/my_file"))],
      dag=dag,
  )

You can pass custom options to the Jinja Environment when creating your DAG. One common usage is to keep Jinja from dropping a trailing newline from a template string:

  my_dag = DAG(
      dag_id="my-dag",
      jinja_environment_kwargs={
          "keep_trailing_newline": True,
          # some other jinja2 Environment options here
      },
  )

See the Jinja documentation to find all available options.
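The effect of this particular option can be seen with jinja2 directly, the same library these kwargs are passed through to. By default Jinja strips a single trailing newline from the template source; keep_trailing_newline=True preserves it:

```python
from jinja2 import Environment

default_env = Environment()                        # strips the trailing newline
keeping_env = Environment(keep_trailing_newline=True)

template = "echo {{ ds }}\n"
print(repr(default_env.from_string(template).render(ds="2024-01-01")))
print(repr(keeping_env.from_string(template).render(ds="2024-01-01")))
```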

Rendering Fields as Native Python Objects

By default, all the template_fields are rendered as strings.

For example, say an extract task pushes a dictionary (such as {"1001": 301.27, "1002": 433.21, "1003": 502.22}) to the XCom table. When the following task runs, its order_data argument is passed a string, e.g. '{"1001": 301.27, "1002": 433.21, "1003": 502.22}'.

  transform = PythonOperator(
      task_id="transform",
      op_kwargs={"order_data": "{{ti.xcom_pull('extract')}}"},
      python_callable=transform,
  )

If you instead want the rendered template field to return a native Python object (a dict in our example), you can pass render_template_as_native_obj=True to the DAG as follows:

  dag = DAG(
      dag_id="example_template_as_python_object",
      schedule=None,
      start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
      catchup=False,
      render_template_as_native_obj=True,
  )


  @task(task_id="extract")
  def extract():
      data_string = '{"1001": 301.27, "1002": 433.21, "1003": 502.22}'
      return json.loads(data_string)


  def transform(order_data):
      print(type(order_data))
      total_order_value = 0
      for value in order_data.values():
          total_order_value += value
      return {"total_order_value": total_order_value}


  extract_task = extract()

  transform_task = PythonOperator(
      task_id="transform",
      op_kwargs={"order_data": "{{ti.xcom_pull('extract')}}"},
      python_callable=transform,
  )

  extract_task >> transform_task

In this case, the order_data argument is passed a dict: {"1001": 301.27, "1002": 433.21, "1003": 502.22}.

Airflow uses Jinja’s NativeEnvironment when render_template_as_native_obj is set to True. With NativeEnvironment, rendering a template produces a native Python type.
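The difference between the two environments can be illustrated with jinja2 directly. Rendering the same expression through a standard Environment yields a string, while NativeEnvironment returns the underlying Python object:

```python
from jinja2 import Environment
from jinja2.nativetypes import NativeEnvironment

order_data = {"1001": 301.27, "1002": 433.21, "1003": 502.22}

# Standard environment: the result is always a string
as_string = Environment().from_string("{{ data }}").render(data=order_data)

# Native environment: the result keeps its native Python type
as_native = NativeEnvironment().from_string("{{ data }}").render(data=order_data)

print(type(as_string))  # <class 'str'>
print(type(as_native))  # <class 'dict'>
```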

Reserved params keyword

Since Apache Airflow 2.2.0, the params variable is used during DAG serialization, so please do not use that name in third-party operators. If you upgrade your environment and get the following error:

  AttributeError: 'str' object has no attribute '__module__'

rename the params attribute in your operators.
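The required rename can be sketched without any Airflow imports; the class and attribute names here are hypothetical and the classes are plain Python stand-ins for custom operators:

```python
class MyOperatorBefore:
    """Hypothetical operator: `params` clashes with Airflow's reserved name."""

    def __init__(self, params):
        self.params = params  # conflicts with DAG serialization in Airflow >= 2.2


class MyOperatorAfter:
    """Same operator with the attribute renamed, safe in Airflow >= 2.2."""

    def __init__(self, request_params):
        self.request_params = request_params


op = MyOperatorAfter(request_params={"limit": 10})
print(op.request_params)  # {'limit': 10}
```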