The schedule can be set to None, which is what you use when the DAG is only triggered manually or by an external trigger. A preset provides an easier, human-readable schedule. But what is most commonly used for a more customized schedule is a CRON expression.

"Timetables" are a new concept introduced in Airflow 2.2. They provide more advanced scheduling possibilities, for example, if you want to "skip" certain runs. Since this is a newer feature, however, it is not going to be part of this article.

Once we have our schedule ready, we have an expectation of when our Airflow DAG will run. But sometimes we find that the actual runs do not align with that expectation. Why? In this article, we will mainly discuss the case where you set your schedule with a CRON expression (some of the cases also apply to a preset).

Data interval, logical date, run time

Several concepts are important if you wish to understand when your DAG will be scheduled.

start date: the start date defines the point from which your DAG will be scheduled for the first time. Note that a DAG run will only be scheduled one interval after start_date.

data interval: the data interval describes the time range between two DAG runs. For example, if you define the schedule as 30 22 * * * (22:30 daily), then your data interval will run from 22:30 to the same time on the next day. There are two values associated with the concept:

– logical_date / data_interval_start: this was called the execution date before Airflow 2.2, but as the value indicates the start of the data interval, not the actual execution time, the variable name was updated.
– data_interval_end: the end of the data interval.

DAG run time: this is the actual DAG execution time.

For example, for a DAG with the schedule 30 22 * * *, you can clearly see these values in the screenshot. And according to Airflow's official documentation:

A DAG run is usually scheduled after its associated data interval has ended, to ensure the run is able to collect all the data within the time period.

What about start_date? According to the documentation:

Similarly, since the start_date argument for the DAG and its tasks points to the same logical date, it marks the start of the DAG's first data interval, not when tasks in the DAG will start running. In other words, a DAG run will only be scheduled one interval after start_date.

Basically, if the start date is earlier than the data_interval_start, the DAG will be scheduled. The simplest approach is to set a fixed date. However, if you set the start_date too early and have the backfill (catchup) flag enabled, then the DAG will catch up and initiate multiple runs from that datetime. This will also happen if you recreate your Airflow cluster or happen to delete all your run history, which can be annoying.

And if you set it too late, you might see the error "the task start time is later than execution time", so the DAG is not started.

If I want to set a start date that makes the DAG run exactly once as soon as the DAG is submitted, regardless of the schedule, what time should it be? This is easy to answer once we understand the data interval logic.
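
To make the data interval logic concrete, here is a minimal plain-Python sketch of a daily 22:30 schedule (the 30 22 * * * example). It is not Airflow's scheduler code: the helper names first_data_interval and scheduled_intervals are made up for illustration, the datetimes are naive (no timezone), and it assumes that the first data interval starts at the first 22:30 tick at or after start_date and that a run is created once its interval has ended.

```python
from datetime import datetime, timedelta


def first_data_interval(start_date, hour=22, minute=30):
    """Return (data_interval_start, data_interval_end) for a daily schedule.

    Hypothetical helper: the first interval starts at the first
    hour:minute tick that is not earlier than start_date.
    """
    tick = start_date.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if tick < start_date:
        tick += timedelta(days=1)
    return tick, tick + timedelta(days=1)


def scheduled_intervals(start_date, now, hour=22, minute=30):
    """List the data intervals that have fully ended by `now`.

    Models the case where catching up on past intervals is enabled:
    every interval that has already ended gets its own run.
    """
    intervals = []
    start, end = first_data_interval(start_date, hour, minute)
    while end <= now:
        intervals.append((start, end))
        start, end = end, end + timedelta(days=1)
    return intervals


start_date = datetime(2023, 1, 1)
first_start, first_end = first_data_interval(start_date)
print("logical_date / data_interval_start:", first_start)
print("data_interval_end (earliest run time):", first_end)

# A start_date a few days in the past produces a backlog of runs.
backlog = scheduled_intervals(start_date, now=datetime(2023, 1, 5, 12, 0))
print("runs created when catching up:", len(backlog))
```

With a start_date of 1 January, the first run cannot happen before 22:30 on 2 January, one full interval later. Checked at noon on 5 January, three intervals have already ended, so three runs would be created at once in this model, while a start_date of 5 January would produce none yet.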