Friday, May 2, 2025

Apache Airflow - Orchestrator

Introduction
The explanation is as follows. Other alternatives are Dagster and Mage.
Airflow was built at Airbnb in 2014 as "Cron on Steroids."

It treats the world as a list of verbs:

- extract_data()
- transform_data()
- load_data()
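
The verb list above can be sketched in plain Python as a small dependency graph. This is not Airflow's API: the `tasks` and `upstream` names are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for the ordering logic a scheduler would apply.

```python
from graphlib import TopologicalSorter

# Hypothetical task bodies: each one is just a verb with no declared output.
def extract_data():
    print("extracting")

def transform_data():
    print("transforming")

def load_data():
    print("loading")

tasks = {
    "extract_data": extract_data,
    "transform_data": transform_data,
    "load_data": load_data,
}

# Map each task name to the set of tasks that must finish before it.
upstream = {
    "extract_data": set(),
    "transform_data": {"extract_data"},
    "load_data": {"transform_data"},
}

# static_order() yields task names so that upstream tasks come first.
run_order = list(TopologicalSorter(upstream).static_order())

for name in run_order:
    tasks[name]()
```

Note that the graph records only names and ordering. Nothing in it describes what any task reads or writes.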

Required Components
The explanation is as follows.
Airflow is heavy.

To run a simple "Hello World" pipeline on a typical distributed (Celery-based) deployment, you need:

1. A Webserver (Flask).
2. A Scheduler (The Heartbeat).
3. A Metastore (Postgres).
4. A Queue (Redis).
5. A Worker (Celery/K8s).

The Task-Based Trap
The explanation is as follows.
Airflow doesn't know what extract_data produces. It only knows that the task finished with exit code 0.
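
A minimal sketch of that exit-code view, using only the standard library rather than Airflow itself. The `extract_script` one-liner is a hypothetical stand-in for an extract step whose source returned nothing: the child process exits with code 0 even though it extracted zero rows.

```python
import subprocess
import sys

# Hypothetical "extract" step: the source returned nothing, but nothing crashed.
extract_script = "rows = []; print(len(rows))"

result = subprocess.run(
    [sys.executable, "-c", extract_script],
    capture_output=True,
    text=True,
)

# The only signal a task-first runner acts on is the exit code.
task_succeeded = result.returncode == 0

# The row count is visible only if someone reads and checks the output.
rows_extracted = int(result.stdout.strip())

print(f"task_succeeded={task_succeeded}, rows_extracted={rows_extracted}")
```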

- To be fair, Airflow has evolved. Sensors, SLAs, Datasets, and deferrable operators exist.
- But they are add-ons to a fundamentally task-first model, not first-class data abstractions.
- The scheduler still reasons about task completion, not data correctness.
- This creates the "Silent Failure" problem.

The Scenario:

- Task A (Extract): Runs successfully but pulls 0 rows because the API changed.
- Task B (Transform): Runs successfully on 0 rows.
- Task C (Load): Overwrites your production table with… nothing.
- Airflow: "All Green! Good job!"

You don't find out until the CEO calls you.
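
One common mitigation is an explicit data check between steps. The sketch below replays the scenario in plain Python, with no Airflow imports. The `require_rows` guard and `EmptyExtractError` are hypothetical names introduced for illustration, and the in-memory list stands in for the production table.

```python
class EmptyExtractError(RuntimeError):
    """Raised when an upstream step yields too few rows."""

def extract_data():
    # Simulates the scenario: the API changed, so the call returns nothing.
    return []

def require_rows(rows, minimum=1):
    # Hypothetical guard: fail loudly instead of passing empty data downstream.
    if len(rows) < minimum:
        raise EmptyExtractError(
            f"expected at least {minimum} rows, got {len(rows)}"
        )
    return rows

def transform_data(rows):
    return [{**row, "processed": True} for row in rows]

def load_data(table, rows):
    # Full overwrite, as in the scenario.
    table.clear()
    table.extend(rows)

production_table = [{"id": 1}, {"id": 2}]
pipeline_failed = False

try:
    rows = require_rows(extract_data())
    load_data(production_table, transform_data(rows))
except EmptyExtractError as exc:
    # The load step never runs, so the existing rows survive.
    pipeline_failed = True
    print(f"pipeline stopped: {exc}")
```

Inside an Airflow task the same idea applies: an uncaught exception marks the task as failed, and under the default trigger rule the downstream tasks do not run.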