Installing Apache Airflow on Linux

Installing dependencies

Open a terminal and enter the following commands:

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install -y python3-pip python3-dev
sudo apt-get install -y libmysqlclient-dev
sudo apt-get install -y libssl-dev
sudo apt-get install -y libffi-dev
sudo apt-get install -y libblas-dev liblapack-dev libatlas-base-dev

Virtual environment

Let's create a virtual environment using Python 3. A virtual environment creates an isolated environment for installing and running applications, which avoids conflicts between dependencies of different projects.

sudo pip3 install virtualenv
virtualenv venv
source venv/bin/activate

Your prompt should now look like this:

(venv) username@ubuntu:~$ 
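If you would rather not install the third-party virtualenv package, Python's built-in venv module (available since Python 3.3) gives an equivalent result. A sketch, assuming the python3-venv package is present on your system:

```shell
# Create and activate a virtual environment with the standard-library venv module
python3 -m venv venv
source venv/bin/activate
```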

Installing Apache Airflow

pip install apache-airflow
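An unpinned `pip install` can resolve to dependency versions Airflow was not tested against. The Airflow project publishes per-release constraint files for reproducible installs; a sketch for version 2.8.0 (adjust AIRFLOW_VERSION to the release you want):

```shell
AIRFLOW_VERSION=2.8.0
# Constraint files are published per Python minor version (e.g. 3.8, 3.10)
PYTHON_VERSION="$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
```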

Verify the Airflow installation by typing this into the terminal:

airflow version

Your version should be printed. In my case: 2.8.0

Initializing the Airflow database

airflow db init

(In Airflow 2.7+, airflow db migrate is the preferred command; db init still works but is deprecated.)

Creating a user (fill in the fields with your own details)

airflow users create \
    --username admin \
    --firstname YourFirstName \
    --lastname YourLastName \
    --role Admin \
    --email [email protected]

You will be prompted for a password in the terminal (only when creating the user for the first time).

Starting Apache Airflow on Ubuntu Linux

To run Apache Airflow, perform steps 1 and 2 in separate terminal windows:

  1. To start the Airflow web server, paste this into the first window and hit Enter:
source venv/bin/activate
airflow webserver -p 8080
  2. To start the Airflow scheduler, paste this into the second window and hit Enter:
source venv/bin/activate
airflow scheduler

Now you can open the Airflow web frontend in your browser:

localhost:8080

To stop the Airflow webserver:

Find the process ID (assuming 8080 is the port):

lsof -i tcp:8080

Kill it:

kill <pid>

Or press Ctrl+C in the webserver's terminal window to interrupt it.

Adding your own DAGs

Create a directory for your DAGs (keep working inside the virtual environment):

mkdir ~/airflow/dags

Create a new file named "test_dag.py" and paste your Python code into it:

nano ~/airflow/dags/test_dag.py

Example DAG:

from datetime import datetime, timedelta
from airflow import DAG
# DummyOperator is deprecated in Airflow 2.x; EmptyOperator is its replacement
from airflow.operators.empty import EmptyOperator

# Default arguments applied to every task in the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2023, 1, 1),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

# Creating the DAG
dag = DAG(
    'test_dag',
    default_args=default_args,
    description='A simple test DAG',
    schedule=timedelta(days=1),  # 'schedule' replaces the deprecated 'schedule_interval'
)

# Defining operators
start_task = EmptyOperator(task_id='start_task', dag=dag)
end_task = EmptyOperator(task_id='end_task', dag=dag)

# Defining the execution order of the operators
start_task >> end_task

Save the file and close the editor (in nano: Ctrl+O, Enter to save, then Ctrl+X to exit).
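The line start_task >> end_task works because Airflow operators overload Python's >> operator to record task dependencies. A minimal stdlib-only sketch of the idea (the Task class here is a hypothetical stand-in for illustration, not Airflow's actual API):

```python
class Task:
    """Hypothetical stand-in for an Airflow operator, showing how
    `>>` can be overloaded to record downstream dependencies."""
    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # `a >> b` means "run b after a"
        self.downstream.append(other)
        return other  # returning `other` allows chaining: a >> b >> c

start = Task("start_task")
end = Task("end_task")
start >> end

print([t.task_id for t in start.downstream])  # prints ['end_task']
```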

Now you can run the added DAG.

Testing a single task (in Airflow 2.x the old airflow test command became airflow tasks test):

airflow tasks test test_dag start_task <execution_date>

Triggering a DAG run (trigger_dag became dags trigger):

airflow dags trigger test_dag

Running a specific task in a DAG (run became tasks run):

airflow tasks run test_dag start_task <execution_date>

Or trigger it manually via the web interface. After launch, if no errors occur, test_dag will show a success status in the list of previous DAG runs.
