This repository contains the code for building a robust data pipeline using Apache Airflow. The pipeline focuses on extracting, transforming, loading (ETL), and analyzing data related to UK earnings and hours.
Prerequisite: Docker and Docker Compose installed on your machine.

Follow these steps to set up the Airflow environment and run the data pipeline:
- Clone this repository:

  ```bash
  git clone https://github.com/kishorechk/airflow-tutorial.git
  ```
- Navigate to the project directory:

  ```bash
  cd airflow-tutorial
  ```
- Build and start the Docker containers:

  ```bash
  docker-compose up --build
  ```
- Access the Airflow web UI at http://localhost:8080 (default credentials: admin/admin).
The Directed Acyclic Graph (DAG) for the data pipeline covers four main stages (extract, transform, load, and analyze), implemented as the following tasks; a sketch of the DAG wiring appears after the list:
- Cleanup Data Directory: BashOperator to clean up the 'data' directory.
- Download File: BashOperator to download the data file from the provided URL.
- Unzip File: BashOperator to unzip the downloaded file.
- Read XLS All Employees: PythonOperator to read the Excel file containing data for all employees.
- Save Employees: PythonOperator to save the transformed data into the PostgreSQL database.
- Generate Salary Bar Chart: PythonOperator to generate a bar chart for the top 20 job types, depicting the number of jobs and mean salary.
- Generate Annual Percentage Change Line Chart: PythonOperator to generate a line chart for the annual percentage change and mean salary of the top 20 job types.
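For orientation, here is a minimal sketch of how such a DAG might be wired together. The DAG id, task ids, file paths, download URL, and callables are assumptions for illustration, not the repository's exact code:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

DATA_DIR = "/opt/airflow/data"             # assumed location of the 'data' directory
DATA_URL = "https://example.com/data.zip"  # placeholder; the real DAG defines the URL

# Stub callables; the repository's DAG module contains the real implementations.
def read_xls_all_employees():
    """Read the Excel sheet covering all employees (sketch)."""
    ...

def save_employees():
    """Write the transformed rows into PostgreSQL (sketch)."""
    ...

def generate_salary_bar_chart():
    """Plot job counts and mean salary for the top 20 job types (sketch)."""
    ...

def generate_annual_pct_change_line_chart():
    """Plot annual percentage change for the top 20 job types (sketch)."""
    ...

with DAG(
    dag_id="uk_earnings_pipeline",  # assumed DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    cleanup = BashOperator(
        task_id="cleanup_data_dir",
        bash_command=f"rm -rf {DATA_DIR}/*",
    )
    download = BashOperator(
        task_id="download_file",
        bash_command=f"curl -L -o {DATA_DIR}/data.zip {DATA_URL}",
    )
    unzip = BashOperator(
        task_id="unzip_file",
        bash_command=f"unzip -o {DATA_DIR}/data.zip -d {DATA_DIR}",
    )
    read_xls = PythonOperator(task_id="read_xls_all_employees", python_callable=read_xls_all_employees)
    save = PythonOperator(task_id="save_employees", python_callable=save_employees)
    bar_chart = PythonOperator(task_id="generate_salary_bar_chart", python_callable=generate_salary_bar_chart)
    line_chart = PythonOperator(task_id="generate_annual_pct_change_line_chart", python_callable=generate_annual_pct_change_line_chart)

    # Linear ETL chain, then the two chart tasks fan out in parallel.
    cleanup >> download >> unzip >> read_xls >> save >> [bar_chart, line_chart]
```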
The transformation and analysis steps rely on the following libraries:

- Pandas: used for data manipulation and cleaning.
- Matplotlib, Seaborn: used for data visualization.
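To show how these libraries fit together, here is a minimal, hedged sketch of what the salary bar chart task could look like. The connection string, table name, and column names (`job_type`, `num_jobs`, `mean_salary`) are assumptions for illustration:

```python
import matplotlib

matplotlib.use("Agg")  # render to a file; no display is available inside the container
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sqlalchemy import create_engine

# Assumed connection string and schema; the real values live in the DAG's configuration.
engine = create_engine("postgresql+psycopg2://airflow:airflow@postgres/airflow")

def generate_salary_bar_chart(output_path="/opt/airflow/data/salary_bar_chart.png"):
    # Pull the saved employee data back out of PostgreSQL.
    df = pd.read_sql("SELECT job_type, num_jobs, mean_salary FROM employees", engine)
    top20 = df.nlargest(20, "num_jobs")  # top 20 job types by number of jobs

    fig, ax1 = plt.subplots(figsize=(12, 6))
    sns.barplot(data=top20, x="job_type", y="num_jobs", ax=ax1, color="steelblue")
    ax1.set_ylabel("Number of jobs")
    ax1.tick_params(axis="x", rotation=90)

    # Second y-axis so mean salary can be overlaid on the same bars.
    ax2 = ax1.twinx()
    ax2.plot(range(len(top20)), top20["mean_salary"], color="darkorange", marker="o")
    ax2.set_ylabel("Mean salary")

    fig.tight_layout()
    fig.savefig(output_path)
    plt.close(fig)
```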