This repository provides resources for practicing data engineering concepts using PostgreSQL. It includes Docker commands to set up and interact with a PostgreSQL database, SQL scripts to create tables, insert sample data, and perform various SQL operations, as well as instructions for getting started.
To set up PostgreSQL using Docker, follow these steps:
-
Pull the PostgreSQL Docker image:
docker pull postgres
-
Run a PostgreSQL container:
docker run --name data-engineering-postgres -e POSTGRES_PASSWORD=secret -d postgres
-
Create a new database:
docker exec -u postgres data-engineering-postgres createdb postgres-db
-
Connect to the PostgreSQL container:
docker exec -it data-engineering-postgres psql -U postgres -d postgres-db
-
Use
\dt
to view the tables in the database.
CREATE DATABASE db;
\c db
## Database Setup
### Step 1: Create Users Table
```sql
CREATE TABLE users (
id SERIAL PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(100),
date_of_birth DATE
);
INSERT INTO users (first_name, last_name, email, date_of_birth) VALUES
('John', 'Doe', '[email protected]', '1990-01-01'),
('Jane', 'Smith', '[email protected]', '1992-05-15'),
...
('Grace', 'Wright', '[email protected]', '1997-05-10'),
('William', 'Scott', '[email protected]', '1986-07-22');
CREATE TABLE films (
film_id SERIAL PRIMARY KEY,
title VARCHAR(255) NOT NULL,
release_date DATE,
price DECIMAL(5,2),
rating VARCHAR(10),
user_rating DECIMAL(2,1) CHECK (user_rating >= 1 AND user_rating <= 5)
);
INSERT INTO films (title, release_date, price, rating, user_rating) VALUES
('Inception', '2010-07-16', 12.99, 'PG-13', 4.8),
('The Shawshank Redemption', '1994-09-23', 9.99, 'R', 4.9),
...
('La La Land', '2016-12-09', 11.99, 'PG-13', 4.5);
CREATE TABLE film_category (
category_id SERIAL PRIMARY KEY,
film_id INTEGER REFERENCES films(film_id),
category_name VARCHAR(50) NOT NULL
);
INSERT INTO film_category (film_id, category_name) VALUES
(1, 'Sci-Fi'),
(1, 'Thriller'),
...
(20, 'Music');
CREATE TABLE actors (
actor_id SERIAL PRIMARY KEY,
actor_name VARCHAR(255) NOT NULL
);
INSERT INTO actors (actor_name) VALUES
('Leonardo DiCaprio'),
('Tim Robbins'),
...
('Emma Stone');
CREATE TABLE film_actors (
film_id INTEGER REFERENCES films(film_id),
actor_id INTEGER REFERENCES actors(actor_id),
PRIMARY KEY (film_id, actor_id)
);
INSERT INTO film_actors (film_id, actor_id) VALUES
(1, 1),
(2, 2),
...
(20, 20);
## SQL Operations
This section includes various SQL operations such as creating tables, inserting data, performing joins, and using subqueries.
### Inner Join Example
```sql
SELECT f.title, a.actor_name
FROM films f
INNER JOIN film_actors fa ON f.film_id = fa.film_id
INNER JOIN actors a ON fa.actor_id = a.actor_id;
SELECT f.title, a.actor_name
FROM films f
LEFT JOIN film_actors fa ON f.film_id = fa.film_id
LEFT JOIN actors a ON fa.actor_id = a.actor_id;
SELECT title AS name
FROM films
UNION
SELECT actor_name AS name
FROM actors
ORDER BY name;
SELECT title,
(SELECT actor_name
FROM actors a
JOIN film_actors fa ON a.actor_id = fa.actor_id
WHERE fa.film_id = f.film_id
LIMIT 1) AS actor_name
FROM films f;
This repository serves as a practical guide for learning and practicing data engineering concepts using PostgreSQL. You can use Docker to set up a PostgreSQL database and perform various SQL operations to manipulate and query data. Additionally, you can explore different SQL techniques such as joins, subqueries, and unions to analyze and transform data effectively. Happy learning!