drwoj/tweets-pyspark

Twitter Posts Analysis with PySpark

This project uses Apache Spark (via PySpark) to analyze Twitter posts on three topics: Covid, the Grammys, and finance. The application is Dockerized and runs with Docker Compose.

Prerequisites

  • Docker

Setup

Clone the repository to your local machine.

git clone git@github.com:drwoj/tweets-pyspark.git

Running the Application

  1. Build the Docker images:
docker-compose build
  2. Run the Docker containers:
docker-compose up -d
  3. Submit the Spark application:
docker-compose exec spark-master spark-submit --master spark://spark-master:7077 src/main.py
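
The three steps above can also be scripted. The following is a minimal sketch, assuming `docker-compose` is on the PATH and the script is run from the repository root. The names `compose_commands` and `run_all` are hypothetical helpers for illustration and are not part of the repository.

```python
import subprocess

# Values taken from the commands listed above.
MASTER_URL = "spark://spark-master:7077"
APP_PATH = "src/main.py"


def compose_commands(master_url=MASTER_URL, app_path=APP_PATH):
    """Return the build, start, and submit commands as argument lists."""
    return [
        ["docker-compose", "build"],
        ["docker-compose", "up", "-d"],
        ["docker-compose", "exec", "spark-master",
         "spark-submit", "--master", master_url, app_path],
    ]


def run_all(commands):
    """Run each command in order; raise CalledProcessError on the first failure."""
    for cmd in commands:
        print("$", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_all(compose_commands())` would execute the three steps in sequence. When run without a terminal attached (for example in CI), `docker-compose exec` may need the `-T` flag to disable pseudo-TTY allocation.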

Stopping the Application

To stop the application and remove the containers defined in the docker-compose.yml file, run:

docker-compose down

Accessing the Application

Once the containers are running, the application is accessible through the Spark Web UI. The port (9090) specified in docker-compose.yml is exposed on your host machine, so you can reach the Spark Master by navigating to localhost:9090 in your web browser.
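
To check from a script whether the UI is reachable, the following sketch uses only the Python standard library. The helper names `ui_url` and `ui_is_up` are hypothetical; the host and port defaults are the ones stated above.

```python
import http.client
import urllib.error
import urllib.request


def ui_url(host="localhost", port=9090):
    """Build the URL of the Spark Master UI from host and port."""
    return f"http://{host}:{port}"


def ui_is_up(url, timeout=2.0):
    """Return True if an HTTP GET to url answers with status 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

For example, `ui_is_up(ui_url())` can be polled after `docker-compose up -d` until the master has finished starting.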

Spark Web UI view

[Image: Spark Web UI view]
