This project implements a Big Data platform that processes parliamentary data collected from the website of the Moroccan Parliament. Its primary goal is to calculate Key Performance Indicators (KPIs) that measure the engagement level of each government.
- Data Extraction: Utilizes Python and BeautifulSoup for web scraping to extract parliamentary data.
- Big Data Tools: Leverages Cloudera distribution, Hadoop, HDFS, MapReduce, Hive, and HBase for scalable and distributed data processing.
- Visualization: Utilizes PowerBI for creating visualizations and dashboards to interpret and communicate the results effectively.
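
The extraction step can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the HTML fragment, the `questions` table id, the column names, and the `parse_questions` helper are all hypothetical, since the real structure of the parliament pages is not described here. It parses an inline string so that it runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment; the real parliament pages will differ.
SAMPLE_HTML = """
<table id="questions">
  <tr><th>Government</th><th>Ministry</th><th>Status</th></tr>
  <tr><td>Gov A</td><td>Health</td><td>answered</td></tr>
  <tr><td>Gov A</td><td>Education</td><td>pending</td></tr>
  <tr><td>Gov B</td><td>Health</td><td>answered</td></tr>
</table>
"""


def parse_questions(html):
    """Return one dict per data row of the (assumed) questions table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="questions")
    if table is None:
        return []
    rows = table.find_all("tr")
    if not rows:
        return []
    # The first row holds the column names; lowercase them for use as keys.
    headers = [th.get_text(strip=True).lower() for th in rows[0].find_all("th")]
    records = []
    for row in rows[1:]:
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # Skip malformed rows whose cell count does not match the header.
        if len(cells) == len(headers):
            records.append(dict(zip(headers, cells)))
    return records


records = parse_questions(SAMPLE_HTML)
for record in records:
    print(record)
```

In a real run, the HTML would come from an HTTP request to the parliament website, and the resulting records would be written to a file (for example CSV) before being loaded into HDFS.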
- Cloudera: Big Data platform for data management and processing.
- Python: Programming language for scripting and data manipulation.
- BeautifulSoup: Python library for web scraping and data extraction.
- Hadoop: Distributed storage and processing framework.
- HDFS: Hadoop Distributed File System for reliable and scalable storage.
- MapReduce: Programming model for processing large datasets.
- Hive: Data warehouse infrastructure for querying and managing large datasets.
- HBase: NoSQL database for real-time, scalable data storage.
- PowerBI: Business intelligence tool for data visualization and reporting.
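
To make the MapReduce programming model more concrete, here is a small single-process sketch of its three phases (map, shuffle, reduce) in plain Python. It does not use Hadoop itself, and the input records and the "answer rate" KPI are hypothetical examples chosen for illustration; the KPIs actually computed by this project may be defined differently.

```python
from collections import defaultdict

# Hypothetical input: (government, question status) pairs.
RECORDS = [
    ("Gov A", "answered"),
    ("Gov A", "pending"),
    ("Gov A", "answered"),
    ("Gov B", "answered"),
    ("Gov B", "pending"),
]


def map_record(record):
    """Map phase: emit (government, (question_count, answered_count))."""
    government, status = record
    yield government, (1, 1 if status == "answered" else 0)


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce phase: compute the share of answered questions for one key."""
    total = sum(value[0] for value in values)
    answered = sum(value[1] for value in values)
    return key, answered / total


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory dataset."""
    pairs = (pair for record in records for pair in map_record(record))
    return dict(reduce_group(key, values) for key, values in shuffle(pairs).items())


answer_rates = run_job(RECORDS)
for government, rate in sorted(answer_rates.items()):
    print(f"{government}: {rate:.2f}")
```

On a Hadoop cluster the framework performs the shuffle and distributes the map and reduce functions across nodes; only the per-record and per-key logic is written by hand.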
- Set Up Cloudera: Install and configure Cloudera on your system.
- Clone the Repository: Clone this repository to your local machine using `git clone https://github.com/chaimaebouyarmane/Big_Data.git`.
- Data Extraction: Run Python scripts to extract parliamentary data.
- Big Data Processing: Utilize Hadoop, MapReduce, Hive, and HBase for data processing.
- Visualization: Use PowerBI to visualize the calculated KPIs.
Feel free to reach out to us if you have any questions or suggestions:
Chaimae BOUYARMANE