Big Data (A.Y. 2021/2022) project for University of Verona

Fabbro96/Big-Data

Project for the "Big Data" course at the University of Verona: Wikipedia inverted index

The project builds an inverted index from a Wikipedia dump. The work was done in three steps:

  1. Parsing the Wikipedia dump (an XML file) with the etree library
  2. Processing the resulting CSV files to create the inverted index and the other files needed for the next step
  3. Creating the graphs
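
As a rough illustration of the first two steps, here is a minimal sketch, not the notebook's actual code. It assumes the standard-library `xml.etree.ElementTree` (the project may use a different etree implementation), a tiny made-up dump held in memory, and it skips the intermediate CSV files. The names `SAMPLE_DUMP`, `iter_pages`, and `build_inverted_index` are hypothetical.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical miniature dump. A real Wikipedia dump is far larger,
# nests the text under <revision>, and uses namespaced tags.
SAMPLE_DUMP = """<mediawiki>
  <page><title>Verona</title><text>Verona is a city in Italy</text></page>
  <page><title>Italy</title><text>Italy is a country in Europe</text></page>
</mediawiki>"""


def iter_pages(source):
    """Yield (title, text) pairs while streaming through the XML."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "page":
            title = elem.findtext("title", default="")
            text = elem.findtext("text", default="")
            yield title, text
            elem.clear()  # drop the processed page's children to save memory


def build_inverted_index(pages):
    """Map each lowercase word to the set of page titles containing it."""
    index = defaultdict(set)
    for title, text in pages:
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(title)
    return index


index = build_inverted_index(iter_pages(io.BytesIO(SAMPLE_DUMP.encode("utf-8"))))
print(sorted(index["italy"]))  # ['Italy', 'Verona']
```

Streaming with `iterparse` matters for a full dump, which is too large to load into memory at once. The tokenizer here is deliberately naive: it only keeps ASCII letters and digits.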

How to run:

The project is written in Python as a Jupyter notebook. The first cell lists the libraries needed to run the project; you can install them with pip.
