This project builds an inverted index from a Wikipedia dump. The work was done in three steps:
- Parsing the Wikipedia dump (an XML file) using the etree library
- Processing the CSV files to create the inverted index and the other files needed for the next step
- Creating the graphs
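
The first two steps can be illustrated with a minimal sketch. It is not the project's actual code. It assumes the standard-library `xml.etree.ElementTree` module, a dump made of `<page>` elements that each contain a `<title>` and a `<text>`, and a simple lowercase alphanumeric tokenizer. The names `iter_pages`, `build_inverted_index`, and `SAMPLE` are hypothetical, and the intermediate CSV files and the graphs are left out for brevity.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Stream (title, text) pairs from a dump without loading it whole.

    Assumption: each article is a <page> element with <title> and <text>
    descendants, as in the small SAMPLE below.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # release the page's children once it has been read


def build_inverted_index(pages):
    """Map each token to the set of page titles whose text contains it."""
    index = defaultdict(set)
    for title, text in pages:
        # Hypothetical tokenizer: lowercase runs of letters and digits.
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(title)
    return index


# Tiny stand-in for a real dump, used only to make the sketch runnable.
SAMPLE = b"""<mediawiki>
  <page><title>Cat</title><revision><text>Cats are small mammals</text></revision></page>
  <page><title>Dog</title><revision><text>Dogs are loyal mammals</text></revision></page>
</mediawiki>"""

index = build_inverted_index(iter_pages(io.BytesIO(SAMPLE)))
print(sorted(index["mammals"]))  # ['Cat', 'Dog']
```

For a real dump, `iter_pages` could be given an open file instead of the in-memory sample. Streaming with `iterparse` and clearing each page after use is one way to keep memory use down on a large file; note that the root element still keeps references to the cleared pages, so a full dump may need further cleanup.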