Challenge from November 20th - December 19th
Social media has become a major driver of many forms of hateful expression, and gender violence is no exception. According to the UNDP report for the first half of 2020, El Salvador registered a 5.2% increase in violent acts against women in 2019 alone.
There is little data available on the problem, and the organizations devoted to researching it lack the resources to gather that information themselves. Access to an up-to-date dataset would strengthen these organizations' ability to develop solutions to the problem and could serve as a foundation for further research.
- Identify the main expressions of gender violence shared on Twitter in El Salvador during 2020
- Define the categories of violence to be used by the classification model
- Train a classification model that labels expressions of violence shared on Twitter according to the previously defined categories
- Prepare a dataset containing the model's output and the most relevant information about each post used for inference
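The goals above do not fix a particular model, category set, or output schema. As a minimal sketch of how the classification and dataset goals could fit together, the block assumes a multinomial Naive Bayes baseline (a swapped-in technique, not the project's chosen model), three hypothetical categories (`threat`, `insult`, `none`), hypothetical output field names (`tweet_id`, `created_at`, `predicted_category`, `confidence`), and placeholder training texts rather than real tweets. It uses only the Python standard library.

```python
import csv
import io
import math
from collections import Counter, defaultdict
from dataclasses import asdict, dataclass, fields

# Hypothetical category labels; the project defines its own categories.
CATEGORIES = ("threat", "insult", "none")


def tokenize(text: str) -> list[str]:
    """Whitespace tokenizer; assumes the text has already been cleaned."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def log_scores(self, text: str) -> dict:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior P(c)
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
            scores[c] = score
        return scores

    def predict_proba(self, text: str) -> dict:
        scores = self.log_scores(text)
        top = max(scores.values())  # subtract the max for numerical stability
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(exp.values())
        return {c: e / total for c, e in exp.items()}

    def predict(self, text: str) -> str:
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


@dataclass
class LabeledTweet:
    # Hypothetical output schema: model output plus basic post metadata.
    tweet_id: str
    created_at: str
    text: str
    predicted_category: str
    confidence: float


def export_csv(records: list[LabeledTweet]) -> str:
    """Serialize labeled posts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(LabeledTweet)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()


# Placeholder training texts (not real tweets).
train = [
    ("texto con amenaza", "threat"),
    ("otra amenaza aqui", "threat"),
    ("texto con insulto", "insult"),
    ("insulto repetido insulto", "insult"),
    ("texto neutral sobre el clima", "none"),
    ("noticias neutrales del dia", "none"),
]
texts, labels = zip(*train)
assert set(labels) <= set(CATEGORIES)  # every label must be a defined category
model = NaiveBayes().fit(texts, labels)

new_posts = [("1", "2020-03-01", "una amenaza mas"), ("2", "2020-03-02", "otro insulto")]
records = []
for tweet_id, created_at, text in new_posts:
    proba = model.predict_proba(text)
    label = max(proba, key=proba.get)
    records.append(LabeledTweet(tweet_id, created_at, text, label, round(proba[label], 3)))

csv_text = export_csv(records)
print(csv_text)
```

Naive Bayes appears here only because it is a common, dependency-free first baseline; a model selection step would normally compare it against other candidates before settling on one.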
- Data cleaning and preprocessing
- Additional Twitter scraping
- Exploratory Data Analysis (EDA)
- Category definition
- Feature engineering
- Model selection and first experiments
- NLP model development
- Documenting the categories
- Prepare the dataset for delivery
- Write the final report
- Share the data with the public
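The first task, data cleaning and preprocessing, is not specified further. A minimal standard-library sketch, assuming tweets arrive as plain strings, might normalize Unicode, strip retweet prefixes and links, replace mentions with a hypothetical `@user` placeholder (which also helps anonymize posts before the data is shared publicly), and drop duplicates. The function names and cleaning rules are illustrative assumptions, not the project's pipeline.

```python
import re
import unicodedata

RETWEET_RE = re.compile(r"^RT\s+@\w+:\s*")  # leading "RT @account:" marker
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
SPACES_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Normalize one tweet's text for modeling (a sketch, not a fixed spec)."""
    text = unicodedata.normalize("NFC", text)  # unify composed/decomposed accents
    text = RETWEET_RE.sub("", text)            # drop the retweet prefix
    text = URL_RE.sub(" ", text)               # drop links
    text = MENTION_RE.sub("@user", text)       # anonymize mentioned accounts
    text = HASHTAG_RE.sub(r"\1", text)         # keep hashtag words, drop '#'
    text = SPACES_RE.sub(" ", text)            # collapse runs of whitespace
    return text.strip().lower()


def deduplicate(tweets: list[str]) -> list[str]:
    """Keep the first tweet for each distinct cleaned text (e.g. repeated retweets)."""
    seen = set()
    unique = []
    for tweet in tweets:
        key = clean_tweet(tweet)
        if key not in seen:
            seen.add(key)
            unique.append(tweet)
    return unique


print(clean_tweet("RT @ana: Mira esto @Usuaria_1 https://t.co/abc #ElSalvador  ¡Ya basta!"))
# → mira esto @user elsalvador ¡ya basta!
```

NFC normalization is used instead of stripping accents so that Spanish distinctions such as ñ and accented vowels survive cleaning.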