chaitanya-basava/HSU_TransEmb

HSU_TransEmb@Dravidian-CodeMix-HASOC2021

Abstract

Hate speech is a form of oral, written or physical expression that criticizes or uses derogatory language towards a person or a community on the basis of their identity. Hate speech and offensive language can endanger democratic principles and societal stability. As social media usage grows, so does the number of people affected by hate speech. The need to moderate online hate speech has grown significantly, especially on social media platforms like Facebook, Twitter, YouTube, and Instagram. It is high time to curb the rise of online hate speech by supporting the detection of hateful or offensive texts in social media. This paper describes the work submitted to Hate Speech and Offensive Content Identification in Dravidian-CodeMix (HASOC) 2021, a shared task under the Forum for Information Retrieval Evaluation (FIRE) 2021. We, team HSU_TransEmb, propose an ensemble of Transformer models, namely mBERT, DistilBERT and MuRIL, to classify code-mixed social media comments/posts in Dravidian languages (Malayalam-English and Tamil-English) as offensive or not-offensive. The motivation was to combine the power of transformers with ensembling to improve prediction quality. For task 2 of the competition, the proposed ensemble method ranked 3rd for Malayalam and 6th for Tamil.
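The abstract does not say how the three models' outputs are combined. As a rough illustration of what ensembling can look like, the sketch below assumes soft voting: each model's class probabilities are averaged and the highest-scoring label is chosen. The `soft_vote` function, the `LABELS` tuple and the probability values are hypothetical and are not taken from this repository.

```python
from typing import Dict, List, Sequence

# Hypothetical label order; index 0 = not-offensive, index 1 = offensive.
LABELS = ("not-offensive", "offensive")


def soft_vote(model_probs: Dict[str, Sequence[Sequence[float]]]) -> List[str]:
    """Average per-class probabilities across models and pick the top label.

    Assumption: soft voting is used here only as an example of ensembling;
    the paper may combine the models differently.
    """
    per_model = list(model_probs.values())
    n_samples = len(per_model[0])
    predictions = []
    for i in range(n_samples):
        # Mean probability of each class over all models for sample i.
        avg = [
            sum(probs[i][c] for probs in per_model) / len(per_model)
            for c in range(len(LABELS))
        ]
        predictions.append(LABELS[avg.index(max(avg))])
    return predictions


# Made-up probabilities for two comments; real values would come from the
# softmax outputs of the fine-tuned models.
example = {
    "mBERT": [[0.70, 0.30], [0.40, 0.60]],
    "DistilBERT": [[0.55, 0.45], [0.20, 0.80]],
    "MuRIL": [[0.40, 0.60], [0.35, 0.65]],
}
print(soft_vote(example))  # ['not-offensive', 'offensive']
```

In the first comment MuRIL alone leans towards offensive, but the averaged probability (0.55 vs 0.45) still favours not-offensive, which shows how averaging can smooth over a single model's disagreement.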

Paper link

Install Requirements

pip install -r requirements.txt

Contribution Instructions

Add the pre-commit hook for auto-formatting before making any commit. Run the following command to set up the pre-commit hook for the repo.

pre-commit install

Team