If you use this code and/or the datasets (upcoming), please cite the following works, or email the corresponding author (Alvaro) to request access to the Tweet database on a per-request basis:
- Alvaro Garcia-Recuero, et al.: Trollslayer: Crowdsourcing and Characterization of Abusive Birds in Twitter. The Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS 2018), Valencia, Spain.
- Alvaro Garcia-Recuero: Efficient Privacy-preserving Adversarial Learning in Decentralized Online Social Networks. ASONAM '17 International Symposium on Foundations of Open Source Intelligence and Security Informatics (FOSINT-SI 2017), Sydney, Australia.
- Alvaro Garcia-Recuero, Jeff Burdges, Christian Grothoff: Privacy-Preserving Abuse Detection in Future Decentralized Online Social Networks. In the 11th ESORICS International Workshop on Data Privacy Management, 2016, Crete, Greece.
Dataset 1: contains annotations from crowdworkers in the Trollslayer platform.
Dataset 2: contains annotations from a large number of crowdworkers in the Crowdflower platform.
Dataset 3: contains annotations from both.
Copyright (C) 2015-present - Álvaro García Recuero
This file is part of the Trollslayer framework
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, see https://www.gnu.org/licenses.
- python-2.7.9
- sqlalchemy
- twython
- pymysql
- requests
- py-getch
- pyfiglet
- termcolor
- colorama
- wget https://pypi.python.org/packages/source/g/getch/getch-1.0-python2.tar.gz#md5=586ea0f1f16aa094ff6a30736ba03c50
- tar xvf getch-1.0-python2.tar.gz
- cd getch-1.0
- python setup.py install
- pip install colorama
- pip install termcolor
- pip install git+https://github.com/pwaller/pyfiglet
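After installing, a quick check along the following lines can confirm that the listed packages are importable. This is a minimal sketch and not part of the Trollslayer code: `missing_modules` is a hypothetical helper, and the import names (for example `getch` for py-getch) are assumptions based on the package names above.

```python
import importlib

# Import names assumed to correspond to the dependencies listed above.
REQUIRED = ["sqlalchemy", "twython", "pymysql", "requests",
            "getch", "pyfiglet", "termcolor", "colorama"]


def missing_modules(names):
    """Return the names from the given list that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

Any module reported as missing can then be installed with the steps above before running the reader.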
- $ python groundtruth_reader.py (It will display a message: "Loading tweets from db... please wait")
- Next, you will be prompted to enter your reviewer id. If you have one, great; otherwise, create one.
- Once a tweet is loaded, read the guidelines below before giving an answer on whether you consider it abusive or not.
- There are four options to mark a tweet: right (acceptable), left (abusive), up (undo), down (skip).
- Skipping the tweet will flag it as 'unknown', which can be considered as neutral (neither positive nor negative, a blank vote).
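The marking options above can be pictured with the sketch below. It is illustrative only and does not reproduce `groundtruth_reader.py`: the names `KEY_ACTIONS` and `annotate` are hypothetical, and it assumes that "undo" removes the most recent label and returns to that tweet.

```python
# Hypothetical mapping from arrow keys to annotation actions.
KEY_ACTIONS = {
    "right": "acceptable",
    "left": "abusive",
    "down": "unknown",  # skip: a blank, neutral vote
    "up": "undo",
}


def annotate(tweet_ids, keys):
    """Apply a sequence of key presses to tweets and return {tweet_id: label}."""
    labels = {}
    done = []      # tweet ids labeled so far, in order
    position = 0   # index of the tweet currently shown
    for key in keys:
        action = KEY_ACTIONS.get(key)
        if action is None:
            continue  # ignore unrecognized keys
        if action == "undo":
            if done:
                # Assumption: undo drops the last label and steps back one tweet.
                labels.pop(done.pop())
                position -= 1
            continue
        if position >= len(tweet_ids):
            continue  # nothing left to annotate
        tweet_id = tweet_ids[position]
        labels[tweet_id] = action
        done.append(tweet_id)
        position += 1
    return labels
```

For example, pressing right, left, up, down, right on three tweets marks the first as acceptable, relabels the second from abusive to unknown, and marks the third as acceptable.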
- Deny: encouraging other users to self-harm, promoting violence (direct or indirect), terrorism or similar activities.
- Disrupt: distracting provocations, denial-of-service, flooding with messages, promoting abuse.
- Degrade: disclosing personal and private data of others without their approval so as to harm their public image or reputation.
- Deceive: supplanting a known user identity (impersonation) to influence other users' behavior and activities, including assuming false identities (but not pseudonyms).
As you can see, it is easy to map the above set of guidelines to those of Twitter, Trolldor, etc. While we do not believe Trolldor is a very good example of fighting online abuse (it crowdsources reports directly to users), its guidelines seem to resemble those of Twitter.
- Violent threats (direct or indirect): promoting violence, terrorism or similar, including against minorities or disabled people, etc.
- Abuse and harassment: sending abuse, threats, or harassing messages to other users.
- Self-harm: encouraging other users to commit self-harming acts is considered abuse as well.
- Private information disclosure: publishing personal data about other users without their consent.
- Impersonation: pretending to be someone else by registering fake accounts that expose information or similar metadata taken from real ones.
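One possible reading of how the four annotation guidelines relate to the Twitter categories just listed is sketched below. The pairing is an assumption made for illustration, not a mapping stated by the authors, and the names `GUIDELINE_TO_TWITTER` and `twitter_to_guideline` are hypothetical.

```python
# Assumed correspondence between the annotation guidelines and the
# Twitter categories; an interpretation, not taken from the source.
GUIDELINE_TO_TWITTER = {
    "Deny": ["Violent threats", "Self-harm"],
    "Disrupt": ["Abuse and harassment"],
    "Degrade": ["Private information disclosure"],
    "Deceive": ["Impersonation"],
}


def twitter_to_guideline(mapping):
    """Invert the mapping so each Twitter category points to one guideline."""
    inverse = {}
    for guideline, categories in mapping.items():
        for category in categories:
            inverse[category] = guideline
    return inverse
```

Inverting the table this way gives a direct lookup from a Twitter category to the guideline an annotator would apply.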
Trolldor (no longer available, presumably shut down by Twitter after an acquisition, although we had access at the time of writing). Unlike Trolldor, in Trollslayer we annotate tweets rather than users, in order to prevent abuse by the reporters themselves, as in name squatting or other related practices.
- Provocation: constructive debate holds no interest for trolls; their goal is to get attention by way of provocation.
- Creep: users who fill other users' timelines on a daily basis with messages worshiping their idols, friends, relatives and colleagues.
- Normally, they use “black humour” and jokes.
- They justify abusive comments with the excuse that it is clever humour and simply misunderstood by many people.
- It is claimed that Sly Trolls are more skilled at rhetoric.
- They boast of being intellectually superior, although they are usually mistaken. Still, they achieve their objective: to scare off users who are less capable of answering back.
- In the commercial world, they usually criticise a specific company or product, disguising themselves as dissatisfied clients or sending questions that can put whoever has to answer in a tight spot.