Explainable Machine Learning in Linguistics and Applied NLP: Two Case Studies of Norwegian Dialectometry and Sexism Detection in French Tweets

Abstract

This thesis explores explainable machine learning in two settings: a traditional area of linguistics (dialect classification) and an applied NLP task (sexism detection).

In both tasks, the input features that are ranked as especially relevant for the classification form meaningful groups that are consistent with previous research on the topic, although not all of these features are easy to interpret or provide plausible explanations.

For dialect classification, some of the important features show that the model also learned patterns that dialectologists do not typically describe.

For both case studies, I use LIME (Ribeiro et al., 2016) to rank features by their importance for the classification.
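LIME explains a single prediction by perturbing the input, querying the classifier on the perturbed versions, and fitting a locally weighted linear model whose coefficients serve as feature importances. The following is a minimal, self-contained sketch of that idea for one tokenized text, written from scratch in plain Python rather than with the `lime` package. It is not the code used in the thesis: the function name `lime_token_weights`, the token-dropping perturbation (one feature per token position), the exponential kernel on cosine distance, the kernel width and the ridge penalty are all illustrative assumptions.

```python
import math
import random


def solve(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lime_token_weights(tokens, predict_proba, num_samples=500,
                       kernel_width=0.25, ridge=1e-3, seed=0):
    """Hypothetical LIME-style explainer for one text.

    predict_proba(text) must return the probability of the class being
    explained. Returns (token, weight) pairs sorted by absolute weight.
    """
    rng = random.Random(seed)
    d = len(tokens)
    rows, targets, weights = [], [], []
    for i in range(num_samples):
        # The first sample is the unperturbed text; the others drop random tokens.
        mask = [1] * d if i == 0 else [rng.randint(0, 1) for _ in range(d)]
        text = " ".join(t for t, keep in zip(tokens, mask) if keep)
        # Cosine distance between the mask and the all-ones (original) vector.
        kept = sum(mask)
        cos = kept / math.sqrt(kept * d) if kept else 0.0
        dist = 1.0 - cos
        # Perturbations close to the original text get a higher weight.
        weights.append(math.exp(-(dist ** 2) / kernel_width ** 2))
        rows.append([1.0] + [float(v) for v in mask])  # intercept + features
        targets.append(predict_proba(text))
    # Weighted ridge regression: (X^T W X + ridge * I) beta = X^T W y.
    k = d + 1
    a = [[sum(w * r[p] * r[q] for r, w in zip(rows, weights))
          + (ridge if p == q else 0.0) for q in range(k)] for p in range(k)]
    b = [sum(w * r[p] * y for r, w, y in zip(rows, weights, targets))
         for p in range(k)]
    beta = solve(a, b)
    # Drop the intercept and rank tokens by the magnitude of their coefficient.
    return sorted(zip(tokens, beta[1:]), key=lambda tw: abs(tw[1]), reverse=True)
```

As a toy check, a made-up classifier whose output is `0.2 + 0.6 * ("ikkje" present) + 0.1 * ("eg" present)` yields a ranking for `["eg", "veit", "ikkje"]` with `ikkje` first (weight close to 0.6), then `eg`, and `veit` close to zero.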

For the sexism detection task, I additionally examine attention weights. These produce feature rankings that are in many cases similar to the LIME results, but that are overall worse at highlighting tokens that are especially characteristic of sexist tweets.
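Comparing LIME with attention requires turning per-tweet scores into one global feature ranking per method. The sketch below assumes one simple aggregation (the mean score of each feature over the instances it appears in) and one simple similarity measure (Jaccard overlap of the top-k features). Both choices, and the names `global_ranking` and `top_k_overlap`, are hypothetical and may differ from what the thesis does.

```python
from collections import defaultdict


def global_ranking(explanations, top_k=None):
    """Average per-instance scores into one feature ranking (highest first).

    `explanations` is a list of dicts mapping a feature (e.g. a token) to its
    score in one instance, such as LIME weights or attention weights. LIME
    weights are signed, so sorting by the signed mean puts features that
    push towards the explained class first.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for expl in explanations:
        for feature, score in expl.items():
            totals[feature] += score
            counts[feature] += 1
    means = {f: totals[f] / counts[f] for f in totals}
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(means, key=lambda f: (-means[f], f))
    return ranked if top_k is None else ranked[:top_k]


def top_k_overlap(ranking_a, ranking_b, k):
    """Jaccard overlap between the top-k features of two rankings."""
    a, b = set(ranking_a[:k]), set(ranking_b[:k])
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

For example, with made-up LIME scores `[{"cuisine": 0.4, "le": 0.01}, {"cuisine": 0.3, "femme": 0.2}]` and attention scores `[{"le": 0.5, "cuisine": 0.3}, {"femme": 0.6, "cuisine": 0.2}]`, the rankings are `cuisine, femme, le` and `femme, le, cuisine`, and their top-2 overlap is 1/3. Because a mean over very few occurrences can let rare tokens dominate, a minimum frequency threshold may be worth adding in practice.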

Code

To reproduce the experiments from my thesis, run the following scripts:

## LIME for dialect classification
dialects_predict.sh
# When the previous script has finished running, run:
dialects_analyze.sh

## LIME for tweet classification
tweets_predict.sh
# When the previous script has finished running, run:
tweets_analyze.sh

## Attention weights for tweet classification
tweets_predict_attention.sh
# When the previous script has finished running, run:
tweets_analyze_attention.sh
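Each analysis script must only start once its prediction script has finished. A small Python wrapper can enforce this ordering; the sketch below is hypothetical (the pipeline names and `run_pipeline` are not part of the repository) and assumes the scripts are run with `bash` from the repository root.

```python
import subprocess

# Hypothetical pipeline names; the script pairs are the ones listed above.
PIPELINES = {
    "dialects_lime": ["dialects_predict.sh", "dialects_analyze.sh"],
    "tweets_lime": ["tweets_predict.sh", "tweets_analyze.sh"],
    "tweets_attention": ["tweets_predict_attention.sh",
                         "tweets_analyze_attention.sh"],
}


def run_pipeline(name, dry_run=False):
    """Run the scripts of one pipeline strictly in order.

    subprocess.run blocks until each script exits, and check=True raises
    CalledProcessError on a non-zero exit status, so the analysis step only
    starts after the prediction step has finished successfully. With
    dry_run=True the commands are returned without being executed.
    """
    commands = [["bash", script] for script in PIPELINES[name]]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_pipeline("dialects_lime")` would then run `dialects_predict.sh` followed by `dialects_analyze.sh`.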

The exact state of the codebase as used in my MA thesis is available via the ma-thesis release.

The tables with the results are in the models directory.

Thesis

The full description and analysis of this research can be found in my thesis (thesis.pdf in this repository).
