UNITED NATIONS SPEECHES 1970-2015 ANALYSIS

GENERAL STRUCTURE

Data introduction & overview in notebook (1)
Data cleaning & tokenisation
TF-IDF weighting of document-term matrix
Initial exploratory analysis & visualisations (word clouds, etc.)
Topic modeling with SKLEARN, including:
- Non-negative Matrix Factorisation (NMF)
- Singular Value Decomposition (SVD)
- NMF model trained on lemmas extracted with SpaCy
Topic modeling with GENSIM
- NMF unigrams and NMF bigrams
- Latent Dirichlet Allocation (LDA)
- Hierarchical Dirichlet Process (HDP) (experimental implementation)
- Selection of the optimal number of topics by optimising coherence scores
Automated text summarisation of speeches
- TF-IDF
- LSA algorithm
- TextRank algorithm
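The cleaning, tokenisation and TF-IDF weighting steps listed above can be illustrated with a minimal standard-library sketch. It is not the notebooks' code: the tiny `STOP_WORDS` set and the "3+ ASCII letters" token rule are illustrative assumptions. The `tfidf` helper uses raw term counts, the smoothed idf $\ln\frac{1+n}{1+\mathrm{df}(t)} + 1$ and L2 row normalisation, which correspond to scikit-learn's `TfidfVectorizer` defaults.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a
# fuller list (e.g. from SpaCy or scikit-learn).
STOP_WORDS = {"the", "of", "and", "to", "in", "for", "our", "that", "this", "are"}


def tokenise(text):
    """Lowercase, keep runs of 3+ ASCII letters, drop stop words."""
    tokens = re.findall(r"[a-z]{3,}", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(docs):
    """Return one L2-normalised {term: weight} dict per tokenised document.

    tf  = raw count of the term in the document
    idf = ln((1 + n) / (1 + df(t))) + 1   (smoothed idf)
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    rows = []
    for doc in docs:
        weights = {t: c * idf[t] for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({t: w / norm for t, w in weights.items()})
    return rows


speeches = [
    "We call for peace and disarmament.",
    "Development and peace go together.",
    "Nuclear disarmament is urgent.",
]
weighted = tfidf([tokenise(s) for s in speeches])
```

Terms that occur in fewer speeches (here "call") receive a higher weight than terms shared across speeches (here "peace"). This down-weighting is why TF-IDF is applied before topic modeling.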
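The NMF and SVD topic models listed under SKLEARN both factorise the (documents × terms) TF-IDF matrix. The following NumPy sketch is a stand-in rather than the notebook's code. It swaps in the classic multiplicative-update rules for NMF, whereas scikit-learn's `NMF` defaults to a coordinate-descent solver (`solver='cd'`) and offers multiplicative updates as `solver='mu'`. It also uses a plain truncated SVD for the LSA-style model. The function names and the toy matrix are hypothetical.

```python
import numpy as np


def nmf(X, k, n_iter=500, eps=1e-10, seed=0):
    """Factorise non-negative X (docs x terms) as W @ H with W, H >= 0.

    Multiplicative updates minimising the Frobenius reconstruction error;
    eps avoids division by zero.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k))  # document-topic weights
    H = rng.random((k, m))  # topic-term weights
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def truncated_svd(X, k):
    """Rank-k SVD (LSA): returns document coordinates and k term 'topics'."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k]


def top_terms(H, vocab, n=3):
    """Top-n vocabulary terms for each topic row of H."""
    return [[vocab[i] for i in np.argsort(row)[::-1][:n]] for row in H]


# Toy TF-IDF-like matrix: two speeches about security, two about trade.
vocab = ["peace", "security", "treaty", "trade", "growth", "market"]
X = np.array([[3, 2, 1, 0, 0, 0],
              [6, 4, 2, 0, 0, 0],
              [0, 0, 0, 1, 2, 3],
              [0, 0, 0, 1, 2, 3]], dtype=float)
W, H = nmf(X, 2)
topics = top_terms(H, vocab)
```

The non-negativity constraint is what makes NMF topics read as additive bundles of words. SVD components can contain negative loadings, and they can mix themes when singular values are close, so they are often harder to interpret.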
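The GENSIM step chooses the number of topics by optimising coherence. The README does not say which coherence measure is used. gensim's `CoherenceModel` supports several, including `'u_mass'` and `'c_v'`. As an illustration only, this sketch implements the UMass measure with +1 smoothing and averages it over word pairs. gensim's own implementation may differ in smoothing and aggregation details. The candidate topic lists are hypothetical stand-ins for the top words of models trained with different topic counts.

```python
import math
from itertools import combinations


def umass_coherence(topic_words, docs):
    """Mean UMass coherence of one topic's ranked top words (higher is better).

    For every pair where w_j is ranked above w_i:
        log((D(w_i, w_j) + 1) / D(w_j))
    D(...) counts documents containing all the given words. Topics need at
    least two words, and every word must occur in at least one document.
    """
    doc_sets = [set(d) for d in docs]

    def D(*words):
        return sum(all(w in s for w in words) for s in doc_sets)

    scores = []
    for j, i in combinations(range(len(topic_words)), 2):  # j < i
        wj, wi = topic_words[j], topic_words[i]
        scores.append(math.log((D(wi, wj) + 1) / D(wj)))
    return sum(scores) / len(scores)


def choose_num_topics(candidates, docs):
    """Return the topic count whose topics have the highest mean coherence.

    candidates maps a topic count k to that model's list of top-word lists.
    """
    def mean_coherence(topics):
        return sum(umass_coherence(t, docs) for t in topics) / len(topics)

    return max(candidates, key=lambda k: mean_coherence(candidates[k]))


docs = [["peace", "security", "council"], ["peace", "security"],
        ["trade", "growth"], ["trade", "growth", "market"]]
candidates = {
    1: [["peace", "trade"]],
    2: [["peace", "security"], ["trade", "growth"]],
    3: [["peace", "growth"], ["security", "market"], ["council", "trade"]],
}
best_k = choose_num_topics(candidates, docs)
```

Words that co-occur in the same speeches raise a topic's score, so the topic count whose topics group co-occurring words wins. In this toy corpus that is the two-topic model.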
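The TextRank summarisation step ranks sentences by running a weighted PageRank over a sentence-similarity graph. The README does not say which library the notebook uses. The following standard-library sketch is a minimal, assumption-laden version of the idea. It uses a naive regex sentence splitter and measures similarity as the number of shared words divided by the sum of the logs of each sentence's distinct-word count. The damping factor `d=0.85` and the iteration count are conventional choices, not values taken from the project.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """TextRank overlap score: shared words / (log|a| + log|b|).

    |a| and |b| are counts of distinct lowercase words in each sentence.
    """
    wa = set(re.findall(r"[a-z]+", a.lower()))
    wb = set(re.findall(r"[a-z]+", b.lower()))
    if not wa or not wb:
        return 0.0
    denom = math.log(len(wa)) + math.log(len(wb))
    return len(wa & wb) / denom if denom > 0 else 0.0


def textrank(text, k=2, d=0.85, n_iter=50):
    """Return the k highest-ranked sentences, in their original order."""
    sents = split_sentences(text)
    n = len(sents)
    if n <= k:
        return sents
    # Weighted, undirected sentence graph without self-loops.
    sim = [[similarity(sents[i], sents[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out = [sum(row) for row in sim]  # total edge weight leaving each sentence
    scores = [1.0] * n
    # Weighted PageRank iteration with damping factor d.
    for _ in range(n_iter):
        scores = [
            (1 - d) + d * sum(sim[j][i] / out[j] * scores[j]
                              for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sents[i] for i in sorted(top)]


speech = ("Peace and security remain our priority. "
          "We support peace and security in every region. "
          "The weather was pleasant.")
summary = textrank(speech, k=2)
```

Returning the chosen sentences in their original order keeps the extractive summary readable. The isolated third sentence shares no words with the others, so it keeps only the baseline score `1 - d`.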

Further refine the models in notebook (5) using different grammatical structures extracted with SpaCy
- e.g. noun phrases, named-entity (NER) mentions, etc.
Re-add the tuned SKLEARN Latent Dirichlet Allocation (LDA) model and its interactive visualisation
Re-add the SVD model (temporarily removed as of 31.03.22; it needs fixing before the changes are pushed)

.gitattributes
1-UN-speeches-initial-data-intro.ipynb
2-UN-speeches-data-cleaning-and-exploratory-visualisations.ipynb
3-UN-speeches-topic-modeling-sklearn.ipynb
4-UN-speeches-topic-modeling-gensim.ipynb
5-UN-speeches-spacy-processing-topic-modeling-v2.ipynb
6-UN-speeches-text-summarisation.ipynb
README.md
display_topics_func.py
sample_aus_2015_vis.html