In this project we conduct exploratory probability distribution fitting and a time-series analysis of Twitter data to identify trends or patterns over time.
Code repository by Calbert Graham, University of Cambridge, U.K.
This repository contains the following files:
- fitting-probability-distribution-to-data.ipynb: the main file for the probability distribution analysis
- time-series-analysis.ipynb: the main file for the time series analysis
- tweet-time-series.txt: the tweet dataset used for both analyses
- License
- README.md
- .gitignore
To use this code, please ensure you have an up-to-date installation of Python 3, preferably running in a virtual environment. The code was prepared in Google Colaboratory and exported to GitHub, so there may be minor compatibility issues when running it elsewhere. Please submit a pull request if you find any.
For the probability distribution fitting, you will need to install the following Python packages: NumPy, SciPy, Matplotlib, pandas, and seaborn. Once you have upgraded pip, you might run something like `pip3 install numpy scipy matplotlib pandas seaborn`.
For the time series analysis, you will need to install one additional package: statsmodels.
If you run the code in Google Colab, these packages are normally preinstalled, so the import statements should work without a separate installation step.
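Before opening the notebooks locally, it can help to confirm that the packages listed above are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` and the two list names are illustrative and do not come from the repository.

```python
from importlib.util import find_spec

# Import names of the packages listed above. For these packages the
# import name happens to match the name used with pip.
DISTRIBUTION_FITTING = ["numpy", "scipy", "matplotlib", "pandas", "seaborn"]
TIME_SERIES_EXTRA = ["statsmodels"]


def missing_packages(names):
    """Return the names that cannot be located for import (hypothetical helper)."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(DISTRIBUTION_FITTING + TIME_SERIES_EXTRA)
    if missing:
        # Suggest an install command for whatever is absent.
        print("Missing packages:", ", ".join(missing))
        print("Try: pip3 install " + " ".join(missing))
    else:
        print("All required packages are available.")
```

The check only looks for the packages; it does not import them or verify their versions.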
This experiment depends on data made available to the author by Dr Ling Wang from the School of Electronic Engineering and Computer Science, Queen Mary University. You can download this dataset and others from his website.
Here is a sample of the data included in the study:
Time (seconds) | Number of tweets |
---|---|
60000 | 33 |
60001 | 37 |
60002 | 27 |
60003 | 37 |
60004 | 38 |
... | ... |
69995 | 2 |
69996 | 2 |
69997 | 1 |
69998 | 1 |
69999 | 2 |
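As a rough illustration of what exploratory distribution fitting on these counts can look like, the sketch below parses the ten rows shown above and fits a Poisson rate by maximum likelihood (for a Poisson model, the estimate is simply the sample mean). It uses only the standard library and is not the method used in the notebooks, which rely on SciPy and pandas; the names `SAMPLE` and `parse_rows` are hypothetical, and the ten rows are only the head and tail of the dataset, so the numbers are not representative of the full series.

```python
import statistics

# The rows shown in the sample table above, including the header and the
# elided middle, which the parser skips.
SAMPLE = """\
Time (seconds) | Number of tweets |
60000 | 33 |
60001 | 37 |
60002 | 27 |
60003 | 37 |
60004 | 38 |
... | ... |
69995 | 2 |
69996 | 2 |
69997 | 1 |
69998 | 1 |
69999 | 2 |
"""


def parse_rows(text):
    """Return (time, count) pairs from pipe-delimited lines, skipping non-numeric rows."""
    rows = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|") if cell.strip()]
        if len(cells) == 2 and cells[0].isdigit() and cells[1].isdigit():
            rows.append((int(cells[0]), int(cells[1])))
    return rows


rows = parse_rows(SAMPLE)
counts = [count for _, count in rows]

# Maximum-likelihood estimate of the Poisson rate: the sample mean.
rate = statistics.mean(counts)

# A Poisson distribution has variance equal to its mean, so a
# variance-to-mean ratio far from 1 suggests a poor Poisson fit.
variance = statistics.variance(counts)
dispersion = variance / rate

print(f"rows parsed: {len(rows)}")
print(f"estimated rate: {rate} tweets per second")
print(f"variance-to-mean ratio: {dispersion:.2f}")
```

For these ten rows the estimated rate is 18 tweets per second and the sample variance is about 308, so the ratio is well above 1. That is expected here, because the sample mixes the busy start of the window with its quiet end.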
Experiment output files are saved to your working directory (i.e. the directory you downloaded the repository to or run the experiment from) unless you update the code. One easy way to download the entire repository is to open the repository's main page and append '/zipball/master/' to the URL, which works like a charm.
None required. Use as you see fit.
Calbert Graham, [email protected], July 2022