This example shows how to test the hypothesis that a distribution follows a power law. For the theoretical background, see the paper "Power-law distributions in empirical data" by Clauset et al., or our slides for a quick overview.
- The step-by-step Python code is in powerlaw.ipynb.
- The MLE estimator and goodness of fit are explained in the slides Plotting_Power_laws_and_the_Degree_Exponent.pdf.
- The output figures are in powerlaw_report.pdf. We also consider fitting the data with the log-logistic distribution.
- The parallelized synthetic data generation code can be found in powerlaw_synthetic_parallel.py.
- The dataset for this example is located at news_events_powerlaw.csv. It records the number of news events reported by the online news site in recent years. Data source: GDelt.

Distribution of the real data in news_events_powerlaw.csv (as an example):
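On a double-logarithmic (log-log) scale, a power law appears as a straight line, and its slope gives the exponent of the power law (for p(k) ∝ k^(-γ), the slope is -γ). The KS (Kolmogorov-Smirnov) distance measures the "distance" between the real data and the fitted model, that is, the largest gap between their cumulative distributions. The minimum degree at which the power law starts is 2 here.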
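
The two quantities above can be sketched in a few lines of Python. This is a minimal illustration, not the code from powerlaw.ipynb: it assumes the continuous approximation of the power law (the notebook and slides may use the discrete form, which suits count data better), and the names `fit_exponent` and `ks_distance`, as well as the sample itself, are hypothetical.

```python
import math
import random

def fit_exponent(data, x_min):
    """Continuous-approximation MLE of gamma for p(x) ~ x^(-gamma), x >= x_min."""
    tail = [x for x in data if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

def ks_distance(data, x_min, gamma):
    """Largest gap between the empirical CDF of the tail and the model CDF."""
    tail = sorted(x for x in data if x >= x_min)
    n = len(tail)
    worst = 0.0
    for i, x in enumerate(tail):
        # Model CDF of the continuous power law: F(x) = 1 - (x / x_min)^(1 - gamma)
        model_cdf = 1.0 - (x / x_min) ** (1.0 - gamma)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        worst = max(worst, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return worst

# Hypothetical sample drawn from a known power law (not the GDelt data).
rng = random.Random(42)
x_min, true_gamma = 2.0, 2.5
sample = [x_min * (1.0 - rng.random()) ** (-1.0 / (true_gamma - 1.0))
          for _ in range(5000)]

gamma_hat = fit_exponent(sample, x_min)
distance = ks_distance(sample, x_min, gamma_hat)
print(f"estimated exponent: {gamma_hat:.3f}, KS distance: {distance:.4f}")
```

On a sample this large, the estimated exponent should land close to the true value and the KS distance should be small; on real data, a large KS distance is a first hint that the power law fits poorly.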
Using the obtained model, we generate synthetic sequences, which are used to evaluate the goodness of fit.
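
As a rough sketch of this step, synthetic sequences can be drawn from a fitted power law by inverse-transform sampling. The example assumes the continuous form of the distribution and a serial loop; powerlaw_synthetic_parallel.py parallelizes the generation and may sample differently. The function name `sample_power_law` and the parameter values are illustrative assumptions.

```python
import random

def sample_power_law(n, x_min, gamma, rng):
    """Draw n values from the continuous power law p(x) ~ x^(-gamma), x >= x_min."""
    # Invert the CDF F(x) = 1 - (x / x_min)^(1 - gamma) at uniform random points.
    # rng.random() lies in [0, 1), so 1 - u is never zero.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0))
            for _ in range(n)]

# Hypothetical fitted parameters, for illustration only.
rng = random.Random(0)
x_min, gamma = 2.0, 2.5

# 100 synthetic sequences, each with 1000 values.
synthetic = [sample_power_law(1000, x_min, gamma, rng) for _ in range(100)]
```

Each synthetic sequence would normally have the same length as the part of the real data at or above the minimum degree, so that its KS distance is comparable to the real one.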
The p-value is the fraction of synthetic sequences whose KS distance is larger than that of the real data. When the p-value is large enough (above 10% in most cases), we can say that the power law is a plausible fit to the real data.
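
The p-value computation itself is a simple count, sketched below. The KS distances here are made-up numbers for illustration, and `p_value` is a hypothetical helper name. In the procedure of Clauset et al., each synthetic sequence is refitted with its own power-law model before its KS distance is measured.

```python
def p_value(real_ks, synthetic_ks):
    """Fraction of synthetic sequences whose KS distance exceeds the real data's."""
    return sum(d > real_ks for d in synthetic_ks) / len(synthetic_ks)

# Hypothetical KS distances, for illustration only.
real_ks = 0.03
synthetic_ks = [0.01, 0.02, 0.04, 0.05, 0.06, 0.02, 0.07, 0.03, 0.01, 0.08]

p = p_value(real_ks, synthetic_ks)
plausible = p > 0.10  # the 10% threshold mentioned above
print(f"p-value: {p:.2f}, power law plausible: {plausible}")
```

With only ten synthetic sequences the p-value is very coarse; in practice many more sequences are needed for a stable estimate.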