Update README.md
oliviaguest authored Jul 30, 2016
1 parent bcf53ba commit 5af26bb
Showing 1 changed file with 19 additions and 4 deletions.
23 changes: 19 additions & 4 deletions README.md
@@ -1,5 +1,6 @@
# gini
## Overview
Calculate the Gini coefficient of a numpy array. Gini coefficients are often used to quantify income inequality; read more [here](https://www.statsdirect.com/help/default.htm#nonparametric_methods/gini.htm).

The function in ```gini.py``` is based on the third equation from [here](https://www.statsdirect.com/help/default.htm#nonparametric_methods/gini.htm), which defines the Gini coefficient as:
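
The equation image is not included in this view. Going by the return expression of ```gini()``` in the Code section, and assuming the values $x_i$ are sorted in ascending order and indexed from $1$ to $n$, it can be written as:

$$
G = \frac{\sum_{i=1}^{n} (2i - n - 1)\, x_i}{n \sum_{i=1}^{n} x_i}
$$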

@@ -8,28 +9,42 @@

## Examples
For a very unequal sample, 999 zeros and a single one:

```
>>> import numpy as np
>>> from gini import gini
>>> a = np.zeros(1000)
>>> a[0] = 1.0
```

The Gini coefficient is very close to 1.0:

```
>>> gini(a)
0.99890010998900103
```
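
The value falls slightly short of 0.999 because ```gini()``` adds 0.0000001 to every element before applying the formula. As a worked check, assuming the sorted values are 999 copies of $10^{-7}$ followed by a single $1 + 10^{-7}$, with $n = 1000$:

$$
\sum_{i=1}^{1000} (2i - 1001)\, x_i = -999 \times 10^{-7} + 999\,(1 + 10^{-7}) = 999,
\qquad
n \sum_{i=1}^{1000} x_i = 1000 \times 1.0001 = 1000.1
$$

$$
G = \frac{999}{1000.1} \approx 0.99890011
$$

Without the offset, the same formula would give $999 / 1000 = 0.999$.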

For uniformly distributed random numbers, it will be low, around 0.33 (the exact value varies from run to run):

```
>>> s = np.random.uniform(-1,0,1000)
>>> gini(s)
0.3295183767105907
```

## Code
The code is largely self-explanatory. The Gini calculation requires values that are non-zero, non-negative, sorted, and held in a 1d vector. The input does not have to meet any of these four requirements, because ```gini()``` corrects for each of them itself:
```python
import numpy as np

def gini(array):
    """Calculate the Gini coefficient of a numpy array."""
    # based on bottom eq: https://www.statsdirect.com/help/content/image/stat0206_wmf.gif
    # from: https://www.statsdirect.com/help/default.htm#nonparametric_methods/gini.htm
    # all values are treated equally, arrays must be 1d;
    # flatten() copies, and the float cast lets integer input take the offset below
    array = np.asarray(array, dtype=float).flatten()
    if np.amin(array) < 0:
        array -= np.amin(array)  # values cannot be negative
    array += 0.0000001  # values cannot be 0
    array = np.sort(array)  # values must be sorted
    index = np.arange(1, array.shape[0] + 1)  # index per array element
    n = array.shape[0]  # number of array elements
    return np.sum((2 * index - n - 1) * array) / (n * np.sum(array))  # Gini coefficient
```
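
As a minimal sketch of what those corrections mean in practice, the snippet below restates the function so that it runs on its own (the cast to float is an assumption of this sketch), then passes it a 2d, unsorted array containing a negative value and a zero. The names ```messy``` and ```tidy``` and their values are made up for illustration.

```python
import numpy as np

def gini(array):
    """Gini coefficient, restated from the README so this sketch is self-contained."""
    # 1d float copy of the input (the float cast is an assumption of this sketch)
    array = np.asarray(array, dtype=float).flatten()
    if np.amin(array) < 0:
        array -= np.amin(array)  # shift so that no value is negative
    array += 0.0000001  # avoid exact zeros
    array = np.sort(array)
    n = array.shape[0]
    index = np.arange(1, n + 1)
    return np.sum((2 * index - n - 1) * array) / (n * np.sum(array))

# Hypothetical input: 2d, unsorted, with a negative value and a zero.
messy = np.array([[3.0, -1.0], [0.0, 2.0]])
# The same values after the shift by +1, already flat and sorted.
tidy = np.array([0.0, 1.0, 3.0, 4.0])

g_messy = gini(messy)
g_tidy = gini(tidy)
print(g_messy, g_tidy)  # the two results should agree
```

Note that shifting negative data changes the distribution being measured, so the result describes the shifted values rather than the original ones. The caller's array is left untouched because ```flatten()``` returns a copy.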

## Notes
It is faster than [pysal.inequality.gini](https://pysal.readthedocs.io/en/latest/_modules/pysal/inequality/gini.html), and the two agree to approximately 6 decimal places (i.e., the results are numerically the same for all practical purposes).
Other Gini coefficient functions found online do not produce equivalent results, which is why I wrote this one.
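
One way to sanity-check results without relying on another library is to compare against the pairwise definition of the Gini coefficient: the sum of absolute differences over all pairs, divided by twice the number of values times their sum. The sketch below does this; ```gini_pairwise``` and the sample data are hypothetical names and values introduced here, ```gini``` is restated (with a float cast, an assumption of this sketch) so the snippet is self-contained, and none of this is a comparison against pysal.

```python
import numpy as np

def gini(array):
    """Gini coefficient, restated from the README so this sketch is self-contained."""
    array = np.asarray(array, dtype=float).flatten()  # float cast is an assumption
    if np.amin(array) < 0:
        array -= np.amin(array)
    array += 0.0000001
    array = np.sort(array)
    n = array.shape[0]
    index = np.arange(1, n + 1)
    return np.sum((2 * index - n - 1) * array) / (n * np.sum(array))

def gini_pairwise(values):
    """O(n^2) reference: sum of |x_i - x_j| over all pairs, over 2 * n * sum(x)."""
    x = np.asarray(values, dtype=float).flatten()
    n = x.shape[0]
    diffs = np.abs(x[:, None] - x[None, :])  # n-by-n matrix of absolute differences
    return diffs.sum() / (2 * n * x.sum())

rng = np.random.default_rng(0)
sample = rng.uniform(1.0, 2.0, 200)  # strictly positive, so no shift is applied

difference = abs(gini(sample) - gini_pairwise(sample))
print(difference)
```

The difference is not expected to be exactly zero, because ```gini()``` adds 0.0000001 to every value before computing the coefficient, but it should be far smaller than the coefficient itself.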
