Update README.md

lkev · Jul 30, 2016 · 5af26bb
1 parent bcf53ba
commit 5af26bb
Showing 1 changed file with 19 additions and 4 deletions.
diff --git a/README.md b/README.md
@@ -1,5 +1,6 @@
 # gini
-Calculate the Gini coefficient of a numpy array.
+##Overview
+Calculate the Gini coefficient of a numpy array. Gini coefficients are often used to quantify income inequality; read more [here](https://www.statsdirect.com/help/default.htm#nonparametric_methods/gini.htm).
 
 The function in ```gini.py``` is based on the third equation from [here](https://www.statsdirect.com/help/default.htm#nonparametric_methods/gini.htm), which defines the Gini coefficient as:
 
@@ -8,28 +9,42 @@ The function in ```gini.py``` is based on the third equation from [here](https://
 
 ##Examples
 For a very unequal sample, 999 zeros and a single one:
-
 ```
 >>> from gini import *
 >>> a = np.zeros((1000))
 >>> a[0] = 1.0
 ```
 
 The Gini coefficient is very close to 1.0:
-
 ```
 >>> gini(a)
 0.99890010998900103
 ```
 
 For uniformly distributed random numbers, it will be low, around 0.33:
-
 ```
 >>> s = np.random.uniform(-1,0,1000)
 >>> gini(s)
 0.3295183767105907
 ```
 
+##Code
+The code is largely self-explanatory. The Gini calculation requires values that are non-negative, non-zero, sorted, and held in a 1d vector. ```gini()``` enforces all four requirements itself, so the input may violate any of them:
+```python
+def gini(array):
+    """Calculate the Gini coefficient of a numpy array."""
+    # based on bottom eq: https://www.statsdirect.com/help/content/image/stat0206_wmf.gif
+    # from: https://www.statsdirect.com/help/default.htm#nonparametric_methods/gini.htm
+    array = array.flatten().astype(float) #all values are treated equally, arrays must be 1d; float cast lets the in-place ops below accept integer input
+    if np.amin(array) < 0:
+        array -= np.amin(array) #values cannot be negative
+    array += 0.0000001 #values cannot be 0
+    array = np.sort(array) #values must be sorted
+    index = np.arange(1,array.shape[0]+1) #index per array element
+    n = array.shape[0]#number of array elements
+    return ((np.sum((2 * index - n  - 1) * array)) / (n * np.sum(array))) #Gini coefficient
+```
+
 ##Notes
 It is faster than [pysal.inequality.gini](https://pysal.readthedocs.io/en/latest/_modules/pysal/inequality/gini.html) and the results agree to approximately 6 decimal places (i.e., they are the same for all practical purposes).
 Other Gini coefficient functions found online do not produce equivalent results, which is why I wrote this.
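
The README's claims can be checked with a small standalone sketch. It restates the `gini()` body from the diff and compares it against the mean absolute difference definition of the Gini coefficient. The helper `gini_pairwise` is a hypothetical name introduced here for illustration, and the float cast is an assumption made so that integer input also works; neither is taken from the repository.

```python
import numpy as np


def gini(array):
    """Gini coefficient of a numpy array, restated from the diff."""
    # Assumption: cast to float so the in-place operations accept integer input.
    # flatten() returns a copy, so the caller's array is left untouched.
    array = np.asarray(array, dtype=float).flatten()
    if np.amin(array) < 0:
        array -= np.amin(array)  # shift so that no value is negative
    array += 0.0000001  # keep the denominator away from zero
    array = np.sort(array)  # the rank-based formula needs sorted values
    n = array.shape[0]
    index = np.arange(1, n + 1)  # 1-based rank of each sorted value
    return np.sum((2 * index - n - 1) * array) / (n * np.sum(array))


def gini_pairwise(array):
    """Hypothetical reference: mean absolute difference form, O(n^2) memory."""
    x = np.asarray(array, dtype=float).flatten()
    diffs = np.abs(x[:, None] - x[None, :])  # all pairwise |x_i - x_j|
    return diffs.sum() / (2 * x.size ** 2 * x.mean())


if __name__ == "__main__":
    sample = np.random.default_rng(0).uniform(0, 1, 500)
    print(gini(sample), gini_pairwise(sample))
```

The two functions should agree closely on strictly positive data. They differ slightly when the input contains zeros, because `gini()` adds a small constant to every value before computing the ratio. For the integer input `[1, 2, 3, 4]`, `gini()` returns approximately 0.25.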