Compute Gini Coefficient
Problem
Recently, I was given a math assignment to calculate Gini Indexes for a table of percent distributions of aggregate income.
The table takes the form of:
Year  1st Quintile  2nd Quintile  3rd Quintile  4th Quintile  5th Quintile
----  ------------  ------------  ------------  ------------  ------------
1929          0.03         12.47          13.8          19.3          54.4
1935           4.1           9.2          14.1          20.9          51.7
...            ...           ...           ...           ...           ...
Because of the length of the actual table, I wrote a short Python script to calculate the Gini Indexes. However, I'm fairly new to Python, so I'd like to see what needs improvement.
Method of Calculation:
Calculate the log of the percentages with the corresponding quintile boundary (0.2, 0.4, ...) as the base of the logarithm (i.e. log(0.0003)/log(0.2), log(0.1247)/log(0.4), ...), then average these values to find an approximate exponent for the Lorenz curve. Calculate the Gini Index by finding twice the area between y = x and the Lorenz curve from 0 to 1.

```python
import numpy as np
import scipy.integrate as integrate
import itertools as it

def log_slope(x, y):
    return np.log(y)/np.log(x)

# read in data from file (read_values is a helper not shown in the question)
years, data = read_values('Quintiles')  # shape: [[0.03, 0.12,...], [], []..., []]
accumulated_vals = [list(it.accumulate(v)) for v in data]

# percentiles; remove 0.0 and 1.0 after calculating
percentiles = np.linspace(0.0, 1.0, np.shape(accumulated_vals)[1] + 1)[1:-1]

for j, vals in enumerate(accumulated_vals):
    sum = 0
    for i, val in enumerate(vals[:-1]):  # exclude the last accumulated value, which should be 1.0
        sum += log_slope(percentiles[i], val)
    average = sum / (len(vals)-1)
    gini = 2 * integrate.quad(lambda x: x - pow(x, average), 0.0, 1.0)[0]
    print('{:d}: {} -> {:.5f}'.format(int(years[j]), [round(k, 4) for k in vals], gini))
```
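Written out as equations, the method the script implements looks like this. The notation is assumed here, not taken from the question: p_k = k/5 is the k-th quintile boundary, L_k is the cumulative income share up to it (the accumulated values in the code), and a is the averaged exponent. Because the fitted Lorenz curve is a pure power of x, the integral has a closed form, which is a handy cross-check on the `quad` result.

```latex
a = \frac{1}{4}\sum_{k=1}^{4} \frac{\ln L_k}{\ln p_k}, \qquad p_k = \frac{k}{5}, \qquad L(x) \approx x^{a}

G = 2\int_0^1 \left(x - x^{a}\right)\,dx = 2\left(\frac{1}{2} - \frac{1}{a+1}\right) = \frac{a-1}{a+1}
```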
Questions:
- The code produces correct values and is alr
Solution
Stick closely to the sources
It's helpful when coding math in cases like this to base your approach on established methods and language.
It might seem a bit extreme, but this can include:
- Following a published method to the letter.
- Linking to a readily available description of it.
- Naming your variables and laying out your code as closely as possible to that description.

e.g. a docstring along these lines:
```python
def gini_index(*args):  # replace *args with your actual parameters
    """
    Calculates the Gini Index G given data of the form:
        *whatever form your data is*
    Using summation as described in:
        *reference* (can be textbook, arXiv etc)
    Via the formula:
        *include formulae if possible*
    Essentially as described here:
        https://en.wikipedia.org/wiki/Gini_coefficient#Alternate_expressions
    Example input:
        *your example input*
    Example output:
        *your example output*
    """
    # your code proper starts here
```

An example of this approach is a collection of basic number theory and elliptic curve utilities that eventually became SageMath.
For example, if you were to base your code on the summation formula linked in the docstring, you would name your counter i and your total number of values n, and use the summation functions provided by NumPy. Trust me: if you think you'll revisit your code at a later date (when memory fails), or have it used or modified by someone else, this is a lifesaver.
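As a sketch of what that looks like in practice, here is the template filled in for the alternate expression for sorted values y_1 <= ... <= y_n, G = 2 * sum(i * y_i) / (n * sum(y_i)) - (n + 1) / n, with i and n named as in the formula. Taking a flat sequence of values is an assumption. Applied to the five quintile shares of one year, this treats each quintile as a single individual, which is equivalent to the trapezoidal area under a piecewise-linear Lorenz curve. That is a different estimate from the power-law fit in the question, so the numbers will not match exactly.

```python
import numpy as np

def gini_index(values):
    """
    Calculates the Gini Index G of a sequence of non-negative values.

    Via the formula, for values sorted so that y_1 <= y_2 <= ... <= y_n:
        G = 2 * sum(i * y_i) / (n * sum(y_i)) - (n + 1) / n
    Essentially as described here:
        https://en.wikipedia.org/wiki/Gini_coefficient#Alternate_expressions
    Example input:
        [0.0, 0.0, 0.0, 1.0]
    Example output:
        0.75
    """
    y = np.sort(np.asarray(values, dtype=float))  # the formula needs ascending order
    n = y.size
    i = np.arange(1, n + 1)                        # 1-based ranks, as in the formula
    return 2.0 * np.sum(i * y) / (n * np.sum(y)) - (n + 1.0) / n
```

For the 1929 row, `gini_index([0.03, 12.47, 13.8, 19.3, 54.4])` returns about 0.4623. The formula is scale-invariant, so percentages and fractions give the same result.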
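(Edit: As you are integrating a simple power of x, you don't need to import scipy. For integer exponents, the integ method of NumPy's poly1d will do, and for any exponent a (including the fractional ones the averaging produces) the integral has the closed form 2∫₀¹ (x − x^a) dx = (a − 1)/(a + 1).)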
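A minimal sketch of both routes from the edit, assuming the fitted exponent a is at least 1 so that x**a lies below y = x. The function names `gini_poly1d` and `gini_closed_form` are illustrative, not from the original. `gini_poly1d` builds the polynomial -x**a + x with `numpy.poly1d` and integrates it with its `integ` method, so it only accepts integer exponents. `gini_closed_form` uses the analytic result and also works for fractional exponents.

```python
import numpy as np

def gini_closed_form(a):
    """Gini index for the Lorenz curve L(x) = x**a, any real a >= 1."""
    # 2 * integral_0^1 (x - x**a) dx = 2 * (1/2 - 1/(a + 1)) = (a - 1) / (a + 1)
    return (a - 1.0) / (a + 1.0)

def gini_poly1d(a):
    """Same integral via numpy.poly1d; only valid for an integer a >= 1."""
    if int(a) != a or a < 1:
        raise ValueError("poly1d needs an integer exponent >= 1")
    a = int(a)
    coeffs = np.zeros(a + 1)   # coefficients, highest power first
    coeffs[0] -= 1.0           # -x**a
    coeffs[-2] += 1.0          # +x (cancels the -x term when a == 1)
    antiderivative = np.poly1d(coeffs).integ()
    return 2.0 * (antiderivative(1.0) - antiderivative(0.0))

print(gini_poly1d(2), gini_closed_form(2))  # both approximately 0.3333
print(gini_closed_form(2.5))                # fractional exponent: closed form only
```

In the question's loop, `gini = (average - 1) / (average + 1)` could then replace the `integrate.quad` call and the scipy import.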
Context
StackExchange Code Review Q#149828, answer score: 3