NOTE: GitHub's notebook rendering doesn't always work. If the ipynb file doesn't load, you can view it at https://nbviewer.jupyter.org/github/YIZHE12/SurvivalAnalysis/blob/master/EDA_survival_analysis.ipynb
You belong to the people analytics team for a food conglomerate. Employee turnover has been rampant for your 10 subsidiaries. The CFO estimates that the cost of replacing an employee is often larger than 100K USD, taking into account the time spent to interview and find a replacement, placement fees, sign-on bonuses and the loss of productivity for several months.
Your team has been tasked with diagnosing why and when employees from your subsidiaries leave. You need a tangible data-driven recommendation for each of the ten Presidents of your subsidiaries. What are your recommendations and why?
This is a survival analysis task that I solved using Kaplan-Meier plots and the Cox proportional-hazards model. Some data cleaning is needed because the dates come in several formats. There are also outliers in the data. For example, two data points have a seniority of 90 years, which is implausible because nobody is expected to have worked for 90 years. For more information, see the PDF file in the repo.
The notebook contains the full analysis. The data is in the uploaded text file, employee_retention.txt.
pandas, numpy, lifelines, matplotlib, seaborn
pip install numpy
pip install pandas
pip install matplotlib
pip install seaborn
pip install lifelines
import os
import pandas as pd
import datetime
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib
import numpy as np
matplotlib.rcParams.update({'font.size': 20})
data = pd.read_csv('employee_retention.txt', index_col = 'Unnamed: 0')
data.head(5)
index | employee_id | company_id | dept | seniority | salary | join_date | quit_date |
---|---|---|---|---|---|---|---|
0 | 1001444.0 | 8 | temp_contractor | 0 | 5850.0 | 2008-01-26 | 2008-04-25 |
1 | 388804.0 | 8 | design | 21 | 191000.0 | 05.17.2011 | 2012-03-16 |
2 | 407990.0 | 3 | design | 9 | 90000.0 | 2012-03-26 | 2015-04-10 |
3 | 120657.0 | 2 | engineer | 20 | 298000.0 | 2013-04-08 | 2015-01-30 |
4 | 1006393.0 | 1 | temp_contractor | 0 | 8509.0 | 2008-07-20 | 2008-10-18 |
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 34702 entries, 0 to 34701
Data columns (total 7 columns):
employee_id 34702 non-null float64
company_id 34702 non-null int64
dept 34702 non-null object
seniority 34702 non-null int64
salary 34463 non-null float64
join_date 34702 non-null object
quit_date 23510 non-null object
dtypes: float64(2), int64(2), object(3)
memory usage: 2.1+ MB
quit_date has fewer non-null values because those employees have not quit yet.
Some salary data is missing.
But for a first survival analysis we may be able to do without it.
Note that the dates come in different formats, e.g. 05.17.2011 and 2012-03-16.
data.join_date = pd.to_datetime(data.join_date)
data.quit_date = pd.to_datetime(data.quit_date)
# data.quit_date = data.quit_date.fillna(value=datetime.date.today())
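The mixed date formats noted above can also be handled explicitly rather than left to automatic inference. The following is a minimal sketch in plain Python, assuming the only two formats are ISO (`2012-03-16`) and month.day.year (`05.17.2011`); the helper name `parse_date` and the `KNOWN_FORMATS` tuple are hypothetical and not part of the notebook.

```python
from datetime import datetime

# Assumed formats, based on the two examples seen in the data:
# ISO "2012-03-16" and month.day.year "05.17.2011".
KNOWN_FORMATS = ("%Y-%m-%d", "%m.%d.%Y")


def parse_date(text):
    """Return a datetime for the first matching format, or None if none match."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


parsed = [parse_date(s) for s in ["05.17.2011", "2012-03-16", "not a date"]]
print(parsed)
```

Returning `None` for unparseable strings makes unexpected formats easy to count before deciding how to treat them.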
data.quit_date.max()
Timestamp('2015-12-09 00:00:00')
data.join_date.max()
Timestamp('2015-12-10 00:00:00')
data.employee_id = data.employee_id.astype('int32')
len(data.employee_id.unique()) # check for duplicate employee IDs
34702
len(data.company_id.unique())
12
data.dept.unique()
array(['temp_contractor', 'design', 'engineer', 'marketing',
'customer_service', 'data_science', 'sales'], dtype=object)
Does every company have all of these departments?
for i in range(len(data.company_id.unique())):
    print(i+1, data[data.company_id == i+1].dept.unique(), len(data[data.company_id == i+1].dept.unique()))
1 ['temp_contractor' 'customer_service' 'engineer' 'sales' 'data_science'
'marketing' 'design'] 7
2 ['engineer' 'data_science' 'design' 'temp_contractor' 'sales'
'customer_service' 'marketing'] 7
3 ['design' 'customer_service' 'data_science' 'sales' 'temp_contractor'
'marketing' 'engineer'] 7
4 ['temp_contractor' 'data_science' 'marketing' 'customer_service'
'engineer' 'design' 'sales'] 7
5 ['marketing' 'sales' 'temp_contractor' 'customer_service' 'data_science'
'design' 'engineer'] 7
6 ['marketing' 'temp_contractor' 'engineer' 'design' 'customer_service'
'data_science' 'sales'] 7
7 ['data_science' 'design' 'temp_contractor' 'customer_service' 'engineer'
'marketing' 'sales'] 7
8 ['temp_contractor' 'design' 'customer_service' 'engineer' 'sales'
'marketing' 'data_science'] 7
9 ['temp_contractor' 'customer_service' 'engineer' 'sales' 'data_science'
'marketing' 'design'] 7
10 ['data_science' 'temp_contractor' 'marketing' 'customer_service'
'engineer' 'sales' 'design'] 7
11 ['engineer' 'customer_service' 'marketing' 'data_science'] 4
12 ['data_science' 'engineer' 'customer_service' 'marketing' 'sales' 'design'] 6
How many examples does each company have, and for which departments?
count_company = data.groupby('company_id').count()
count_company
company_id | employee_id | dept | seniority | salary | join_date | quit_date |
---|---|---|---|---|---|---|
1 | 9501 | 9501 | 9501 | 9423 | 9501 | 5636 |
2 | 5220 | 5220 | 5220 | 5178 | 5220 | 3204 |
3 | 3773 | 3773 | 3773 | 3748 | 3773 | 2555 |
4 | 3066 | 3066 | 3066 | 3046 | 3066 | 2157 |
5 | 2749 | 2749 | 2749 | 2734 | 2749 | 1977 |
6 | 2258 | 2258 | 2258 | 2243 | 2258 | 1679 |
7 | 2185 | 2185 | 2185 | 2170 | 2185 | 1653 |
8 | 2026 | 2026 | 2026 | 2011 | 2026 | 1558 |
9 | 2005 | 2005 | 2005 | 1998 | 2005 | 1573 |
10 | 1879 | 1879 | 1879 | 1873 | 1879 | 1494 |
11 | 16 | 16 | 16 | 16 | 16 | 12 |
12 | 24 | 24 | 24 | 23 | 24 | 12 |
Companies 11 and 12 have too few data points (16 and 24 employees) to support reliable estimates.
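One way to act on this observation is to keep only groups with a minimum number of rows before fitting any model. The sketch below uses toy company ids and a hypothetical threshold; neither the function name `groups_with_enough_rows` nor the cutoff value comes from the notebook.

```python
from collections import Counter


def groups_with_enough_rows(group_ids, min_rows):
    """Return the sorted group ids that appear at least min_rows times."""
    counts = Counter(group_ids)
    return sorted(g for g, n in counts.items() if n >= min_rows)


# Toy data: company 11 has a single row and should be excluded.
company_ids = [1] * 5 + [2] * 4 + [11] * 1
kept = groups_with_enough_rows(company_ids, min_rows=3)  # threshold is an assumption
print(kept)
```

The right threshold depends on the model; it is a judgment call rather than a fixed rule.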
count_department = data.groupby(['company_id', 'dept']).count()
count_department
company_id | dept | employee_id | seniority | salary | join_date | quit_date |
---|---|---|---|---|---|---|
1 | customer_service | 3157 | 3157 | 3129 | 3157 | 1803 |
1 | data_science | 1079 | 1079 | 1070 | 1079 | 565 |
1 | design | 499 | 499 | 491 | 499 | 269 |
1 | engineer | 1568 | 1568 | 1552 | 1568 | 748 |
1 | marketing | 1085 | 1085 | 1075 | 1085 | 620 |
1 | sales | 1098 | 1098 | 1091 | 1098 | 616 |
1 | temp_contractor | 1015 | 1015 | 1015 | 1015 | 1015 |
2 | customer_service | 1548 | 1548 | 1530 | 1548 | 840 |
2 | data_science | 568 | 568 | 562 | 568 | 269 |
2 | design | 223 | 223 | 223 | 223 | 126 |
2 | engineer | 829 | 829 | 822 | 829 | 384 |
2 | marketing | 541 | 541 | 535 | 541 | 295 |
2 | sales | 513 | 513 | 508 | 513 | 292 |
2 | temp_contractor | 998 | 998 | 998 | 998 | 998 |
3 | customer_service | 1010 | 1010 | 1000 | 1010 | 545 |
3 | data_science | 347 | 347 | 345 | 347 | 194 |
3 | design | 141 | 141 | 141 | 141 | 81 |
3 | engineer | 516 | 516 | 512 | 516 | 292 |
3 | marketing | 372 | 372 | 367 | 372 | 214 |
3 | sales | 363 | 363 | 359 | 363 | 205 |
3 | temp_contractor | 1024 | 1024 | 1024 | 1024 | 1024 |
4 | customer_service | 777 | 777 | 769 | 777 | 415 |
4 | data_science | 279 | 279 | 277 | 279 | 161 |
4 | design | 107 | 107 | 106 | 107 | 61 |
4 | engineer | 376 | 376 | 375 | 376 | 208 |
4 | marketing | 269 | 269 | 263 | 269 | 157 |
4 | sales | 254 | 254 | 252 | 254 | 151 |
4 | temp_contractor | 1004 | 1004 | 1004 | 1004 | 1004 |
5 | customer_service | 635 | 635 | 631 | 635 | 355 |
5 | data_science | 216 | 216 | 213 | 216 | 114 |
... | ... | ... | ... | ... | ... | ... |
8 | data_science | 146 | 146 | 143 | 146 | 80 |
8 | design | 53 | 53 | 53 | 53 | 24 |
8 | engineer | 191 | 191 | 190 | 191 | 103 |
8 | marketing | 135 | 135 | 132 | 135 | 68 |
8 | sales | 137 | 137 | 136 | 137 | 85 |
8 | temp_contractor | 979 | 979 | 979 | 979 | 979 |
9 | customer_service | 342 | 342 | 341 | 342 | 186 |
9 | data_science | 134 | 134 | 133 | 134 | 71 |
9 | design | 60 | 60 | 58 | 60 | 41 |
9 | engineer | 188 | 188 | 185 | 188 | 106 |
9 | marketing | 124 | 124 | 124 | 124 | 62 |
9 | sales | 113 | 113 | 113 | 113 | 63 |
9 | temp_contractor | 1044 | 1044 | 1044 | 1044 | 1044 |
10 | customer_service | 336 | 336 | 333 | 336 | 190 |
10 | data_science | 109 | 109 | 108 | 109 | 52 |
10 | design | 41 | 41 | 41 | 41 | 23 |
10 | engineer | 172 | 172 | 171 | 172 | 94 |
10 | marketing | 96 | 96 | 96 | 96 | 56 |
10 | sales | 111 | 111 | 110 | 111 | 65 |
10 | temp_contractor | 1014 | 1014 | 1014 | 1014 | 1014 |
11 | customer_service | 6 | 6 | 6 | 6 | 3 |
11 | data_science | 2 | 2 | 2 | 2 | 2 |
11 | engineer | 6 | 6 | 6 | 6 | 5 |
11 | marketing | 2 | 2 | 2 | 2 | 2 |
12 | customer_service | 12 | 12 | 11 | 12 | 7 |
12 | data_science | 4 | 4 | 4 | 4 | 2 |
12 | design | 1 | 1 | 1 | 1 | 0 |
12 | engineer | 4 | 4 | 4 | 4 | 1 |
12 | marketing | 1 | 1 | 1 | 1 | 0 |
12 | sales | 2 | 2 | 2 | 2 | 2 |
80 rows × 5 columns
fig, ax = plt.subplots(figsize=(20,10))
count_department['employee_id'].unstack().plot(ax=ax, kind = 'bar')
plt.savefig('stat.png')
n_company = len(data.company_id.unique())
n_dept= len(data.dept.unique())
companies = data.company_id.unique()
plt.figure(figsize = (40, 30))
for i, company in enumerate(companies):
    plt.subplot(4, 3, i + 1)
    sns.distplot(data[data.company_id == company].salary.dropna(), norm_hist=False, kde=False, bins=20,
                 hist_kws={"alpha": 1}).set(xlabel='Salary', ylabel='Count')
    # Title each panel with the actual company id: unique() returns ids in order of appearance, not sorted
    plt.title('company ' + str(company) + ' salary')
plt.savefig('salary.png')
plt.figure(figsize = (40, 30))
for i, company in enumerate(companies):
    plt.subplot(4, 3, i + 1)
    sns.distplot(data[data.company_id == company].seniority.dropna(), norm_hist=False, kde=False, bins=20,
                 hist_kws={"alpha": 1}).set(xlabel='seniority', ylabel='Count')
    # Title each panel with the actual company id: unique() returns ids in order of appearance, not sorted
    plt.title('company ' + str(company) + ' seniority')
plt.savefig('seniority.png')
Create another copy of data to do numerical EDA
data_num = data.copy()
data_num.head(5)
index | employee_id | company_id | dept | seniority | salary | join_date | quit_date |
---|---|---|---|---|---|---|---|
0 | 1001444 | 8 | temp_contractor | 0 | 5850.0 | 2008-01-26 | 2008-04-25 |
1 | 388804 | 8 | design | 21 | 191000.0 | 2011-05-17 | 2012-03-16 |
2 | 407990 | 3 | design | 9 | 90000.0 | 2012-03-26 | 2015-04-10 |
3 | 120657 | 2 | engineer | 20 | 298000.0 | 2013-04-08 | 2015-01-30 |
4 | 1006393 | 1 | temp_contractor | 0 | 8509.0 | 2008-07-20 | 2008-10-18 |
# data_num.quit_date = data_num.quit_date.fillna(value=datetime.date.today())
data_num = data_num.dropna()
data_num.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 23379 entries, 0 to 34701
Data columns (total 7 columns):
employee_id 23379 non-null int32
company_id 23379 non-null int64
dept 23379 non-null object
seniority 23379 non-null int64
salary 23379 non-null float64
join_date 23379 non-null datetime64[ns]
quit_date 23379 non-null datetime64[ns]
dtypes: datetime64[ns](2), float64(1), int32(1), int64(2), object(1)
memory usage: 1.3+ MB
data_num['lasting_data'] = data_num.quit_date - data_num.join_date
data_num['lasting_data'] = data_num['lasting_data'].dt.days
data_num.head(5)
index | employee_id | company_id | dept | seniority | salary | join_date | quit_date | lasting_data |
---|---|---|---|---|---|---|---|---|
0 | 1001444 | 8 | temp_contractor | 0 | 5850.0 | 2008-01-26 | 2008-04-25 | 90 |
1 | 388804 | 8 | design | 21 | 191000.0 | 2011-05-17 | 2012-03-16 | 304 |
2 | 407990 | 3 | design | 9 | 90000.0 | 2012-03-26 | 2015-04-10 | 1110 |
3 | 120657 | 2 | engineer | 20 | 298000.0 | 2013-04-08 | 2015-01-30 | 662 |
4 | 1006393 | 1 | temp_contractor | 0 | 8509.0 | 2008-07-20 | 2008-10-18 | 90 |
data_num.company_id = pd.Categorical(data_num.company_id)
data_num.company_id = data_num.company_id.astype('object')
data_num.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 23379 entries, 0 to 34701
Data columns (total 8 columns):
employee_id 23379 non-null int32
company_id 23379 non-null object
dept 23379 non-null object
seniority 23379 non-null int64
salary 23379 non-null float64
join_date 23379 non-null datetime64[ns]
quit_date 23379 non-null datetime64[ns]
lasting_data 23379 non-null int64
dtypes: datetime64[ns](2), float64(1), int32(1), int64(2), object(2)
memory usage: 1.5+ MB
g = sns.pairplot(data_num[["company_id", "seniority", "salary", "lasting_data"]], \
hue = "company_id")
g = sns.pairplot(data_num[["dept", "seniority", "salary", "lasting_data"]], \
hue = "dept")
data['event'] = pd.isnull(data.quit_date).astype('int8')
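# event is 1 when quit_date is missing (employee still employed) and 0 when the employee has quit
# Note: lifelines reads event = 1 as "the event (quitting) was observed", so this flag is the inverse of that convention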
data.quit_date.max()
Timestamp('2015-12-09 00:00:00')
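The latest date in the file is 2015-12-10 (the maximum join_date), so employees with no quit date are assigned a cutoff just after it, 2015-12-13.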
data.quit_date = data.quit_date.fillna(value='2015-12-13')
data.quit_date = pd.to_datetime(data.quit_date)
data.head(50)
index | employee_id | company_id | dept | seniority | salary | join_date | quit_date | event |
---|---|---|---|---|---|---|---|---|
0 | 1001444 | 8 | temp_contractor | 0 | 5850.0 | 2008-01-26 | 2008-04-25 | 0 |
1 | 388804 | 8 | design | 21 | 191000.0 | 2011-05-17 | 2012-03-16 | 0 |
2 | 407990 | 3 | design | 9 | 90000.0 | 2012-03-26 | 2015-04-10 | 0 |
3 | 120657 | 2 | engineer | 20 | 298000.0 | 2013-04-08 | 2015-01-30 | 0 |
4 | 1006393 | 1 | temp_contractor | 0 | 8509.0 | 2008-07-20 | 2008-10-18 | 0 |
5 | 287530 | 5 | marketing | 20 | 180000.0 | 2014-06-30 | 2015-12-13 | 1 |
6 | 561043 | 3 | customer_service | 18 | 119000.0 | 2012-07-02 | 2014-03-28 | 0 |
7 | 702479 | 7 | data_science | 7 | 140000.0 | 2011-12-27 | 2013-08-30 | 0 |
8 | 545690 | 10 | data_science | 16 | 238000.0 | 2013-12-23 | 2015-12-13 | 1 |
9 | 622587 | 5 | sales | 28 | 166000.0 | 2015-07-01 | 2015-12-13 | 1 |
10 | 430126 | 2 | data_science | 3 | 77000.0 | 2015-08-03 | 2015-12-13 | 1 |
11 | 838072 | 3 | data_science | 13 | 162000.0 | 2011-10-03 | 2012-08-10 | 0 |
12 | 205557 | 8 | customer_service | 17 | 109000.0 | 2013-07-22 | 2014-07-18 | 0 |
13 | 554514 | 1 | customer_service | 4 | 33000.0 | 2013-04-15 | 2015-04-24 | 0 |
14 | 14751 | 7 | design | 18 | 162000.0 | 2012-04-30 | 2014-02-14 | 0 |
15 | 602443 | 3 | sales | 16 | 150000.0 | 2011-09-12 | 2013-07-19 | 0 |
16 | 488083 | 1 | engineer | 8 | NaN | 2011-06-13 | 2013-06-07 | 0 |
17 | 1007464 | 7 | temp_contractor | 0 | 7748.0 | 2009-11-14 | 2010-02-12 | 0 |
18 | 1002775 | 3 | temp_contractor | 0 | 7424.0 | 2008-01-14 | 2008-04-13 | 0 |
19 | 581423 | 6 | marketing | 1 | 35000.0 | 2012-01-09 | 2015-06-12 | 0 |
20 | 1000103 | 5 | temp_contractor | 0 | 9684.0 | 2008-05-18 | 2008-08-16 | 0 |
21 | 34604 | 2 | design | 29 | 224000.0 | 2015-09-08 | 2015-12-13 | 1 |
22 | 1008116 | 4 | temp_contractor | 0 | 9865.0 | 2010-10-03 | 2011-01-01 | 0 |
23 | 182278 | 1 | sales | 19 | 179000.0 | 2011-09-19 | 2012-11-02 | 0 |
24 | 1003092 | 2 | temp_contractor | 0 | 5459.0 | 2009-09-23 | 2009-12-22 | 0 |
25 | 296069 | 2 | engineer | 16 | 308000.0 | 2012-01-03 | 2015-12-13 | 1 |
26 | 1007778 | 7 | temp_contractor | 0 | 6749.0 | 2007-02-14 | 2007-05-15 | 0 |
27 | 612255 | 7 | customer_service | 6 | 66000.0 | 2015-03-23 | 2015-12-13 | 1 |
28 | 28269 | 2 | sales | 9 | 153000.0 | 2011-08-29 | 2012-08-03 | 0 |
29 | 904543 | 2 | data_science | 17 | 314000.0 | 2013-11-25 | 2015-12-13 | 1 |
30 | 289336 | 3 | design | 6 | 111000.0 | 2012-12-24 | 2015-06-26 | 0 |
31 | 591606 | 1 | customer_service | 22 | 123000.0 | 2015-08-24 | 2015-12-13 | 1 |
32 | 505031 | 8 | engineer | 15 | 229000.0 | 2013-09-09 | 2015-12-13 | 1 |
33 | 1006601 | 8 | temp_contractor | 0 | 9051.0 | 2008-03-11 | 2008-06-09 | 0 |
34 | 855236 | 2 | engineer | 19 | 309000.0 | 2012-01-17 | 2015-12-13 | 1 |
35 | 543068 | 5 | sales | 1 | 42000.0 | 2012-08-14 | 2013-08-02 | 0 |
36 | 282308 | 4 | data_science | 14 | 130000.0 | 2014-04-11 | 2015-12-13 | 1 |
37 | 1005290 | 2 | temp_contractor | 0 | 9723.0 | 2008-09-19 | 2008-12-18 | 0 |
38 | 643275 | 3 | customer_service | 3 | 21000.0 | 2011-06-13 | 2012-06-08 | 0 |
39 | 115980 | 3 | marketing | 6 | 100000.0 | 2012-04-09 | 2013-04-26 | 0 |
40 | 1006290 | 4 | temp_contractor | 0 | 9512.0 | 2008-10-13 | 2009-01-11 | 0 |
41 | 259298 | 1 | engineer | 9 | NaN | 2011-11-07 | 2015-10-16 | 0 |
42 | 1007928 | 10 | temp_contractor | 0 | 6538.0 | 2009-12-28 | 2010-03-28 | 0 |
43 | 13088 | 2 | customer_service | 4 | 34000.0 | 2015-09-21 | 2015-12-13 | 1 |
44 | 1004117 | 10 | temp_contractor | 0 | 8052.0 | 2007-05-17 | 2007-08-15 | 0 |
45 | 1002404 | 10 | temp_contractor | 0 | 7998.0 | 2009-09-13 | 2009-12-12 | 0 |
46 | 975096 | 1 | customer_service | 16 | 125000.0 | 2015-04-27 | 2015-12-13 | 1 |
47 | 432323 | 7 | engineer | 20 | 236000.0 | 2013-12-02 | 2015-12-13 | 1 |
48 | 921758 | 2 | engineer | 9 | 216000.0 | 2014-04-08 | 2015-01-02 | 0 |
49 | 301501 | 7 | customer_service | 29 | 93000.0 | 2014-05-27 | 2015-04-10 | 0 |
data['lasting_days'] = data.quit_date - data.join_date
data['lasting_days'] = data['lasting_days'].dt.days
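For reference, here is a small sketch of how right-censored durations are commonly encoded for survival libraries such as lifelines, where an event flag of 1 means the quit was observed and 0 means the employee was still employed at the cutoff. The function name `to_survival_record` is hypothetical; the cutoff date mirrors the 2015-12-13 value used in this notebook.

```python
from datetime import date

CENSOR_DATE = date(2015, 12, 13)  # cutoff date used in this notebook


def to_survival_record(join_date, quit_date, censor_date=CENSOR_DATE):
    """Return (duration_days, event_observed) for one employee.

    event_observed is 1 when a quit date exists and 0 when the record is
    right-censored at censor_date (the employee had not quit by then).
    """
    observed = quit_date is not None
    end = quit_date if observed else censor_date
    return (end - join_date).days, int(observed)


records = [
    to_survival_record(date(2011, 5, 17), date(2012, 3, 16)),  # quit observed
    to_survival_record(date(2014, 6, 30), None),               # still employed
]
print(records)
```

The first record reproduces the 304-day duration shown for employee 388804 in the tables above.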
from lifelines import KaplanMeierFitter
kmf = KaplanMeierFitter()
matplotlib.rcParams.update({'font.size': 10})
fig, axes = plt.subplots(1, 1, figsize=(9, 5))
## Fit the data into the model
kmf.fit(data['lasting_days'] , data['event'], label='Kaplan Meier Estimate')
## Create an estimate
kmf.plot(ci_show=False, ax=axes)
plt.legend(loc='lower left')
kmf.median_
## ci_show controls whether the confidence interval is drawn; it is turned off here.
909.0
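To make the estimate less of a black box, the following is a hand-rolled sketch of the Kaplan-Meier product-limit estimator in plain Python. It is an illustration on toy data, not the lifelines implementation, and the function names are hypothetical. Here an event flag of 1 means the event was observed and 0 means the record was censored.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    events[i] is 1 if the event was observed at durations[i], 0 if censored.
    """
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        observed = 0
        leaving = 0
        # Collect everyone whose duration ends at time t
        while i < len(pairs) and pairs[i][0] == t:
            observed += pairs[i][1]
            leaving += 1
            i += 1
        if observed:
            # Product-limit step: multiply by the fraction surviving past t
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which survival drops to 0.5 or below, else infinity."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return float("inf")


toy_curve = kaplan_meier([90, 304, 531, 662, 1110], [1, 1, 0, 1, 1])
print(toy_curve, median_survival(toy_curve))
```

When no events are observed at all, the curve never drops and the median is reported as infinite.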
department_type = data['dept'].unique()
matplotlib.rcParams.update({'font.size': 25})
fig, axes = plt.subplots(1, 1, figsize=(20, 20))
for i, de in enumerate(department_type):
    i1 = (data.dept == de) ## boolean mask selecting the employees of this department
    ## fit the model for this department
    kmf.fit(data['lasting_days'][i1], data['event'][i1], label=de)
    print(de, ':', kmf.median_)
    kmf.plot(ci_show=False, ax=axes)
plt.yscale('log')
plt.savefig('dept_survival.png')
temp_contractor : inf
design : 909.0
engineer : 888.0
marketing : 909.0
customer_service : 888.0
data_science : 902.0
sales : 895.0
fig, ax = plt.subplots(1, 1, figsize=(20, 20))
kmf = KaplanMeierFitter()
for name, grouped_df in data.groupby(['company_id']):
    kmf.fit(grouped_df["lasting_days"], grouped_df["event"], label=name)
    print(name, ':', kmf.median_)
    kmf.plot(ax=ax)
plt.savefig('company_survival.png')
1 : 895.0
2 : 902.0
3 : 916.0
4 : 902.0
5 : 923.0
6 : 923.0
7 : 923.0
8 : 929.0
9 : 916.0
10 : 937.0
11 : 1217.0
12 : 726.0
fig, ax = plt.subplots(1, 1, figsize=(20, 20))
kmf = KaplanMeierFitter()
for name, grouped_df in data.groupby(['company_id', 'dept']):
    kmf.fit(grouped_df["lasting_days"], grouped_df["event"], label=name)
    print(name, ':', kmf.median_)
    kmf.plot(ci_show=False, ax=ax)
(1, 'customer_service') : 916.0
(1, 'data_science') : 902.0
(1, 'design') : 881.0
(1, 'engineer') : 846.0
(1, 'marketing') : 929.0
(1, 'sales') : 853.0
(1, 'temp_contractor') : inf
(2, 'customer_service') : 891.0
(2, 'data_science') : 895.0
(2, 'design') : 923.0
(2, 'engineer') : 864.0
(2, 'marketing') : 937.0
(2, 'sales') : 888.0
(2, 'temp_contractor') : inf
(3, 'customer_service') : 888.0
(3, 'data_science') : 825.0
(3, 'design') : 1000.0
(3, 'engineer') : 937.0
(3, 'marketing') : 951.0
(3, 'sales') : 865.0
(3, 'temp_contractor') : inf
(4, 'customer_service') : 825.0
(4, 'data_science') : 1007.0
(4, 'design') : 811.0
(4, 'engineer') : 937.0
(4, 'marketing') : 1014.0
(4, 'sales') : 867.0
(4, 'temp_contractor') : inf
(5, 'customer_service') : 853.0
(5, 'data_science') : 923.0
(5, 'design') : 1027.0
(5, 'engineer') : 1021.0
(5, 'marketing') : 727.0
(5, 'sales') : 929.0
(5, 'temp_contractor') : inf
(6, 'customer_service') : 795.0
(6, 'data_science') : 990.0
(6, 'design') : 1055.0
(6, 'engineer') : 881.0
(6, 'marketing') : 1280.0
(6, 'sales') : 909.0
(6, 'temp_contractor') : inf
(7, 'customer_service') : 923.0
(7, 'data_science') : 888.0
(7, 'design') : 816.0
(7, 'engineer') : 909.0
(7, 'marketing') : 923.0
(7, 'sales') : 1021.0
(7, 'temp_contractor') : inf
(8, 'customer_service') : 902.0
(8, 'data_science') : 923.0
(8, 'design') : 691.0
(8, 'engineer') : 950.0
(8, 'marketing') : 846.0
(8, 'sales') : 965.0
(8, 'temp_contractor') : inf
(9, 'customer_service') : 923.0
(9, 'data_science') : 825.0
(9, 'design') : 909.0
(9, 'engineer') : 993.0
(9, 'marketing') : 846.0
(9, 'sales') : 759.0
(9, 'temp_contractor') : inf
(10, 'customer_service') : 902.0
(10, 'data_science') : 797.0
(10, 'design') : 956.0
(10, 'engineer') : 991.0
(10, 'marketing') : 1007.0
(10, 'sales') : 951.0
(10, 'temp_contractor') : inf
(11, 'customer_service') : 587.0
(11, 'data_science') : inf
(11, 'engineer') : 1217.0
(11, 'marketing') : inf
(12, 'customer_service') : 1014.0
(12, 'data_science') : 881.0
(12, 'design') : 699.0
(12, 'engineer') : 699.0
(12, 'marketing') : 726.0
(12, 'sales') : inf
from lifelines import CoxPHFitter
data.company_id = data.company_id.astype('category')
# data.seniority = data.seniority.astype('category')
# Drop implausible seniority values (e.g. the two records with 90 years of seniority)
data = data[data.seniority <= 50]
data.head(5)
index | employee_id | company_id | dept | seniority | salary | join_date | quit_date | event | lasting_days |
---|---|---|---|---|---|---|---|---|---|
0 | 1001444 | 8 | temp_contractor | 0 | 5850.0 | 2008-01-26 | 2008-04-25 | 0 | 90 |
1 | 388804 | 8 | design | 21 | 191000.0 | 2011-05-17 | 2012-03-16 | 0 | 304 |
2 | 407990 | 3 | design | 9 | 90000.0 | 2012-03-26 | 2015-04-10 | 0 | 1110 |
3 | 120657 | 2 | engineer | 20 | 298000.0 | 2013-04-08 | 2015-01-30 | 0 | 662 |
4 | 1006393 | 1 | temp_contractor | 0 | 8509.0 | 2008-07-20 | 2008-10-18 | 0 | 90 |
from sklearn.preprocessing import MinMaxScaler
## Select the model columns (the dummy-variable encoding below is left disabled)
# df_dummy = pd.get_dummies(data[['event','lasting_days','seniority','salary','company_id']], drop_first=True)
df_dummy = data[['event','lasting_days','seniority','salary','company_id','dept']]
df_dummy = df_dummy.dropna() # drops the rows with a missing salary
## Min-max scale salary to the range [0, 1]
max_s = np.max(df_dummy.salary.values)
min_s = np.min(df_dummy.salary.values)
df_dummy.salary = (df_dummy.salary.values - min_s)/(max_s-min_s)
## Min-max scale seniority to the range [0, 1]
max_se = np.max(df_dummy.seniority.values)
min_se = np.min(df_dummy.seniority.values)
df_dummy.seniority = (df_dummy.seniority.values - min_se)/(max_se-min_se)
df_dummy.head()
index | event | lasting_days | seniority | salary | company_id | dept |
---|---|---|---|---|---|---|
0 | 0 | 90 | 0.000000 | 0.002109 | 8 | temp_contractor |
1 | 0 | 304 | 0.724138 | 0.461538 | 8 | design |
2 | 0 | 1110 | 0.310345 | 0.210918 | 3 | design |
3 | 0 | 662 | 0.689655 | 0.727047 | 2 | engineer |
4 | 0 | 90 | 0.000000 | 0.008707 | 1 | temp_contractor |
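The scaling applied to salary and seniority is plain min-max normalization, x' = (x - min) / (max - min). The sketch below shows it on toy seniority values chosen so that 21 maps to about 0.724, as in the table above; the function name `min_max_scale` is hypothetical, and the guard for a constant column is an added assumption.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column is mapped to all zeros to avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale([0, 9, 21, 29])
print([round(s, 6) for s in scaled])
```

With covariates scaled this way, a Cox coefficient describes the change in log hazard between the smallest and the largest observed value.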
# Using Cox Proportional Hazards model
for i in range(12):
    data_test = df_dummy[df_dummy.company_id == i+1]
    data_test = data_test.drop(columns = ['company_id','dept'])
    cph = CoxPHFitter() ## Instantiate the class to create a cph object
    cph.fit(data_test, 'lasting_days', event_col='event') ## Fit the data to train the model
    cph.print_summary() ## Have a look at the significance of the features
<lifelines.CoxPHFitter: fitted with 9422 observations, 5596 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 9422
number of events = 3826
partial log-likelihood = -31068.96
time fit was run = 2019-07-19 12:58:06 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.02 0.98 0.07 -0.16 0.13 0.86 1.13
salary 0.12 1.13 0.09 -0.06 0.31 0.94 1.36
z p -log2(p)
seniority -0.21 0.83 0.27
salary 1.31 0.19 2.39
---
Concordance = 0.51
Log-likelihood ratio test = 2.27 on 2 df, -log2(p)=1.64
<lifelines.CoxPHFitter: fitted with 5178 observations, 3179 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 5178
number of events = 1999
partial log-likelihood = -14881.37
time fit was run = 2019-07-19 12:58:07 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.19 1.21 0.10 -0.01 0.39 0.99 1.47
salary 0.04 1.04 0.13 -0.21 0.29 0.81 1.34
z p -log2(p)
seniority 1.89 0.06 4.08
salary 0.32 0.75 0.42
---
Concordance = 0.53
Log-likelihood ratio test = 7.12 on 2 df, -log2(p)=5.13
<lifelines.CoxPHFitter: fitted with 3748 observations, 2543 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 3748
number of events = 1205
partial log-likelihood = -8498.81
time fit was run = 2019-07-19 12:58:07 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.19 1.21 0.13 -0.07 0.45 0.93 1.57
salary -0.07 0.93 0.22 -0.49 0.36 0.61 1.43
z p -log2(p)
seniority 1.44 0.15 2.73
salary -0.32 0.75 0.41
---
Concordance = 0.54
Log-likelihood ratio test = 2.64 on 2 df, -log2(p)=1.91
<lifelines.CoxPHFitter: fitted with 3046 observations, 2146 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 3046
number of events = 900
partial log-likelihood = -6067.60
time fit was run = 2019-07-19 12:58:07 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.51 1.67 0.15 0.22 0.81 1.25 2.25
salary -0.29 0.75 0.25 -0.77 0.19 0.46 1.21
z p -log2(p)
seniority 3.43 <0.005 10.67
salary -1.18 0.24 2.07
---
Concordance = 0.54
Log-likelihood ratio test = 12.81 on 2 df, -log2(p)=9.24
<lifelines.CoxPHFitter: fitted with 2734 observations, 1970 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 2734
number of events = 764
partial log-likelihood = -5033.96
time fit was run = 2019-07-19 12:58:07 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.14 1.15 0.17 -0.18 0.47 0.83 1.60
salary -0.26 0.77 0.28 -0.81 0.28 0.45 1.33
z p -log2(p)
seniority 0.86 0.39 1.35
salary -0.94 0.35 1.53
---
Concordance = 0.51
Log-likelihood ratio test = 0.99 on 2 df, -log2(p)=0.71
<lifelines.CoxPHFitter: fitted with 2243 observations, 1670 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 2243
number of events = 573
partial log-likelihood = -3607.36
time fit was run = 2019-07-19 12:58:08 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.26 1.30 0.18 -0.09 0.62 0.91 1.85
salary -0.04 0.96 0.30 -0.63 0.55 0.53 1.74
z p -log2(p)
seniority 1.46 0.14 2.80
salary -0.12 0.90 0.15
---
Concordance = 0.58
Log-likelihood ratio test = 3.08 on 2 df, -log2(p)=2.22
<lifelines.CoxPHFitter: fitted with 2170 observations, 1644 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 2170
number of events = 526
partial log-likelihood = -3264.59
time fit was run = 2019-07-19 12:58:08 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.32 1.37 0.20 -0.07 0.70 0.93 2.02
salary 0.18 1.19 0.32 -0.44 0.80 0.64 2.22
z p -log2(p)
seniority 1.60 0.11 3.20
salary 0.56 0.58 0.79
---
Concordance = 0.58
Log-likelihood ratio test = 6.29 on 2 df, -log2(p)=4.54
<lifelines.CoxPHFitter: fitted with 2011 observations, 1547 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 2011
number of events = 464
partial log-likelihood = -2813.20
time fit was run = 2019-07-19 12:58:08 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.32 1.38 0.21 -0.09 0.73 0.92 2.07
salary 0.33 1.39 0.34 -0.34 1.00 0.71 2.71
z p -log2(p)
seniority 1.54 0.12 3.03
salary 0.96 0.34 1.57
---
Concordance = 0.58
Log-likelihood ratio test = 8.36 on 2 df, -log2(p)=6.03
<lifelines.CoxPHFitter: fitted with 1998 observations, 1568 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 1998
number of events = 430
partial log-likelihood = -2617.25
time fit was run = 2019-07-19 12:58:08 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.14 1.15 0.22 -0.29 0.57 0.75 1.77
salary 0.47 1.61 0.35 -0.22 1.17 0.80 3.21
z p -log2(p)
seniority 0.63 0.53 0.92
salary 1.34 0.18 2.48
---
Concordance = 0.60
Log-likelihood ratio test = 5.85 on 2 df, -log2(p)=4.22
<lifelines.CoxPHFitter: fitted with 1872 observations, 1490 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 1872
number of events = 382
partial log-likelihood = -2269.34
time fit was run = 2019-07-19 12:58:08 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.31 1.36 0.24 -0.17 0.78 0.85 2.18
salary -0.01 0.99 0.39 -0.77 0.75 0.46 2.12
z p -log2(p)
seniority 1.28 0.20 2.31
salary -0.02 0.98 0.02
---
Concordance = 0.59
Log-likelihood ratio test = 2.74 on 2 df, -log2(p)=1.97
<lifelines.CoxPHFitter: fitted with 16 observations, 12 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 16
number of events = 4
partial log-likelihood = -0.00
time fit was run = 2019-07-19 12:58:09 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 86.63 4.20e+37 659.33 -1205.64 1378.90 0.00 inf
salary -257.87 0.00 1823.05 -3830.98 3315.24 0.00 inf
z p -log2(p)
seniority 0.13 0.90 0.16
salary -0.14 0.89 0.17
---
Concordance = 1.00
Log-likelihood ratio test = 10.48 on 2 df, -log2(p)=7.56
<lifelines.CoxPHFitter: fitted with 23 observations, 12 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 23
number of events = 11
partial log-likelihood = -20.07
time fit was run = 2019-07-19 12:58:09 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -2.03 0.13 3.29 -8.48 4.42 0.00 82.88
salary 3.75 42.71 6.65 -9.28 16.79 0.00 1.96e+07
z p -log2(p)
seniority -0.62 0.54 0.90
salary 0.56 0.57 0.80
---
Concordance = 0.55
Log-likelihood ratio test = 0.39 on 2 df, -log2(p)=0.28
/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.7/site-packages/lifelines/fitters/coxph_fitter.py:561: ConvergenceWarning: Newton-Rhapson failed to converge sufficiently in 50 steps.
warnings.warn("Newton-Rhapson failed to converge sufficiently in %d steps." % max_steps, ConvergenceWarning)
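The columns in these summaries are related by standard Wald calculations: the hazard ratio is exp(coef), the 95% interval is exp(coef ± 1.96·se), z is coef/se, and p is the two-sided normal tail probability. The sketch below applies them to the rounded seniority values printed in the summary fitted with 3046 observations (coef 0.51, se 0.15). The function name `wald_summary` is hypothetical, and small differences from the printed table are expected because the inputs are rounded.

```python
import math


def wald_summary(coef, se, z_crit=1.96):
    """Summarize one Cox coefficient using the usual Wald approximations."""
    z = coef / se
    return {
        "hazard_ratio": math.exp(coef),
        "ci_lower": math.exp(coef - z_crit * se),
        "ci_upper": math.exp(coef + z_crit * se),
        "z": z,
        # Two-sided p-value under a standard normal reference distribution
        "p": math.erfc(abs(z) / math.sqrt(2)),
    }


# Rounded values taken from the printed summary (seniority, 3046 observations)
summary = wald_summary(coef=0.51, se=0.15)
for key, value in summary.items():
    print(f"{key}: {value:.4f}")
```

An interval for exp(coef) that contains 1, as in most of the summaries above, means the data do not rule out a hazard ratio of 1 for that covariate.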
depts = data.dept.unique()
depts
array(['temp_contractor', 'design', 'engineer', 'marketing',
'customer_service', 'data_science', 'sales'], dtype=object)
data_test = df_dummy[df_dummy.dept == 'temp_contractor']
data_test = data_test.drop(columns = ['company_id','dept'])
data_test.head()
index | event | lasting_days | seniority | salary |
---|---|---|---|---|
0 | 0 | 90 | 0.0 | 0.002109 |
4 | 0 | 90 | 0.0 | 0.008707 |
17 | 0 | 90 | 0.0 | 0.006819 |
18 | 0 | 90 | 0.0 | 0.006015 |
20 | 0 | 90 | 0.0 | 0.011623 |
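The five rows shown all have `event` 0, `lasting_days` 90 and `seniority` 0.0. If the whole group looks like this, a Cox model has nothing to estimate: no events are observed and at least one covariate never varies. The following is a minimal sketch of a check for that situation, using plain Python records in place of the DataFrame. The helper name `degenerate_reasons` is hypothetical, and the input is only the five printed rows, not the full department.

```python
def degenerate_reasons(rows, event_col="event", covariates=("seniority", "salary")):
    """List reasons a group cannot support a Cox fit (empty list = looks fine)."""
    reasons = []
    # A Cox partial likelihood needs at least one observed event.
    if not any(row[event_col] for row in rows):
        reasons.append("no observed events")
    # A covariate with a single distinct value carries no information.
    for col in covariates:
        if len({row[col] for row in rows}) < 2:
            reasons.append(f"{col} is constant")
    return reasons


# The five rows printed by data_test.head() in the notebook.
head_rows = [
    {"event": 0, "lasting_days": 90, "seniority": 0.0, "salary": 0.002109},
    {"event": 0, "lasting_days": 90, "seniority": 0.0, "salary": 0.008707},
    {"event": 0, "lasting_days": 90, "seniority": 0.0, "salary": 0.006819},
    {"event": 0, "lasting_days": 90, "seniority": 0.0, "salary": 0.006015},
    {"event": 0, "lasting_days": 90, "seniority": 0.0, "salary": 0.011623},
]
reasons = degenerate_reasons(head_rows)
print(reasons)
```

A non-empty result is a signal to leave the group out of the per-department loop rather than let the optimizer fail.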
# Fit a Cox proportional hazards model for each department
for i, de in enumerate(depts):
    if i > 1:  # skip the first two entries of depts (temp_contractor and design)
        data_test = df_dummy[df_dummy.dept == de]
        data_test = data_test.drop(columns=['company_id', 'dept'])
        cph = CoxPHFitter()  # instantiate the fitter
        cph.fit(data_test, 'lasting_days', event_col='event')  # fit the model to this department
        print(de)
        print('*' * 20)
        cph.print_summary()  # inspect the significance of the features
engineer
********************
<lifelines.CoxPHFitter: fitted with 4568 observations, 2338 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 4568
number of events = 2230
partial log-likelihood = -16747.83
time fit was run = 2019-07-19 01:23:11 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.10 0.91 0.13 -0.35 0.15 0.71 1.16
salary 0.03 1.03 0.18 -0.32 0.38 0.72 1.46
z p -log2(p)
seniority -0.77 0.44 1.17
salary 0.16 0.87 0.19
---
Concordance = 0.51
Log-likelihood ratio test = 1.15 on 2 df, -log2(p)=0.83
marketing
********************
<lifelines.CoxPHFitter: fitted with 3132 observations, 1765 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 3132
number of events = 1367
partial log-likelihood = -9762.54
time fit was run = 2019-07-19 01:23:11 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.03 1.03 0.17 -0.30 0.37 0.74 1.44
salary -0.17 0.84 0.36 -0.87 0.53 0.42 1.70
z p -log2(p)
seniority 0.18 0.85 0.23
salary -0.48 0.63 0.67
---
Concordance = 0.51
Log-likelihood ratio test = 0.38 on 2 df, -log2(p)=0.27
customer_service
********************
<lifelines.CoxPHFitter: fitted with 9089 observations, 5043 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 9089
number of events = 4046
partial log-likelihood = -33184.68
time fit was run = 2019-07-19 01:23:12 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.20 1.23 0.10 0.01 0.40 1.01 1.49
salary -0.77 0.46 0.35 -1.46 -0.09 0.23 0.92
z p -log2(p)
seniority 2.07 0.04 4.71
salary -2.21 0.03 5.22
---
Concordance = 0.51
Log-likelihood ratio test = 5.11 on 2 df, -log2(p)=3.68
data_science
********************
<lifelines.CoxPHFitter: fitted with 3157 observations, 1663 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 3157
number of events = 1494
partial log-likelihood = -10679.77
time fit was run = 2019-07-19 01:23:12 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.03 1.03 0.16 -0.28 0.33 0.75 1.40
salary 0.06 1.06 0.22 -0.37 0.49 0.69 1.63
z p -log2(p)
seniority 0.17 0.87 0.21
salary 0.26 0.79 0.34
---
Concordance = 0.51
Log-likelihood ratio test = 0.47 on 2 df, -log2(p)=0.34
sales
********************
<lifelines.CoxPHFitter: fitted with 3148 observations, 1798 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 3148
number of events = 1350
partial log-likelihood = -9595.81
time fit was run = 2019-07-19 01:23:13 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.16 0.85 0.17 -0.49 0.17 0.61 1.19
salary 0.54 1.71 0.35 -0.16 1.23 0.85 3.43
z p -log2(p)
seniority -0.95 0.34 1.54
salary 1.51 0.13 2.94
---
Concordance = 0.52
Log-likelihood ratio test = 2.55 on 2 df, -log2(p)=1.84
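Every row in these summaries can be approximately reproduced from `coef` and `se(coef)` alone, which helps when reading them. Here is a minimal sketch using the standard Wald formulas on the customer_service salary row (coef -0.77, se 0.35). Because those inputs are rounded to two decimals, the results match the printed values only approximately. The function name `wald_summary` is hypothetical.

```python
import math


def wald_summary(coef, se, z_crit=1.96):
    """Recompute hazard ratio, 95% CI, z, p and -log2(p) from coef and se."""
    z = coef / se
    # Two-sided p-value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return {
        "hazard_ratio": math.exp(coef),
        "hr_lower": math.exp(coef - z_crit * se),
        "hr_upper": math.exp(coef + z_crit * se),
        "z": z,
        "p": p,
        "neg_log2_p": -math.log2(p),
    }


# Rounded values taken from the customer_service salary row.
row = wald_summary(coef=-0.77, se=0.35)
for key, value in row.items():
    print(f"{key}: {value:.2f}")
```

A hazard ratio below 1 means a lower quit hazard as the covariate increases. Salary appears to have been rescaled earlier in the notebook (values such as 0.002109 in the `data_test` table), so the ratio applies per unit of that rescaled variable, not per dollar.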
# Fit a Cox proportional hazards model for each company and department
for i in range(10):
    df_dummy2 = df_dummy[df_dummy.company_id == i + 1]
    df_dummy2 = df_dummy2.drop(columns=['company_id'])
    for j, de in enumerate(depts):
        if j > 1:  # skip the first two entries of depts (temp_contractor and design)
            data_test = df_dummy2[df_dummy2.dept == de]
            data_test = data_test.drop(columns=['dept'])
            cph = CoxPHFitter()  # instantiate the fitter
            cph.fit(data_test, 'lasting_days', event_col='event')  # fit the model to this subset
            print("company", i + 1, de)
            print('*' * 20)
            cph.print_summary()  # inspect the significance of the features
company 1 engineer
********************
<lifelines.CoxPHFitter: fitted with 1552 observations, 738 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 1552
number of events = 814
partial log-likelihood = -5205.78
time fit was run = 2019-07-19 01:24:34 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.29 1.34 0.24 -0.18 0.76 0.84 2.14
salary -0.53 0.59 0.32 -1.15 0.10 0.32 1.10
z p -log2(p)
seniority 1.22 0.22 2.18
salary -1.66 0.10 3.35
---
Concordance = 0.53
Log-likelihood ratio test = 2.89 on 2 df, -log2(p)=2.08
company 1 marketing
********************
<lifelines.CoxPHFitter: fitted with 1074 observations, 613 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 1074
number of events = 461
partial log-likelihood = -2791.80
time fit was run = 2019-07-19 01:24:34 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.13 0.88 0.31 -0.74 0.47 0.48 1.61
salary 0.14 1.15 0.63 -1.10 1.38 0.33 3.97
z p -log2(p)
seniority -0.43 0.67 0.58
salary 0.22 0.83 0.27
---
Concordance = 0.51
Log-likelihood ratio test = 0.26 on 2 df, -log2(p)=0.19
company 1 customer_service
********************
<lifelines.CoxPHFitter: fitted with 3129 observations, 1791 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 3129
number of events = 1338
partial log-likelihood = -9540.69
time fit was run = 2019-07-19 01:24:34 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.00 1.00 0.19 -0.38 0.38 0.69 1.46
salary -0.21 0.81 0.64 -1.46 1.05 0.23 2.86
z p -log2(p)
seniority -0.00 1.00 0.00
salary -0.32 0.75 0.42
---
Concordance = 0.51
Log-likelihood ratio test = 0.40 on 2 df, -log2(p)=0.29
company 1 data_science
********************
<lifelines.CoxPHFitter: fitted with 1070 observations, 562 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 1070
number of events = 508
partial log-likelihood = -3089.21
time fit was run = 2019-07-19 01:24:34 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.36 1.44 0.29 -0.21 0.93 0.81 2.54
salary -0.01 0.99 0.41 -0.81 0.79 0.44 2.19
z p -log2(p)
seniority 1.25 0.21 2.25
salary -0.03 0.97 0.04
---
Concordance = 0.54
Log-likelihood ratio test = 5.10 on 2 df, -log2(p)=3.68
company 1 sales
********************
<lifelines.CoxPHFitter: fitted with 1091 observations, 612 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 1091
number of events = 479
partial log-likelihood = -2914.55
time fit was run = 2019-07-19 01:24:35 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.69 0.50 0.32 -1.32 -0.05 0.27 0.95
salary 1.20 3.33 0.63 -0.04 2.45 0.96 11.53
z p -log2(p)
seniority -2.12 0.03 4.87
salary 1.90 0.06 4.12
---
Concordance = 0.53
Log-likelihood ratio test = 4.54 on 2 df, -log2(p)=3.28
company 2 engineer
********************
<lifelines.CoxPHFitter: fitted with 822 observations, 380 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 822
number of events = 442
partial log-likelihood = -2599.58
time fit was run = 2019-07-19 01:24:35 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.02 0.98 0.31 -0.62 0.58 0.54 1.79
salary 0.11 1.12 0.41 -0.70 0.92 0.50 2.52
z p -log2(p)
seniority -0.06 0.95 0.07
salary 0.27 0.79 0.34
---
Concordance = 0.51
Log-likelihood ratio test = 0.16 on 2 df, -log2(p)=0.12
company 2 marketing
********************
<lifelines.CoxPHFitter: fitted with 535 observations, 291 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 535
number of events = 244
partial log-likelihood = -1324.46
time fit was run = 2019-07-19 01:24:35 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.15 1.16 0.47 -0.77 1.06 0.46 2.90
salary -0.52 0.59 0.92 -2.32 1.28 0.10 3.60
z p -log2(p)
seniority 0.31 0.76 0.41
salary -0.57 0.57 0.81
---
Concordance = 0.51
Log-likelihood ratio test = 0.45 on 2 df, -log2(p)=0.32
company 2 customer_service
********************
<lifelines.CoxPHFitter: fitted with 1530 observations, 828 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 1530
number of events = 702
partial log-likelihood = -4467.66
time fit was run = 2019-07-19 01:24:35 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.23 1.26 0.25 -0.26 0.73 0.77 2.07
salary -0.91 0.40 0.85 -2.58 0.77 0.08 2.15
z p -log2(p)
seniority 0.92 0.36 1.48
salary -1.06 0.29 1.80
---
Concordance = 0.51
Log-likelihood ratio test = 1.13 on 2 df, -log2(p)=0.82
company 2 data_science
********************
<lifelines.CoxPHFitter: fitted with 562 observations, 265 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 562
number of events = 297
partial log-likelihood = -1617.21
time fit was run = 2019-07-19 01:24:35 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.40 1.49 0.41 -0.41 1.21 0.66 3.34
salary -0.55 0.58 0.57 -1.66 0.57 0.19 1.76
z p -log2(p)
seniority 0.97 0.33 1.58
salary -0.96 0.34 1.57
---
Concordance = 0.50
Log-likelihood ratio test = 1.00 on 2 df, -log2(p)=0.72
company 2 sales
********************
<lifelines.CoxPHFitter: fitted with 508 observations, 291 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 508
number of events = 217
partial log-likelihood = -1144.75
time fit was run = 2019-07-19 01:24:35 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.11 0.89 0.42 -0.94 0.72 0.39 2.05
salary 0.66 1.93 0.90 -1.10 2.41 0.33 11.16
z p -log2(p)
seniority -0.27 0.79 0.34
salary 0.73 0.46 1.11
---
Concordance = 0.52
Log-likelihood ratio test = 0.95 on 2 df, -log2(p)=0.69
company 3 engineer
********************
<lifelines.CoxPHFitter: fitted with 512 observations, 291 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 512
number of events = 221
partial log-likelihood = -1176.37
time fit was run = 2019-07-19 01:24:35 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.37 1.45 0.48 -0.57 1.31 0.57 3.72
salary -1.21 0.30 0.81 -2.79 0.37 0.06 1.45
z p -log2(p)
seniority 0.78 0.44 1.20
salary -1.50 0.13 2.90
---
Concordance = 0.53
Log-likelihood ratio test = 3.29 on 2 df, -log2(p)=2.37
company 3 marketing
********************
<lifelines.CoxPHFitter: fitted with 367 observations, 213 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 367
number of events = 154
partial log-likelihood = -771.02
time fit was run = 2019-07-19 01:24:35 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.36 0.70 0.59 -1.51 0.79 0.22 2.20
salary 0.26 1.29 1.42 -2.53 3.04 0.08 20.95
z p -log2(p)
seniority -0.62 0.54 0.89
salary 0.18 0.86 0.22
---
Concordance = 0.51
Log-likelihood ratio test = 0.95 on 2 df, -log2(p)=0.69
company 3 customer_service
********************
<lifelines.CoxPHFitter: fitted with 1000 observations, 538 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 1000
number of events = 462
partial log-likelihood = -2798.21
time fit was run = 2019-07-19 01:24:36 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.56 1.75 0.34 -0.11 1.23 0.90 3.41
salary -3.27 0.04 1.38 -5.98 -0.56 0.00 0.57
z p -log2(p)
seniority 1.65 0.10 3.33
salary -2.37 0.02 5.79
---
Concordance = 0.52
Log-likelihood ratio test = 6.34 on 2 df, -log2(p)=4.57
company 3 data_science
********************
<lifelines.CoxPHFitter: fitted with 345 observations, 193 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 345
number of events = 152
partial log-likelihood = -756.00
time fit was run = 2019-07-19 01:24:36 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.18 0.83 0.57 -1.31 0.94 0.27 2.56
salary 0.48 1.62 0.94 -1.36 2.33 0.26 10.27
z p -log2(p)
seniority -0.32 0.75 0.42
salary 0.51 0.61 0.72
---
Concordance = 0.48
Log-likelihood ratio test = 0.30 on 2 df, -log2(p)=0.22
company 3 sales
********************
<lifelines.CoxPHFitter: fitted with 359 observations, 203 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 359
number of events = 156
partial log-likelihood = -762.34
time fit was run = 2019-07-19 01:24:36 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.91 2.50 0.54 -0.15 1.98 0.86 7.25
salary -0.42 0.66 1.33 -3.03 2.19 0.05 8.92
z p -log2(p)
seniority 1.68 0.09 3.44
salary -0.32 0.75 0.41
---
Concordance = 0.56
Log-likelihood ratio test = 6.86 on 2 df, -log2(p)=4.95
company 4 engineer
********************
<lifelines.CoxPHFitter: fitted with 375 observations, 208 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 375
number of events = 167
partial log-likelihood = -845.44
time fit was run = 2019-07-19 01:24:36 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.29 1.34 0.50 -0.70 1.28 0.50 3.59
salary -0.27 0.76 0.90 -2.04 1.49 0.13 4.46
z p -log2(p)
seniority 0.58 0.56 0.82
salary -0.30 0.76 0.39
---
Concordance = 0.50
Log-likelihood ratio test = 0.42 on 2 df, -log2(p)=0.30
company 4 marketing
********************
<lifelines.CoxPHFitter: fitted with 263 observations, 153 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 263
number of events = 110
partial log-likelihood = -513.20
time fit was run = 2019-07-19 01:24:36 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.99 2.69 0.67 -0.31 2.29 0.73 9.92
salary -2.55 0.08 1.70 -5.89 0.79 0.00 2.21
z p -log2(p)
seniority 1.49 0.14 2.88
salary -1.50 0.13 2.89
---
Concordance = 0.54
Log-likelihood ratio test = 2.42 on 2 df, -log2(p)=1.75
company 4 customer_service
********************
<lifelines.CoxPHFitter: fitted with 769 observations, 412 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 769
number of events = 357
partial log-likelihood = -2049.08
time fit was run = 2019-07-19 01:24:36 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.70 2.02 0.35 0.01 1.40 1.01 4.04
salary -1.75 0.17 1.48 -4.65 1.15 0.01 3.15
z p -log2(p)
seniority 1.99 0.05 4.41
salary -1.18 0.24 2.08
---
Concordance = 0.51
Log-likelihood ratio test = 4.57 on 2 df, -log2(p)=3.30
company 4 data_science
********************
<lifelines.CoxPHFitter: fitted with 277 observations, 159 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 277
number of events = 118
partial log-likelihood = -546.75
time fit was run = 2019-07-19 01:24:36 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.24 1.27 0.67 -1.06 1.55 0.34 4.70
salary -0.08 0.93 1.12 -2.27 2.11 0.10 8.26
z p -log2(p)
seniority 0.36 0.72 0.48
salary -0.07 0.95 0.08
---
Concordance = 0.50
Log-likelihood ratio test = 0.31 on 2 df, -log2(p)=0.22
company 4 sales
********************
<lifelines.CoxPHFitter: fitted with 252 observations, 150 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 252
number of events = 102
partial log-likelihood = -467.89
time fit was run = 2019-07-19 01:24:36 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.87 0.42 0.69 -2.22 0.47 0.11 1.61
salary 2.31 10.04 1.76 -1.14 5.75 0.32 314.44
z p -log2(p)
seniority -1.27 0.20 2.29
salary 1.31 0.19 2.40
---
Concordance = 0.54
Log-likelihood ratio test = 1.80 on 2 df, -log2(p)=1.30
company 5 engineer
********************
<lifelines.CoxPHFitter: fitted with 311 observations, 178 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 311
number of events = 133
partial log-likelihood = -645.66
time fit was run = 2019-07-19 01:24:36 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 1.04 2.84 0.61 -0.15 2.23 0.86 9.34
salary -2.80 0.06 1.08 -4.92 -0.67 0.01 0.51
z p -log2(p)
seniority 1.72 0.09 3.55
salary -2.58 0.01 6.67
---
Concordance = 0.55
Log-likelihood ratio test = 7.65 on 2 df, -log2(p)=5.52
company 5 marketing
********************
<lifelines.CoxPHFitter: fitted with 224 observations, 114 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 224
number of events = 110
partial log-likelihood = -496.74
time fit was run = 2019-07-19 01:24:36 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.87 0.42 0.72 -2.27 0.53 0.10 1.71
salary 1.84 6.29 1.66 -1.42 5.10 0.24 163.90
z p -log2(p)
seniority -1.21 0.23 2.15
salary 1.11 0.27 1.89
---
Concordance = 0.54
Log-likelihood ratio test = 1.50 on 2 df, -log2(p)=1.08
company 5 customer_service
********************
<lifelines.CoxPHFitter: fitted with 631 observations, 353 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 631
number of events = 278
partial log-likelihood = -1545.96
time fit was run = 2019-07-19 01:24:36 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.07 1.07 0.42 -0.77 0.90 0.47 2.45
salary -1.14 0.32 1.76 -4.59 2.31 0.01 10.09
z p -log2(p)
seniority 0.15 0.88 0.19
salary -0.65 0.52 0.95
---
Concordance = 0.51
Log-likelihood ratio test = 1.03 on 2 df, -log2(p)=0.74
company 5 data_science
********************
<lifelines.CoxPHFitter: fitted with 213 observations, 113 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 213
number of events = 100
partial log-likelihood = -440.53
time fit was run = 2019-07-19 01:24:37 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.51 0.60 0.69 -1.85 0.84 0.16 2.31
salary 0.21 1.24 1.16 -2.06 2.48 0.13 11.92
z p -log2(p)
seniority -0.74 0.46 1.12
salary 0.18 0.86 0.23
---
Concordance = 0.50
Log-likelihood ratio test = 1.10 on 2 df, -log2(p)=0.79
company 5 sales
********************
<lifelines.CoxPHFitter: fitted with 254 observations, 148 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 254
number of events = 106
partial log-likelihood = -483.46
time fit was run = 2019-07-19 01:24:37 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.14 1.15 0.63 -1.09 1.38 0.33 3.96
salary -0.42 0.66 1.62 -3.59 2.75 0.03 15.60
z p -log2(p)
seniority 0.22 0.82 0.28
salary -0.26 0.79 0.33
---
Concordance = 0.51
Log-likelihood ratio test = 0.07 on 2 df, -log2(p)=0.05
company 6 engineer
********************
<lifelines.CoxPHFitter: fitted with 218 observations, 115 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 218
number of events = 103
partial log-likelihood = -453.19
time fit was run = 2019-07-19 01:24:37 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.04 1.04 0.67 -1.27 1.35 0.28 3.86
salary -0.00 1.00 1.10 -2.16 2.15 0.12 8.59
z p -log2(p)
seniority 0.06 0.95 0.07
salary -0.00 1.00 0.00
---
Concordance = 0.50
Log-likelihood ratio test = 0.01 on 2 df, -log2(p)=0.01
company 6 marketing
********************
<lifelines.CoxPHFitter: fitted with 174 observations, 117 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 174
number of events = 57
partial log-likelihood = -252.19
time fit was run = 2019-07-19 01:24:37 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.64 1.90 0.90 -1.12 2.41 0.33 11.09
salary -1.18 0.31 2.28 -5.65 3.28 0.00 26.66
z p -log2(p)
seniority 0.72 0.47 1.08
salary -0.52 0.60 0.73
---
Concordance = 0.57
Log-likelihood ratio test = 0.54 on 2 df, -log2(p)=0.39
company 6 customer_service
********************
<lifelines.CoxPHFitter: fitted with 495 observations, 257 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 495
number of events = 238
partial log-likelihood = -1260.17
time fit was run = 2019-07-19 01:24:37 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.13 1.14 0.44 -0.73 1.00 0.48 2.73
salary 0.22 1.24 1.90 -3.52 3.95 0.03 51.94
z p -log2(p)
seniority 0.30 0.76 0.39
salary 0.11 0.91 0.14
---
Concordance = 0.53
Log-likelihood ratio test = 0.63 on 2 df, -log2(p)=0.45
company 6 data_science
********************
<lifelines.CoxPHFitter: fitted with 151 observations, 84 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 151
number of events = 67
partial log-likelihood = -271.53
time fit was run = 2019-07-19 01:24:37 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -1.33 0.26 0.82 -2.94 0.28 0.05 1.32
salary 0.79 2.20 1.39 -1.94 3.51 0.14 33.46
z p -log2(p)
seniority -1.62 0.10 3.26
salary 0.57 0.57 0.81
---
Concordance = 0.57
Log-likelihood ratio test = 4.91 on 2 df, -log2(p)=3.54
company 6 sales
********************
<lifelines.CoxPHFitter: fitted with 161 observations, 85 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 161
number of events = 76
partial log-likelihood = -318.48
time fit was run = 2019-07-19 01:24:37 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -1.72 0.18 0.79 -3.26 -0.18 0.04 0.84
salary 3.83 46.18 1.93 0.05 7.61 1.05 2025.76
z p -log2(p)
seniority -2.18 0.03 5.11
salary 1.99 0.05 4.41
---
Concordance = 0.59
Log-likelihood ratio test = 4.99 on 2 df, -log2(p)=3.60
company 7 engineer
********************
<lifelines.CoxPHFitter: fitted with 223 observations, 123 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 223
number of events = 100
partial log-likelihood = -445.24
time fit was run = 2019-07-19 01:24:37 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.48 0.62 0.69 -1.83 0.88 0.16 2.40
salary 1.23 3.41 1.11 -0.95 3.40 0.39 30.01
z p -log2(p)
seniority -0.69 0.49 1.03
salary 1.10 0.27 1.89
---
Concordance = 0.54
Log-likelihood ratio test = 1.39 on 2 df, -log2(p)=1.00
company 7 marketing
********************
<lifelines.CoxPHFitter: fitted with 140 observations, 77 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 140
number of events = 63
partial log-likelihood = -243.90
time fit was run = 2019-07-19 01:24:37 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.54 1.72 0.91 -1.25 2.33 0.29 10.29
salary -0.92 0.40 2.05 -4.94 3.10 0.01 22.23
z p -log2(p)
seniority 0.59 0.55 0.86
salary -0.45 0.65 0.61
---
Concordance = 0.54
Log-likelihood ratio test = 0.35 on 2 df, -log2(p)=0.25
company 7 customer_service
********************
<lifelines.CoxPHFitter: fitted with 466 observations, 266 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 466
number of events = 200
partial log-likelihood = -1045.25
time fit was run = 2019-07-19 01:24:37 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.07 0.93 0.47 -0.99 0.84 0.37 2.32
salary 0.67 1.96 1.94 -3.12 4.47 0.04 87.26
z p -log2(p)
seniority -0.16 0.87 0.19
salary 0.35 0.73 0.46
---
Concordance = 0.52
Log-likelihood ratio test = 0.17 on 2 df, -log2(p)=0.13
company 7 data_science
********************
<lifelines.CoxPHFitter: fitted with 149 observations, 84 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 149
number of events = 65
partial log-likelihood = -270.08
time fit was run = 2019-07-19 01:24:37 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.57 1.77 0.91 -1.21 2.35 0.30 10.52
salary -1.51 0.22 1.47 -4.39 1.37 0.01 3.95
z p -log2(p)
seniority 0.63 0.53 0.92
salary -1.03 0.30 1.72
---
Concordance = 0.53
Log-likelihood ratio test = 1.34 on 2 df, -log2(p)=0.96
company 7 sales
********************
<lifelines.CoxPHFitter: fitted with 162 observations, 96 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 162
number of events = 66
partial log-likelihood = -269.08
time fit was run = 2019-07-19 01:24:37 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 1.29 3.64 0.85 -0.37 2.95 0.69 19.10
salary -3.24 0.04 2.09 -7.33 0.84 0.00 2.32
z p -log2(p)
seniority 1.53 0.13 2.98
salary -1.56 0.12 3.06
---
Concordance = 0.53
Log-likelihood ratio test = 2.57 on 2 df, -log2(p)=1.85
company 8 engineer
********************
<lifelines.CoxPHFitter: fitted with 190 observations, 102 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 190
number of events = 88
partial log-likelihood = -379.28
time fit was run = 2019-07-19 01:24:37 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.88 0.41 0.73 -2.32 0.55 0.10 1.73
salary 2.37 10.67 1.32 -0.23 4.96 0.80 143.00
z p -log2(p)
seniority -1.21 0.23 2.14
salary 1.79 0.07 3.76
---
Concordance = 0.55
Log-likelihood ratio test = 3.41 on 2 df, -log2(p)=2.46
company 8 marketing
********************
<lifelines.CoxPHFitter: fitted with 132 observations, 67 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 132
number of events = 65
partial log-likelihood = -249.50
time fit was run = 2019-07-19 01:24:37 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.28 1.33 0.96 -1.60 2.16 0.20 8.70
salary 1.67 5.32 2.22 -2.67 6.02 0.07 409.69
z p -log2(p)
seniority 0.30 0.77 0.38
salary 0.75 0.45 1.15
---
Concordance = 0.54
Log-likelihood ratio test = 3.83 on 2 df, -log2(p)=2.76
company 8 customer_service
********************
<lifelines.CoxPHFitter: fitted with 378 observations, 213 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 378
number of events = 165
partial log-likelihood = -833.58
time fit was run = 2019-07-19 01:24:37 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.07 1.07 0.51 -0.92 1.06 0.40 2.88
salary 0.42 1.52 2.26 -4.01 4.84 0.02 127.00
z p -log2(p)
seniority 0.13 0.89 0.16
salary 0.19 0.85 0.23
---
Concordance = 0.50
Log-likelihood ratio test = 0.34 on 2 df, -log2(p)=0.24
company 8 data_science
********************
<lifelines.CoxPHFitter: fitted with 143 observations, 78 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 143
number of events = 65
partial log-likelihood = -261.75
time fit was run = 2019-07-19 01:24:37 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.49 0.61 0.92 -2.30 1.31 0.10 3.69
salary 1.52 4.55 1.51 -1.44 4.47 0.24 87.63
z p -log2(p)
seniority -0.54 0.59 0.76
salary 1.00 0.32 1.66
---
Concordance = 0.52
Log-likelihood ratio test = 1.37 on 2 df, -log2(p)=0.99
company 8 sales
********************
<lifelines.CoxPHFitter: fitted with 136 observations, 84 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 136
number of events = 52
partial log-likelihood = -201.66
time fit was run = 2019-07-19 01:24:38 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.65 1.92 1.12 -1.55 2.85 0.21 17.37
salary -2.00 0.14 2.51 -6.92 2.92 0.00 18.53
z p -log2(p)
seniority 0.58 0.56 0.84
salary -0.80 0.43 1.23
---
Concordance = 0.50
Log-likelihood ratio test = 0.69 on 2 df, -log2(p)=0.50
company 9 engineer
********************
<lifelines.CoxPHFitter: fitted with 185 observations, 104 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 185
number of events = 81
partial log-likelihood = -355.88
time fit was run = 2019-07-19 01:24:38 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -1.25 0.29 0.77 -2.76 0.25 0.06 1.29
salary 0.92 2.52 1.27 -1.57 3.42 0.21 30.53
z p -log2(p)
seniority -1.63 0.10 3.28
salary 0.72 0.47 1.09
---
Concordance = 0.55
Log-likelihood ratio test = 4.53 on 2 df, -log2(p)=3.27
company 9 marketing
********************
<lifelines.CoxPHFitter: fitted with 124 observations, 62 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 124
number of events = 62
partial log-likelihood = -248.72
time fit was run = 2019-07-19 01:24:38 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.57 0.56 0.85 -2.24 1.10 0.11 3.01
salary 1.02 2.76 2.08 -3.06 5.09 0.05 162.65
z p -log2(p)
seniority -0.67 0.50 0.99
salary 0.49 0.63 0.68
---
Concordance = 0.56
Log-likelihood ratio test = 0.48 on 2 df, -log2(p)=0.34
company 9 customer_service
********************
<lifelines.CoxPHFitter: fitted with 341 observations, 186 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 341
number of events = 155
partial log-likelihood = -773.19
time fit was run = 2019-07-19 01:24:38 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.09 1.09 0.61 -1.11 1.29 0.33 3.63
salary 0.00 1.00 2.44 -4.78 4.78 0.01 119.59
z p -log2(p)
seniority 0.15 0.88 0.18
salary 0.00 1.00 0.00
---
Concordance = 0.50
Log-likelihood ratio test = 0.10 on 2 df, -log2(p)=0.07
company 9 data_science
********************
<lifelines.CoxPHFitter: fitted with 133 observations, 70 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 133
number of events = 63
partial log-likelihood = -253.93
time fit was run = 2019-07-19 01:24:38 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.07 0.93 0.84 -1.72 1.57 0.18 4.80
salary 0.89 2.43 1.27 -1.60 3.38 0.20 29.37
z p -log2(p)
seniority -0.09 0.93 0.11
salary 0.70 0.48 1.04
---
Concordance = 0.57
Log-likelihood ratio test = 1.22 on 2 df, -log2(p)=0.88
company 9 sales
********************
<lifelines.CoxPHFitter: fitted with 113 observations, 63 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 113
number of events = 50
partial log-likelihood = -192.57
time fit was run = 2019-07-19 01:24:38 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 1.18 3.26 1.00 -0.77 3.14 0.46 23.05
salary -2.20 0.11 2.66 -7.41 3.02 0.00 20.50
z p -log2(p)
seniority 1.18 0.24 2.08
salary -0.83 0.41 1.29
---
Concordance = 0.52
Log-likelihood ratio test = 1.60 on 2 df, -log2(p)=1.16
company 10 engineer
********************
<lifelines.CoxPHFitter: fitted with 170 observations, 93 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 170
number of events = 77
partial log-likelihood = -318.59
time fit was run = 2019-07-19 01:24:38 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.45 1.57 0.81 -1.14 2.03 0.32 7.64
salary -1.97 0.14 1.42 -4.76 0.82 0.01 2.28
z p -log2(p)
seniority 0.56 0.58 0.79
salary -1.38 0.17 2.58
---
Concordance = 0.54
Log-likelihood ratio test = 3.39 on 2 df, -log2(p)=2.45
company 10 marketing
********************
<lifelines.CoxPHFitter: fitted with 96 observations, 56 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 96
number of events = 40
partial log-likelihood = -151.41
time fit was run = 2019-07-19 01:24:38 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.25 1.29 1.00 -1.70 2.21 0.18 9.09
salary 0.50 1.65 2.53 -4.45 5.45 0.01 233.73
z p -log2(p)
seniority 0.26 0.80 0.33
salary 0.20 0.84 0.25
---
Concordance = 0.58
Log-likelihood ratio test = 0.53 on 2 df, -log2(p)=0.39
company 10 customer_service
********************
<lifelines.CoxPHFitter: fitted with 333 observations, 189 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 333
number of events = 144
partial log-likelihood = -711.79
time fit was run = 2019-07-19 01:24:38 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.97 2.65 0.60 -0.20 2.15 0.82 8.55
salary -2.35 0.10 2.46 -7.17 2.47 0.00 11.87
z p -log2(p)
seniority 1.63 0.10 3.27
salary -0.95 0.34 1.56
---
Concordance = 0.54
Log-likelihood ratio test = 3.15 on 2 df, -log2(p)=2.27
company 10 data_science
********************
<lifelines.CoxPHFitter: fitted with 108 observations, 51 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 108
number of events = 57
partial log-likelihood = -216.03
time fit was run = 2019-07-19 01:24:38 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -2.80 0.06 1.03 -4.81 -0.78 0.01 0.46
salary 3.97 53.04 1.59 0.85 7.10 2.33 1206.08
z p -log2(p)
seniority -2.72 0.01 7.27
salary 2.49 0.01 6.30
---
Concordance = 0.60
Log-likelihood ratio test = 7.73 on 2 df, -log2(p)=5.57
company 10 sales
********************
<lifelines.CoxPHFitter: fitted with 110 observations, 64 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 110
number of events = 46
partial log-likelihood = -171.25
time fit was run = 2019-07-19 01:24:38 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority 0.98 2.68 1.03 -1.04 3.01 0.35 20.20
salary -4.30 0.01 2.54 -9.29 0.68 0.00 1.98
z p -log2(p)
seniority 0.95 0.34 1.56
salary -1.69 0.09 3.46
---
Concordance = 0.63
Log-likelihood ratio test = 3.83 on 2 df, -log2(p)=2.77
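The derived columns in each summary above appear to follow the usual Wald construction: `z = coef / se(coef)`, a two-sided normal p-value, and a 95% interval of `coef ± 1.96 * se(coef)` that is exponentiated to give the hazard-ratio bounds. The sketch below is a hand check, not part of the original notebook. It recomputes those columns for the seniority row of the company 10 data_science fit from the printed (already rounded) `coef` and `se(coef)`, so small differences in the last digit are expected. The helper name `wald_summary` is hypothetical.

```python
import math

def wald_summary(coef, se, z_crit=1.96):
    """Recompute the derived columns of one summary row from coef and se(coef).

    Assumes a Wald test with a standard normal reference distribution.
    """
    z = coef / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    lower, upper = coef - z_crit * se, coef + z_crit * se
    return {
        "exp(coef)": math.exp(coef),
        "z": z,
        "p": p,
        "-log2(p)": -math.log2(p),
        "coef lower 95%": lower,
        "coef upper 95%": upper,
        "exp(coef) lower 95%": math.exp(lower),
        "exp(coef) upper 95%": math.exp(upper),
    }

# Seniority row of the company 10 data_science fit, values as printed above
row = wald_summary(coef=-2.80, se=1.03)
for name, value in row.items():
    print(f"{name:>20}: {value:.2f}")
```

Read this way, an `exp(coef)` below 1 means a lower estimated hazard of quitting per unit increase of the covariate, and an interval for `exp(coef)` that contains 1 means the direction of the effect is not resolved by that fit.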
# Cox Proportional Hazards model per company and department, with salary as the only covariate
for i in range(10):
    df_dummy2 = df_dummy[df_dummy.company_id == i+1]
    df_dummy2 = df_dummy2.drop(columns = ['company_id','seniority'])
    for j,de in enumerate(depts):
        if j >1:  # skip the first two entries of depts
            data_test = df_dummy2[df_dummy2.dept == de]
            data_test = data_test.drop(columns = ['dept'])
            cph = CoxPHFitter() ## Instantiate the class to create a cph object
            cph.fit(data_test, 'lasting_days', event_col='event') ## Fit the data to train the model
            print("company", i+1, de)
            print('*'*20)
            cph.print_summary() ## Have a look at the significance of the features
company 1 engineer
********************
<lifelines.CoxPHFitter: fitted with 1552 observations, 738 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 1552
number of events = 814
partial log-likelihood = -5206.53
time fit was run = 2019-07-19 12:59:17 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.19 0.82 0.16 -0.52 0.13 0.60 1.14
z p -log2(p)
salary -1.19 0.24 2.08
---
Concordance = 0.52
Log-likelihood ratio test = 1.39 on 1 df, -log2(p)=2.07
company 1 marketing
********************
<lifelines.CoxPHFitter: fitted with 1074 observations, 613 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 1074
number of events = 461
partial log-likelihood = -2791.89
time fit was run = 2019-07-19 12:59:17 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.09 0.91 0.33 -0.74 0.56 0.48 1.74
z p -log2(p)
salary -0.28 0.78 0.36
---
Concordance = 0.51
Log-likelihood ratio test = 0.08 on 1 df, -log2(p)=0.36
company 1 customer_service
********************
<lifelines.CoxPHFitter: fitted with 3129 observations, 1791 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 3129
number of events = 1338
partial log-likelihood = -9540.69
time fit was run = 2019-07-19 12:59:17 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.21 0.81 0.33 -0.84 0.43 0.43 1.54
z p -log2(p)
salary -0.63 0.53 0.92
---
Concordance = 0.51
Log-likelihood ratio test = 0.40 on 1 df, -log2(p)=0.92
company 1 data_science
********************
<lifelines.CoxPHFitter: fitted with 1070 observations, 562 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 1070
number of events = 508
partial log-likelihood = -3089.99
time fit was run = 2019-07-19 12:59:18 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.41 1.51 0.22 -0.02 0.85 0.98 2.33
z p -log2(p)
salary 1.86 0.06 4.00
---
Concordance = 0.53
Log-likelihood ratio test = 3.54 on 1 df, -log2(p)=4.06
company 1 sales
********************
<lifelines.CoxPHFitter: fitted with 1091 observations, 612 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 1091
number of events = 479
partial log-likelihood = -2916.82
time fit was run = 2019-07-19 12:59:18 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.04 1.04 0.33 -0.60 0.69 0.55 1.99
z p -log2(p)
salary 0.13 0.90 0.16
---
Concordance = 0.51
Log-likelihood ratio test = 0.02 on 1 df, -log2(p)=0.16
company 2 engineer
********************
<lifelines.CoxPHFitter: fitted with 822 observations, 380 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 822
number of events = 442
partial log-likelihood = -2599.59
time fit was run = 2019-07-19 12:59:18 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.09 1.10 0.23 -0.36 0.54 0.70 1.72
z p -log2(p)
salary 0.40 0.69 0.53
---
Concordance = 0.51
Log-likelihood ratio test = 0.16 on 1 df, -log2(p)=0.53
company 2 marketing
********************
<lifelines.CoxPHFitter: fitted with 535 observations, 291 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 535
number of events = 244
partial log-likelihood = -1324.51
time fit was run = 2019-07-19 12:59:18 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.27 0.76 0.46 -1.18 0.63 0.31 1.88
z p -log2(p)
salary -0.59 0.55 0.85
---
Concordance = 0.52
Log-likelihood ratio test = 0.35 on 1 df, -log2(p)=0.85
company 2 customer_service
********************
<lifelines.CoxPHFitter: fitted with 1530 observations, 828 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 1530
number of events = 702
partial log-likelihood = -4468.09
time fit was run = 2019-07-19 12:59:18 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.25 0.78 0.46 -1.15 0.65 0.32 1.92
z p -log2(p)
salary -0.54 0.59 0.76
---
Concordance = 0.51
Log-likelihood ratio test = 0.29 on 1 df, -log2(p)=0.76
company 2 data_science
********************
<lifelines.CoxPHFitter: fitted with 562 observations, 265 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 562
number of events = 297
partial log-likelihood = -1617.68
time fit was run = 2019-07-19 12:59:18 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.08 0.92 0.30 -0.66 0.50 0.52 1.65
z p -log2(p)
salary -0.27 0.79 0.34
---
Concordance = 0.50
Log-likelihood ratio test = 0.07 on 1 df, -log2(p)=0.34
company 2 sales
********************
<lifelines.CoxPHFitter: fitted with 508 observations, 291 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 508
number of events = 217
partial log-likelihood = -1144.78
time fit was run = 2019-07-19 12:59:18 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.46 1.58 0.49 -0.50 1.41 0.61 4.11
z p -log2(p)
salary 0.93 0.35 1.51
---
Concordance = 0.52
Log-likelihood ratio test = 0.88 on 1 df, -log2(p)=1.52
company 3 engineer
********************
<lifelines.CoxPHFitter: fitted with 512 observations, 291 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 512
number of events = 221
partial log-likelihood = -1176.67
time fit was run = 2019-07-19 12:59:18 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.67 0.51 0.40 -1.46 0.12 0.23 1.13
z p -log2(p)
salary -1.66 0.10 3.35
---
Concordance = 0.52
Log-likelihood ratio test = 2.68 on 1 df, -log2(p)=3.30
company 3 marketing
********************
<lifelines.CoxPHFitter: fitted with 367 observations, 213 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 367
number of events = 154
partial log-likelihood = -771.21
time fit was run = 2019-07-19 12:59:18 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.52 0.60 0.68 -1.85 0.82 0.16 2.26
z p -log2(p)
salary -0.76 0.45 1.16
---
Concordance = 0.51
Log-likelihood ratio test = 0.58 on 1 df, -log2(p)=1.16
company 3 customer_service
********************
<lifelines.CoxPHFitter: fitted with 1000 observations, 538 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 1000
number of events = 462
partial log-likelihood = -2799.56
time fit was run = 2019-07-19 12:59:19 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -1.30 0.27 0.68 -2.64 0.03 0.07 1.03
z p -log2(p)
salary -1.92 0.06 4.18
---
Concordance = 0.51
Log-likelihood ratio test = 3.64 on 1 df, -log2(p)=4.15
company 3 data_science
********************
<lifelines.CoxPHFitter: fitted with 345 observations, 193 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 345
number of events = 152
partial log-likelihood = -756.05
time fit was run = 2019-07-19 12:59:19 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.23 1.26 0.52 -0.78 1.25 0.46 3.48
z p -log2(p)
salary 0.45 0.65 0.61
---
Concordance = 0.50
Log-likelihood ratio test = 0.20 on 1 df, -log2(p)=0.61
company 3 sales
********************
<lifelines.CoxPHFitter: fitted with 359 observations, 203 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 359
number of events = 156
partial log-likelihood = -763.74
time fit was run = 2019-07-19 12:59:19 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 1.44 4.21 0.72 0.03 2.85 1.03 17.23
z p -log2(p)
salary 2.00 0.05 4.45
---
Concordance = 0.56
Log-likelihood ratio test = 4.07 on 1 df, -log2(p)=4.52
company 4 engineer
********************
<lifelines.CoxPHFitter: fitted with 375 observations, 208 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 375
number of events = 167
partial log-likelihood = -845.60
time fit was run = 2019-07-19 12:59:19 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.15 1.16 0.51 -0.85 1.15 0.43 3.17
z p -log2(p)
salary 0.30 0.77 0.38
---
Concordance = 0.50
Log-likelihood ratio test = 0.09 on 1 df, -log2(p)=0.38
company 4 marketing
********************
<lifelines.CoxPHFitter: fitted with 263 observations, 153 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 263
number of events = 110
partial log-likelihood = -514.29
time fit was run = 2019-07-19 12:59:19 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.42 0.66 0.88 -2.15 1.31 0.12 3.69
z p -log2(p)
salary -0.48 0.63 0.66
---
Concordance = 0.52
Log-likelihood ratio test = 0.23 on 1 df, -log2(p)=0.66
company 4 customer_service
********************
<lifelines.CoxPHFitter: fitted with 769 observations, 412 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 769
number of events = 357
partial log-likelihood = -2051.02
time fit was run = 2019-07-19 12:59:19 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.67 1.95 0.81 -0.92 2.26 0.40 9.57
z p -log2(p)
salary 0.83 0.41 1.29
---
Concordance = 0.51
Log-likelihood ratio test = 0.69 on 1 df, -log2(p)=1.30
company 4 data_science
********************
<lifelines.CoxPHFitter: fitted with 277 observations, 159 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 277
number of events = 118
partial log-likelihood = -546.81
time fit was run = 2019-07-19 12:59:19 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.26 1.30 0.61 -0.94 1.46 0.39 4.31
z p -log2(p)
salary 0.42 0.67 0.57
---
Concordance = 0.52
Log-likelihood ratio test = 0.18 on 1 df, -log2(p)=0.58
company 4 sales
********************
<lifelines.CoxPHFitter: fitted with 252 observations, 150 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 252
number of events = 102
partial log-likelihood = -468.70
time fit was run = 2019-07-19 12:59:19 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.40 1.49 0.93 -1.42 2.23 0.24 9.28
z p -log2(p)
salary 0.43 0.67 0.59
---
Concordance = 0.51
Log-likelihood ratio test = 0.19 on 1 df, -log2(p)=0.59
company 5 engineer
********************
<lifelines.CoxPHFitter: fitted with 311 observations, 178 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 311
number of events = 133
partial log-likelihood = -647.13
time fit was run = 2019-07-19 12:59:19 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -1.22 0.30 0.55 -2.30 -0.13 0.10 0.87
z p -log2(p)
salary -2.20 0.03 5.18
---
Concordance = 0.52
Log-likelihood ratio test = 4.71 on 1 df, -log2(p)=5.06
company 5 marketing
********************
<lifelines.CoxPHFitter: fitted with 224 observations, 114 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 224
number of events = 110
partial log-likelihood = -497.49
time fit was run = 2019-07-19 12:59:19 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.05 1.05 0.81 -1.53 1.63 0.22 5.12
z p -log2(p)
salary 0.06 0.95 0.07
---
Concordance = 0.51
Log-likelihood ratio test = 0.00 on 1 df, -log2(p)=0.07
company 5 customer_service
********************
<lifelines.CoxPHFitter: fitted with 631 observations, 353 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 631
number of events = 278
partial log-likelihood = -1545.97
time fit was run = 2019-07-19 12:59:19 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.91 0.40 0.90 -2.68 0.86 0.07 2.37
z p -log2(p)
salary -1.01 0.31 1.67
---
Concordance = 0.51
Log-likelihood ratio test = 1.00 on 1 df, -log2(p)=1.66
company 5 data_science
********************
<lifelines.CoxPHFitter: fitted with 213 observations, 113 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 213
number of events = 100
partial log-likelihood = -440.81
time fit was run = 2019-07-19 12:59:19 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.50 0.61 0.66 -1.80 0.81 0.17 2.24
z p -log2(p)
salary -0.75 0.46 1.13
---
Concordance = 0.52
Log-likelihood ratio test = 0.55 on 1 df, -log2(p)=1.12
company 5 sales
********************
<lifelines.CoxPHFitter: fitted with 254 observations, 148 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 254
number of events = 106
partial log-likelihood = -483.49
time fit was run = 2019-07-19 12:59:19 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.12 0.89 0.88 -1.83 1.60 0.16 4.94
z p -log2(p)
salary -0.14 0.89 0.16
---
Concordance = 0.48
Log-likelihood ratio test = 0.02 on 1 df, -log2(p)=0.16
company 6 engineer
********************
<lifelines.CoxPHFitter: fitted with 218 observations, 115 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 218
number of events = 103
partial log-likelihood = -453.19
time fit was run = 2019-07-19 12:59:20 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.05 1.05 0.60 -1.12 1.22 0.33 3.40
z p -log2(p)
salary 0.09 0.93 0.10
---
Concordance = 0.51
Log-likelihood ratio test = 0.01 on 1 df, -log2(p)=0.10
company 6 marketing
********************
<lifelines.CoxPHFitter: fitted with 174 observations, 117 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 174
number of events = 57
partial log-likelihood = -252.45
time fit was run = 2019-07-19 12:59:20 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.20 1.22 1.17 -2.10 2.50 0.12 12.19
z p -log2(p)
salary 0.17 0.86 0.21
---
Concordance = 0.50
Log-likelihood ratio test = 0.03 on 1 df, -log2(p)=0.21
company 6 customer_service
********************
<lifelines.CoxPHFitter: fitted with 495 observations, 257 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 495
number of events = 238
partial log-likelihood = -1260.22
time fit was run = 2019-07-19 12:59:20 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.71 2.04 0.98 -1.20 2.63 0.30 13.86
z p -log2(p)
salary 0.73 0.46 1.10
---
Concordance = 0.53
Log-likelihood ratio test = 0.54 on 1 df, -log2(p)=1.11
company 6 data_science
********************
<lifelines.CoxPHFitter: fitted with 151 observations, 84 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 151
number of events = 67
partial log-likelihood = -272.89
time fit was run = 2019-07-19 12:59:20 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -1.13 0.32 0.76 -2.61 0.35 0.07 1.42
z p -log2(p)
salary -1.50 0.13 2.90
---
Concordance = 0.57
Log-likelihood ratio test = 2.18 on 1 df, -log2(p)=2.84
company 6 sales
********************
<lifelines.CoxPHFitter: fitted with 161 observations, 85 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 161
number of events = 76
partial log-likelihood = -320.96
time fit was run = 2019-07-19 12:59:20 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.22 1.24 1.06 -1.85 2.29 0.16 9.85
z p -log2(p)
salary 0.21 0.84 0.26
---
Concordance = 0.49
Log-likelihood ratio test = 0.04 on 1 df, -log2(p)=0.26
company 7 engineer
********************
<lifelines.CoxPHFitter: fitted with 223 observations, 123 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 223
number of events = 100
partial log-likelihood = -445.48
time fit was run = 2019-07-19 12:59:20 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.59 1.80 0.62 -0.63 1.81 0.53 6.09
z p -log2(p)
salary 0.94 0.35 1.53
---
Concordance = 0.54
Log-likelihood ratio test = 0.90 on 1 df, -log2(p)=1.55
company 7 marketing
********************
<lifelines.CoxPHFitter: fitted with 140 observations, 77 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 140
number of events = 63
partial log-likelihood = -244.08
time fit was run = 2019-07-19 12:59:20 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.09 1.09 1.13 -2.12 2.29 0.12 9.92
z p -log2(p)
salary 0.08 0.94 0.09
---
Concordance = 0.57
Log-likelihood ratio test = 0.01 on 1 df, -log2(p)=0.09
company 7 customer_service
********************
<lifelines.CoxPHFitter: fitted with 466 observations, 266 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 466
number of events = 200
partial log-likelihood = -1045.27
time fit was run = 2019-07-19 12:59:20 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.42 1.52 1.09 -1.72 2.57 0.18 13.00
z p -log2(p)
salary 0.39 0.70 0.51
---
Concordance = 0.51
Log-likelihood ratio test = 0.15 on 1 df, -log2(p)=0.52
company 7 data_science
********************
<lifelines.CoxPHFitter: fitted with 149 observations, 84 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 149
number of events = 65
partial log-likelihood = -270.28
time fit was run = 2019-07-19 12:59:20 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.71 0.49 0.73 -2.14 0.72 0.12 2.05
z p -log2(p)
salary -0.98 0.33 1.60
---
Concordance = 0.52
Log-likelihood ratio test = 0.94 on 1 df, -log2(p)=1.59
company 7 sales
********************
<lifelines.CoxPHFitter: fitted with 162 observations, 96 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 162
number of events = 66
partial log-likelihood = -270.22
time fit was run = 2019-07-19 12:59:20 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.57 0.57 1.07 -2.66 1.52 0.07 4.57
z p -log2(p)
salary -0.53 0.59 0.75
---
Concordance = 0.50
Log-likelihood ratio test = 0.28 on 1 df, -log2(p)=0.75
company 8 engineer
********************
<lifelines.CoxPHFitter: fitted with 190 observations, 102 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 190
number of events = 88
partial log-likelihood = -380.02
time fit was run = 2019-07-19 12:59:20 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 1.07 2.90 0.78 -0.47 2.60 0.63 13.48
z p -log2(p)
salary 1.36 0.17 2.53
---
Concordance = 0.53
Log-likelihood ratio test = 1.92 on 1 df, -log2(p)=2.59
company 8 marketing
********************
<lifelines.CoxPHFitter: fitted with 132 observations, 67 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 132
number of events = 65
partial log-likelihood = -249.54
time fit was run = 2019-07-19 12:59:20 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 2.23 9.28 1.16 -0.04 4.50 0.96 90.02
z p -log2(p)
salary 1.92 0.05 4.19
---
Concordance = 0.55
Log-likelihood ratio test = 3.74 on 1 df, -log2(p)=4.24
company 8 customer_service
********************
<lifelines.CoxPHFitter: fitted with 378 observations, 213 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 378
number of events = 165
partial log-likelihood = -833.59
time fit was run = 2019-07-19 12:59:20 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.68 1.96 1.20 -1.67 3.02 0.19 20.52
z p -log2(p)
salary 0.56 0.57 0.80
---
Concordance = 0.51
Log-likelihood ratio test = 0.32 on 1 df, -log2(p)=0.81
company 8 data_science
********************
<lifelines.CoxPHFitter: fitted with 143 observations, 78 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 143
number of events = 65
partial log-likelihood = -261.90
time fit was run = 2019-07-19 12:59:20 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.82 2.28 0.80 -0.75 2.40 0.47 11.01
z p -log2(p)
salary 1.02 0.31 1.70
---
Concordance = 0.53
Log-likelihood ratio test = 1.08 on 1 df, -log2(p)=1.74
company 8 sales
********************
<lifelines.CoxPHFitter: fitted with 136 observations, 84 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 136
number of events = 52
partial log-likelihood = -201.83
time fit was run = 2019-07-19 12:59:20 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.74 0.48 1.25 -3.19 1.70 0.04 5.50
z p -log2(p)
salary -0.60 0.55 0.86
---
Concordance = 0.49
Log-likelihood ratio test = 0.36 on 1 df, -log2(p)=0.86
company 9 engineer
********************
<lifelines.CoxPHFitter: fitted with 185 observations, 104 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 185
number of events = 81
partial log-likelihood = -357.23
time fit was run = 2019-07-19 12:59:20 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.89 0.41 0.65 -2.18 0.39 0.11 1.48
z p -log2(p)
salary -1.37 0.17 2.54
---
Concordance = 0.51
Log-likelihood ratio test = 1.83 on 1 df, -log2(p)=2.50
company 9 marketing
********************
<lifelines.CoxPHFitter: fitted with 124 observations, 62 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 124
number of events = 62
partial log-likelihood = -248.95
time fit was run = 2019-07-19 12:59:21 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.18 0.84 1.11 -2.35 2.00 0.10 7.35
z p -log2(p)
salary -0.16 0.87 0.19
---
Concordance = 0.50
Log-likelihood ratio test = 0.03 on 1 df, -log2(p)=0.19
company 9 customer_service
********************
<lifelines.CoxPHFitter: fitted with 341 observations, 186 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 341
number of events = 155
partial log-likelihood = -773.20
time fit was run = 2019-07-19 12:59:21 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.32 1.37 1.14 -1.91 2.54 0.15 12.72
z p -log2(p)
salary 0.28 0.78 0.36
---
Concordance = 0.50
Log-likelihood ratio test = 0.08 on 1 df, -log2(p)=0.36
company 9 data_science
********************
<lifelines.CoxPHFitter: fitted with 133 observations, 70 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 133
number of events = 63
partial log-likelihood = -253.94
time fit was run = 2019-07-19 12:59:21 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.80 2.22 0.73 -0.64 2.23 0.53 9.33
z p -log2(p)
salary 1.09 0.28 1.85
---
Concordance = 0.57
Log-likelihood ratio test = 1.21 on 1 df, -log2(p)=1.88
company 9 sales
********************
<lifelines.CoxPHFitter: fitted with 113 observations, 63 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 113
number of events = 50
partial log-likelihood = -193.26
time fit was run = 2019-07-19 12:59:21 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.57 1.76 1.22 -1.83 2.97 0.16 19.41
z p -log2(p)
salary 0.46 0.64 0.64
---
Concordance = 0.48
Log-likelihood ratio test = 0.22 on 1 df, -log2(p)=0.64
company 10 engineer
********************
<lifelines.CoxPHFitter: fitted with 170 observations, 93 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 170
number of events = 77
partial log-likelihood = -318.74
time fit was run = 2019-07-19 12:59:21 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -1.29 0.28 0.72 -2.71 0.13 0.07 1.13
z p -log2(p)
salary -1.79 0.07 3.75
---
Concordance = 0.55
Log-likelihood ratio test = 3.08 on 1 df, -log2(p)=3.66
company 10 marketing
********************
<lifelines.CoxPHFitter: fitted with 96 observations, 56 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 96
number of events = 40
partial log-likelihood = -151.44
time fit was run = 2019-07-19 12:59:21 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 1.02 2.77 1.50 -1.92 3.96 0.15 52.37
z p -log2(p)
salary 0.68 0.50 1.01
---
Concordance = 0.58
Log-likelihood ratio test = 0.47 on 1 df, -log2(p)=1.02
company 10 customer_service
********************
<lifelines.CoxPHFitter: fitted with 333 observations, 189 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 333
number of events = 144
partial log-likelihood = -713.10
time fit was run = 2019-07-19 12:59:21 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary 0.97 2.64 1.33 -1.64 3.58 0.19 35.85
z p -log2(p)
salary 0.73 0.47 1.10
---
Concordance = 0.54
Log-likelihood ratio test = 0.54 on 1 df, -log2(p)=1.11
company 10 data_science
********************
<lifelines.CoxPHFitter: fitted with 108 observations, 51 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 108
number of events = 57
partial log-likelihood = -219.88
time fit was run = 2019-07-19 12:59:21 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.09 0.91 0.71 -1.48 1.29 0.23 3.64
z p -log2(p)
salary -0.13 0.89 0.16
---
Concordance = 0.50
Log-likelihood ratio test = 0.02 on 1 df, -log2(p)=0.16
company 10 sales
********************
<lifelines.CoxPHFitter: fitted with 110 observations, 64 censored>
duration col = 'lasting_days'
event col = 'event'
number of subjects = 110
number of events = 46
partial log-likelihood = -171.71
time fit was run = 2019-07-19 12:59:21 UTC
---
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -2.27 0.10 1.31 -4.84 0.30 0.01 1.35
z p -log2(p)
salary -1.73 0.08 3.58
---
Concordance = 0.64
Log-likelihood ratio test = 2.93 on 1 df, -log2(p)=3.52
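One quick way to scan the salary-only summaries above is to check whether the 95% interval for `exp(coef)` excludes 1. The sketch below does this for a handful of rows transcribed by hand from the printed output. It is an illustration only: the list is not exhaustive, and the names `salary_hr_intervals` and `interval_excludes_one` are hypothetical.

```python
# (group, exp(coef) lower 95%, exp(coef) upper 95%) for salary,
# transcribed from the printed summaries above (not an exhaustive list)
salary_hr_intervals = [
    ("company 1 data_science", 0.98, 2.33),
    ("company 3 sales", 1.03, 17.23),
    ("company 5 engineer", 0.10, 0.87),
    ("company 10 sales", 0.01, 1.35),
]

def interval_excludes_one(lower, upper):
    """True when the hazard-ratio interval lies entirely below or above 1."""
    return upper < 1.0 or lower > 1.0

flagged = [
    group
    for group, lower, upper in salary_hr_intervals
    if interval_excludes_one(lower, upper)
]
print(flagged)
```

With 50 separate fits, a few intervals can exclude 1 by chance alone, so a flag here is a prompt for a closer look at that company and department rather than a conclusion.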