SurvivalAnalysis

NOTE: GitHub's notebook rendering doesn't always work. If the ipynb file doesn't load, you can view it at https://nbviewer.jupyter.org/github/YIZHE12/SurvivalAnalysis/blob/master/EDA_survival_analysis.ipynb

Background:

You belong to the people analytics team for a food conglomerate. Employee turnover has been rampant for your 10 subsidiaries. The CFO estimates that the cost of replacing an employee is often larger than 100K USD, taking into account the time spent to interview and find a replacement, placement fees, sign-on bonuses and the loss of productivity for several months.

Your team has been tasked with diagnosing why and when employees from your subsidiaries leave. You need a tangible data-driven recommendation for each of the ten Presidents of your subsidiaries. What are your recommendations and why?

Quick look:

This is a survival analysis task that I solved using Kaplan-Meier plots and the Cox proportional hazards model. Some data cleaning is needed because the dates come in several formats. There are also outliers in the data. For example, two data points have a seniority of 90 years, which is implausible for a working career. For more information, have a look at the pdf file in the repo.

The notebook here includes all of the analysis. The data is in the uploaded txt file.

Prerequisites

pandas, numpy, lifelines, matplotlib, seaborn

pip install numpy
pip install pandas
pip install matplotlib
pip install seaborn
pip install lifelines

The notebook:

import os
import pandas as pd
import datetime
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib
import numpy as np

matplotlib.rcParams.update({'font.size': 20})

data = pd.read_csv('employee_retention.txt', index_col = 'Unnamed: 0')

data.head(5)

	employee_id	company_id	dept	seniority	salary	join_date	quit_date
0	1001444.0	8	temp_contractor	0	5850.0	2008-01-26	2008-04-25
1	388804.0	8	design	21	191000.0	05.17.2011	2012-03-16
2	407990.0	3	design	9	90000.0	2012-03-26	2015-04-10
3	120657.0	2	engineer	20	298000.0	2013-04-08	2015-01-30
4	1006393.0	1	temp_contractor	0	8509.0	2008-07-20	2008-10-18

data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 34702 entries, 0 to 34701
Data columns (total 7 columns):
employee_id    34702 non-null float64
company_id     34702 non-null int64
dept           34702 non-null object
seniority      34702 non-null int64
salary         34463 non-null float64
join_date      34702 non-null object
quit_date      23510 non-null object
dtypes: float64(2), int64(2), object(3)
memory usage: 2.1+ MB

There are fewer values in quit_date because these employees have not quit yet.

Some salary data is missing.

But maybe for the first survival analysis we can do without it.

Notice that the dates come in different formats, e.g. 05.17.2011 and 2012-03-16.

Data cleaning

data.join_date = pd.to_datetime(data.join_date)
data.quit_date = pd.to_datetime(data.quit_date)
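
The two pd.to_datetime calls above depend on pandas inferring the format of each value. As a more explicit alternative, here is a minimal sketch that parses each of the two formats seen in data.head(5) separately and merges the results. The helper name parse_mixed_dates and the toy series are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

def parse_mixed_dates(series):
    """Parse a column mixing 'YYYY-MM-DD' and 'MM.DD.YYYY' strings."""
    # Values that do not match a format become NaT instead of raising.
    iso = pd.to_datetime(series, format="%Y-%m-%d", errors="coerce")
    dotted = pd.to_datetime(series, format="%m.%d.%Y", errors="coerce")
    # Keep the ISO parse where it worked, otherwise fall back to the dotted parse.
    return iso.fillna(dotted)

# Toy values copied from the two formats visible in the table above.
sample = pd.Series(["2008-01-26", "05.17.2011", None])
parsed = parse_mixed_dates(sample)
print(parsed)
```

Values that match neither format stay as NaT, so a count of NaT values after parsing is a quick check for a third, unexpected format.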

# data.quit_date = data.quit_date.fillna(value=datetime.date.today())

data.quit_date.max()

Timestamp('2015-12-09 00:00:00')

data.join_date.max()

Timestamp('2015-12-10 00:00:00')

data.employee_id = data.employee_id.astype('int32')

len(data.employee_id.unique()) # check for duplicate employee IDs
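
Comparing the unique count with the row count by eye works, but pandas can answer the duplicate question directly. A small sketch on made-up IDs (the toy frame is an assumption for illustration):

```python
import pandas as pd

# Made-up IDs; 102 appears twice on purpose.
toy = pd.DataFrame({"employee_id": [101, 102, 102, 103]})

n_unique = toy.employee_id.nunique()             # number of distinct IDs
has_duplicates = not toy.employee_id.is_unique   # True if any ID repeats
# keep=False marks every row of a repeated ID, not only the later copies.
repeated_rows = toy[toy.employee_id.duplicated(keep=False)]

print(n_unique, has_duplicates)
print(repeated_rows)
```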

len(data.company_id.unique())

data.dept.unique()

array(['temp_contractor', 'design', 'engineer', 'marketing',
       'customer_service', 'data_science', 'sales'], dtype=object)

Does every company have all of these departments?

for i in range(len(data.company_id.unique())):
    print(i+1, data[data.company_id == i+1].dept.unique(), len(data[data.company_id == i+1].dept.unique()))

1 ['temp_contractor' 'customer_service' 'engineer' 'sales' 'data_science'
 'marketing' 'design'] 7
2 ['engineer' 'data_science' 'design' 'temp_contractor' 'sales'
 'customer_service' 'marketing'] 7
3 ['design' 'customer_service' 'data_science' 'sales' 'temp_contractor'
 'marketing' 'engineer'] 7
4 ['temp_contractor' 'data_science' 'marketing' 'customer_service'
 'engineer' 'design' 'sales'] 7
5 ['marketing' 'sales' 'temp_contractor' 'customer_service' 'data_science'
 'design' 'engineer'] 7
6 ['marketing' 'temp_contractor' 'engineer' 'design' 'customer_service'
 'data_science' 'sales'] 7
7 ['data_science' 'design' 'temp_contractor' 'customer_service' 'engineer'
 'marketing' 'sales'] 7
8 ['temp_contractor' 'design' 'customer_service' 'engineer' 'sales'
 'marketing' 'data_science'] 7
9 ['temp_contractor' 'customer_service' 'engineer' 'sales' 'data_science'
 'marketing' 'design'] 7
10 ['data_science' 'temp_contractor' 'marketing' 'customer_service'
 'engineer' 'sales' 'design'] 7
11 ['engineer' 'customer_service' 'marketing' 'data_science'] 4
12 ['data_science' 'engineer' 'customer_service' 'marketing' 'sales' 'design'] 6
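
The same question can be answered without an explicit loop. The sketch below, on a made-up frame, counts distinct departments per company with groupby and lists the companies that are missing at least one department:

```python
import pandas as pd

# Made-up data: company 1 has all three departments, company 2 has one.
toy = pd.DataFrame({
    "company_id": [1, 1, 1, 2, 2],
    "dept": ["sales", "design", "engineer", "engineer", "engineer"],
})

# Number of distinct departments in each company.
dept_per_company = toy.groupby("company_id")["dept"].nunique()

# Companies with fewer departments than exist in the whole dataset.
incomplete = dept_per_company[dept_per_company < toy["dept"].nunique()]

print(dept_per_company)
print(incomplete)
```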

How many records does each company have, and in which departments?

count_company = data.groupby('company_id').count()
count_company

	employee_id	dept	seniority	salary	join_date	quit_date
company_id
1	9501	9501	9501	9423	9501	5636
2	5220	5220	5220	5178	5220	3204
3	3773	3773	3773	3748	3773	2555
4	3066	3066	3066	3046	3066	2157
5	2749	2749	2749	2734	2749	1977
6	2258	2258	2258	2243	2258	1679
7	2185	2185	2185	2170	2185	1653
8	2026	2026	2026	2011	2026	1558
9	2005	2005	2005	1998	2005	1573
10	1879	1879	1879	1873	1879	1494
11	16	16	16	16	16	12
12	24	24	24	23	24	12

Companies 11 and 12 have too few data points!
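
One way to act on this observation is to drop companies below a minimum sample size before fitting per-company models. This is only a sketch: the threshold MIN_ROWS and the toy frame are hypothetical, and the notebook itself keeps companies 11 and 12 in the later fits.

```python
import pandas as pd

MIN_ROWS = 3  # hypothetical threshold, chosen only for this toy example

# Made-up data: company 2 has too few rows.
toy = pd.DataFrame({"company_id": [1, 1, 1, 1, 2, 2, 3, 3, 3]})

sizes = toy["company_id"].value_counts()          # rows per company
large_enough = sizes[sizes >= MIN_ROWS].index     # companies to keep
filtered = toy[toy["company_id"].isin(large_enough)]

print(sorted(filtered["company_id"].unique()))
```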

count_department = data.groupby(['company_id', 'dept']).count()
count_department

		employee_id	seniority	salary	join_date	quit_date
company_id	dept
1	customer_service	3157	3157	3129	3157	1803
	data_science	1079	1079	1070	1079	565
	design	499	499	491	499	269
	engineer	1568	1568	1552	1568	748
	marketing	1085	1085	1075	1085	620
	sales	1098	1098	1091	1098	616
	temp_contractor	1015	1015	1015	1015	1015
2	customer_service	1548	1548	1530	1548	840
	data_science	568	568	562	568	269
	design	223	223	223	223	126
	engineer	829	829	822	829	384
	marketing	541	541	535	541	295
	sales	513	513	508	513	292
	temp_contractor	998	998	998	998	998
3	customer_service	1010	1010	1000	1010	545
	data_science	347	347	345	347	194
	design	141	141	141	141	81
	engineer	516	516	512	516	292
	marketing	372	372	367	372	214
	sales	363	363	359	363	205
	temp_contractor	1024	1024	1024	1024	1024
4	customer_service	777	777	769	777	415
	data_science	279	279	277	279	161
	design	107	107	106	107	61
	engineer	376	376	375	376	208
	marketing	269	269	263	269	157
	sales	254	254	252	254	151
	temp_contractor	1004	1004	1004	1004	1004
5	customer_service	635	635	631	635	355
5	data_science	216	216	213	216	114
...	...	...	...	...	...	...
8	data_science	146	146	143	146	80
	design	53	53	53	53	24
	engineer	191	191	190	191	103
	marketing	135	135	132	135	68
	sales	137	137	136	137	85
	temp_contractor	979	979	979	979	979
9	customer_service	342	342	341	342	186
	data_science	134	134	133	134	71
	design	60	60	58	60	41
	engineer	188	188	185	188	106
	marketing	124	124	124	124	62
	sales	113	113	113	113	63
	temp_contractor	1044	1044	1044	1044	1044
10	customer_service	336	336	333	336	190
	data_science	109	109	108	109	52
	design	41	41	41	41	23
	engineer	172	172	171	172	94
	marketing	96	96	96	96	56
	sales	111	111	110	111	65
	temp_contractor	1014	1014	1014	1014	1014
11	customer_service	6	6	6	6	3
	data_science	2	2	2	2	2
	engineer	6	6	6	6	5
	marketing	2	2	2	2	2
12	customer_service	12	12	11	12	7
	data_science	4	4	4	4	2
	design	1	1	1	1	0
	engineer	4	4	4	4	1
	marketing	1	1	1	1	0
	sales	2	2	2	2	2

80 rows × 5 columns

fig, ax = plt.subplots(figsize=(20,10))

count_department['employee_id'].unstack().plot(ax=ax, kind = 'bar')
plt.savefig('stat.png')

n_company = len(data.company_id.unique())
n_dept= len(data.dept.unique())

companies = data.company_id.unique()

plt.figure(figsize = (40, 30))
for i, company in enumerate(companies):
    plt.subplot(4,3,i+1)
    sns.distplot(data[data.company_id == company].salary.dropna(), norm_hist=False, kde=False, bins=20, \
                 hist_kws={"alpha": 1}).set(xlabel='Salary', ylabel='Count');
    plt.title('company '+str(i+1) + ' salary' )
#     sns.distplot(data[data.company_id == company].salary.dropna(),norm_hist=False)
plt.savefig('salary.png')

plt.figure(figsize = (40, 30))
for i, company in enumerate(companies):
    plt.subplot(4,3,i+1)
    sns.distplot(data[data.company_id == company].seniority.dropna(), norm_hist=False, kde=False, bins=20, \
                 hist_kws={"alpha": 1}).set(xlabel='seniority', ylabel='Count');
    plt.title('company '+str(i+1) + ' seniority' )
#     sns.distplot(data[data.company_id == company].salary.dropna(),norm_hist=False)
plt.savefig('seniority.png')

Create a copy of the data for numerical EDA

data_num = data.copy()

data_num.head(5)

	employee_id	company_id	dept	seniority	salary	join_date	quit_date
0	1001444	8	temp_contractor	0	5850.0	2008-01-26	2008-04-25
1	388804	8	design	21	191000.0	2011-05-17	2012-03-16
2	407990	3	design	9	90000.0	2012-03-26	2015-04-10
3	120657	2	engineer	20	298000.0	2013-04-08	2015-01-30
4	1006393	1	temp_contractor	0	8509.0	2008-07-20	2008-10-18

# data_num.quit_date = data_num.quit_date.fillna(value=datetime.date.today())

data_num = data_num.dropna()

data_num.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 23379 entries, 0 to 34701
Data columns (total 7 columns):
employee_id    23379 non-null int32
company_id     23379 non-null int64
dept           23379 non-null object
seniority      23379 non-null int64
salary         23379 non-null float64
join_date      23379 non-null datetime64[ns]
quit_date      23379 non-null datetime64[ns]
dtypes: datetime64[ns](2), float64(1), int32(1), int64(2), object(1)
memory usage: 1.3+ MB

data_num['lasting_data'] = data_num.quit_date - data_num.join_date
data_num['lasting_data'] = data_num['lasting_data'].dt.days
data_num.head(5)

	employee_id	company_id	dept	seniority	salary	join_date	quit_date	lasting_data
0	1001444	8	temp_contractor	0	5850.0	2008-01-26	2008-04-25	90
1	388804	8	design	21	191000.0	2011-05-17	2012-03-16	304
2	407990	3	design	9	90000.0	2012-03-26	2015-04-10	1110
3	120657	2	engineer	20	298000.0	2013-04-08	2015-01-30	662
4	1006393	1	temp_contractor	0	8509.0	2008-07-20	2008-10-18	90

data_num.company_id = pd.Categorical(data_num.company_id)

data_num.company_id = data_num.company_id.astype('object')

data_num.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 23379 entries, 0 to 34701
Data columns (total 8 columns):
employee_id     23379 non-null int32
company_id      23379 non-null object
dept            23379 non-null object
seniority       23379 non-null int64
salary          23379 non-null float64
join_date       23379 non-null datetime64[ns]
quit_date       23379 non-null datetime64[ns]
lasting_data    23379 non-null int64
dtypes: datetime64[ns](2), float64(1), int32(1), int64(2), object(2)
memory usage: 1.5+ MB

g = sns.pairplot(data_num[["company_id", "seniority", "salary", "lasting_data"]], \
                 hue = "company_id")

g = sns.pairplot(data_num[["dept", "seniority", "salary", "lasting_data"]], \
                 hue = "dept")

Survival analysis

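data['event'] = pd.isnull(data.quit_date).astype('int8') 
# event = 1 when quit_date is missing (the employee is still employed)
# event = 0 when a quit date exists (the employee has quit)
# Note: lifelines reads 1 as "event observed" and 0 as censored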
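
lifelines fitters interpret an event value of 1 as "the event was observed" and 0 as censored. If quitting is meant to be the event, the conventional encoding is the opposite of the flag above: 1 for employees with a quit date, 0 for those still employed at the end of the observation window. A minimal sketch on toy rows follows; the column names quit_observed and duration_days and the constant OBSERVATION_END are assumptions for illustration, with the cutoff date taken from the fill value used in this notebook.

```python
import pandas as pd

OBSERVATION_END = pd.Timestamp("2015-12-13")  # fill date used in this notebook

# Two toy rows: one employee who quit, one still employed (quit_date is NaT).
toy = pd.DataFrame({
    "join_date": pd.to_datetime(["2012-03-26", "2014-06-30"]),
    "quit_date": pd.to_datetime(["2015-04-10", None]),
})

# 1 = quit was observed, 0 = still employed (right-censored).
toy["quit_observed"] = toy["quit_date"].notna().astype("int8")

# Tenure in days; censored rows are measured up to the end of observation.
end_date = toy["quit_date"].fillna(OBSERVATION_END)
toy["duration_days"] = (end_date - toy["join_date"]).dt.days

print(toy)
```

With this encoding, a Kaplan-Meier curve estimates the probability that an employee has not yet quit after a given number of days.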

data.quit_date.max()

Timestamp('2015-12-09 00:00:00')

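2015-12-10 (the latest join_date) is the maximum date in the file, so employees without a quit_date are assigned 2015-12-13, a date just after the observation window.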

data.quit_date = data.quit_date.fillna(value='2015-12-13')

data.quit_date = pd.to_datetime(data.quit_date)

data.head(50)

	employee_id	company_id	dept	seniority	salary	join_date	quit_date	event
0	1001444	8	temp_contractor	0	5850.0	2008-01-26	2008-04-25	0
1	388804	8	design	21	191000.0	2011-05-17	2012-03-16	0
2	407990	3	design	9	90000.0	2012-03-26	2015-04-10	0
3	120657	2	engineer	20	298000.0	2013-04-08	2015-01-30	0
4	1006393	1	temp_contractor	0	8509.0	2008-07-20	2008-10-18	0
5	287530	5	marketing	20	180000.0	2014-06-30	2015-12-13	1
6	561043	3	customer_service	18	119000.0	2012-07-02	2014-03-28	0
7	702479	7	data_science	7	140000.0	2011-12-27	2013-08-30	0
8	545690	10	data_science	16	238000.0	2013-12-23	2015-12-13	1
9	622587	5	sales	28	166000.0	2015-07-01	2015-12-13	1
10	430126	2	data_science	3	77000.0	2015-08-03	2015-12-13	1
11	838072	3	data_science	13	162000.0	2011-10-03	2012-08-10	0
12	205557	8	customer_service	17	109000.0	2013-07-22	2014-07-18	0
13	554514	1	customer_service	4	33000.0	2013-04-15	2015-04-24	0
14	14751	7	design	18	162000.0	2012-04-30	2014-02-14	0
15	602443	3	sales	16	150000.0	2011-09-12	2013-07-19	0
16	488083	1	engineer	8	NaN	2011-06-13	2013-06-07	0
17	1007464	7	temp_contractor	0	7748.0	2009-11-14	2010-02-12	0
18	1002775	3	temp_contractor	0	7424.0	2008-01-14	2008-04-13	0
19	581423	6	marketing	1	35000.0	2012-01-09	2015-06-12	0
20	1000103	5	temp_contractor	0	9684.0	2008-05-18	2008-08-16	0
21	34604	2	design	29	224000.0	2015-09-08	2015-12-13	1
22	1008116	4	temp_contractor	0	9865.0	2010-10-03	2011-01-01	0
23	182278	1	sales	19	179000.0	2011-09-19	2012-11-02	0
24	1003092	2	temp_contractor	0	5459.0	2009-09-23	2009-12-22	0
25	296069	2	engineer	16	308000.0	2012-01-03	2015-12-13	1
26	1007778	7	temp_contractor	0	6749.0	2007-02-14	2007-05-15	0
27	612255	7	customer_service	6	66000.0	2015-03-23	2015-12-13	1
28	28269	2	sales	9	153000.0	2011-08-29	2012-08-03	0
29	904543	2	data_science	17	314000.0	2013-11-25	2015-12-13	1
30	289336	3	design	6	111000.0	2012-12-24	2015-06-26	0
31	591606	1	customer_service	22	123000.0	2015-08-24	2015-12-13	1
32	505031	8	engineer	15	229000.0	2013-09-09	2015-12-13	1
33	1006601	8	temp_contractor	0	9051.0	2008-03-11	2008-06-09	0
34	855236	2	engineer	19	309000.0	2012-01-17	2015-12-13	1
35	543068	5	sales	1	42000.0	2012-08-14	2013-08-02	0
36	282308	4	data_science	14	130000.0	2014-04-11	2015-12-13	1
37	1005290	2	temp_contractor	0	9723.0	2008-09-19	2008-12-18	0
38	643275	3	customer_service	3	21000.0	2011-06-13	2012-06-08	0
39	115980	3	marketing	6	100000.0	2012-04-09	2013-04-26	0
40	1006290	4	temp_contractor	0	9512.0	2008-10-13	2009-01-11	0
41	259298	1	engineer	9	NaN	2011-11-07	2015-10-16	0
42	1007928	10	temp_contractor	0	6538.0	2009-12-28	2010-03-28	0
43	13088	2	customer_service	4	34000.0	2015-09-21	2015-12-13	1
44	1004117	10	temp_contractor	0	8052.0	2007-05-17	2007-08-15	0
45	1002404	10	temp_contractor	0	7998.0	2009-09-13	2009-12-12	0
46	975096	1	customer_service	16	125000.0	2015-04-27	2015-12-13	1
47	432323	7	engineer	20	236000.0	2013-12-02	2015-12-13	1
48	921758	2	engineer	9	216000.0	2014-04-08	2015-01-02	0
49	301501	7	customer_service	29	93000.0	2014-05-27	2015-04-10	0

data['lasting_days'] = data.quit_date - data.join_date
data['lasting_days'] = data['lasting_days'].dt.days

from lifelines import KaplanMeierFitter
kmf = KaplanMeierFitter()

matplotlib.rcParams.update({'font.size': 10})

fig, axes = plt.subplots(1, 1, figsize=(9, 5))

## Fit the model to the data
kmf.fit(data['lasting_days'] , data['event'], label='Kaplan Meier Estimate')
## Plot the estimated survival curve
kmf.plot(ci_show=False, ax=axes) 
plt.legend(loc='lower left')
kmf.median_
## ci_show toggles the confidence interval band; it is switched off for this plot.

909.0
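
To make clear what the fit computes, here is a hand-rolled Kaplan-Meier estimator in plain Python. It is a sketch of the textbook product-limit formula, S(t) = product over event times t_i <= t of (1 - d_i / n_i), not the lifelines implementation. The function names and toy numbers are assumptions for illustration.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(time, survival)] at each time with at least one observed event."""
    events_at = Counter(t for t, e in zip(durations, events) if e)  # d_i
    leaving_at = Counter(durations)  # everyone leaving the risk set at t
    at_risk = len(durations)         # n_i, subjects still under observation
    survival = 1.0
    curve = []
    for t in sorted(leaving_at):
        d = events_at.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving_at[t]  # remove both events and censored subjects
    return curve

def median_survival(curve):
    """Smallest time at which the survival estimate is at or below 0.5."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return float("inf")  # the curve never reaches 0.5

# Toy data: 1 = event observed, 0 = censored.
toy_curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
print(toy_curve)
print(median_survival(toy_curve))
```

A group with no observed events never drops to 0.5, which is how a median of inf can arise.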

department_type = data['dept'].unique()
matplotlib.rcParams.update({'font.size': 25})
fig, axes = plt.subplots(1, 1, figsize=(20, 20))
for i, de in enumerate(department_type): 
    
    i1 = (data.dept == de)      ## boolean mask selecting the rows of this department
    ## fit the model for this department
    kmf.fit(data['lasting_days'][i1] , data['event'][i1], label=de)   
    print(de, ':', kmf.median_)
    kmf.plot(ci_show=False, ax=axes)
    
    plt.yscale('log')
    
plt.savefig('dept_survival.png')

temp_contractor : inf
design : 909.0
engineer : 888.0
marketing : 909.0
customer_service : 888.0
data_science : 902.0
sales : 895.0

fig, ax = plt.subplots(1, 1, figsize=(20, 20))
kmf = KaplanMeierFitter()

for name, grouped_df in data.groupby(['company_id']):
    kmf.fit(grouped_df["lasting_days"], grouped_df["event"], label=name)
    print(name, ':', kmf.median_)
    kmf.plot(ax=ax)
    
plt.savefig('company_survival.png')

1 : 895.0
2 : 902.0
3 : 916.0
4 : 902.0
5 : 923.0
6 : 923.0
7 : 923.0
8 : 929.0
9 : 916.0
10 : 937.0
11 : 1217.0
12 : 726.0

fig, ax = plt.subplots(1, 1, figsize=(20, 20))
kmf = KaplanMeierFitter()

for name, grouped_df in data.groupby(['company_id', 'dept']):
    kmf.fit(grouped_df["lasting_days"], grouped_df["event"], label=name)
    print(name, ':', kmf.median_)
    kmf.plot(ci_show=False, ax=ax)

(1, 'customer_service') : 916.0
(1, 'data_science') : 902.0
(1, 'design') : 881.0
(1, 'engineer') : 846.0
(1, 'marketing') : 929.0
(1, 'sales') : 853.0
(1, 'temp_contractor') : inf
(2, 'customer_service') : 891.0
(2, 'data_science') : 895.0
(2, 'design') : 923.0
(2, 'engineer') : 864.0
(2, 'marketing') : 937.0
(2, 'sales') : 888.0
(2, 'temp_contractor') : inf
(3, 'customer_service') : 888.0
(3, 'data_science') : 825.0
(3, 'design') : 1000.0
(3, 'engineer') : 937.0
(3, 'marketing') : 951.0
(3, 'sales') : 865.0
(3, 'temp_contractor') : inf
(4, 'customer_service') : 825.0
(4, 'data_science') : 1007.0
(4, 'design') : 811.0
(4, 'engineer') : 937.0
(4, 'marketing') : 1014.0
(4, 'sales') : 867.0
(4, 'temp_contractor') : inf
(5, 'customer_service') : 853.0
(5, 'data_science') : 923.0
(5, 'design') : 1027.0
(5, 'engineer') : 1021.0
(5, 'marketing') : 727.0
(5, 'sales') : 929.0
(5, 'temp_contractor') : inf
(6, 'customer_service') : 795.0
(6, 'data_science') : 990.0
(6, 'design') : 1055.0
(6, 'engineer') : 881.0
(6, 'marketing') : 1280.0
(6, 'sales') : 909.0
(6, 'temp_contractor') : inf
(7, 'customer_service') : 923.0
(7, 'data_science') : 888.0
(7, 'design') : 816.0
(7, 'engineer') : 909.0
(7, 'marketing') : 923.0
(7, 'sales') : 1021.0
(7, 'temp_contractor') : inf
(8, 'customer_service') : 902.0
(8, 'data_science') : 923.0
(8, 'design') : 691.0
(8, 'engineer') : 950.0
(8, 'marketing') : 846.0
(8, 'sales') : 965.0
(8, 'temp_contractor') : inf
(9, 'customer_service') : 923.0
(9, 'data_science') : 825.0
(9, 'design') : 909.0
(9, 'engineer') : 993.0
(9, 'marketing') : 846.0
(9, 'sales') : 759.0
(9, 'temp_contractor') : inf
(10, 'customer_service') : 902.0
(10, 'data_science') : 797.0
(10, 'design') : 956.0
(10, 'engineer') : 991.0
(10, 'marketing') : 1007.0
(10, 'sales') : 951.0
(10, 'temp_contractor') : inf
(11, 'customer_service') : 587.0
(11, 'data_science') : inf
(11, 'engineer') : 1217.0
(11, 'marketing') : inf
(12, 'customer_service') : 1014.0
(12, 'data_science') : 881.0
(12, 'design') : 699.0
(12, 'engineer') : 699.0
(12, 'marketing') : 726.0
(12, 'sales') : inf

from lifelines import CoxPHFitter

data.company_id = data.company_id.astype('category')
# data.seniority = data.seniority.astype('category')

# Drop implausible seniority values (outliers above 50 years)
data = data[data.seniority <= 50]

data.head(5)

	employee_id	company_id	dept	seniority	salary	join_date	quit_date	lasting_days
0	1001444	8	temp_contractor	0	5850.0	2008-01-26	2008-04-25	90
1	388804	8	design	21	191000.0	2011-05-17	2012-03-16	304
2	407990	3	design	9	90000.0	2012-03-26	2015-04-10	1110
3	120657	2	engineer	20	298000.0	2013-04-08	2015-01-30	662
4	1006393	1	temp_contractor	0	8509.0	2008-07-20	2008-10-18	90

from sklearn.preprocessing import MinMaxScaler

## Select the model columns (the dummy-variable encoding is left commented out)
# df_dummy = pd.get_dummies(data[['event','lasting_days','seniority','salary','company_id']], drop_first=True)
df_dummy = data[['event','lasting_days','seniority','salary','company_id','dept']]
df_dummy = df_dummy.dropna()  # drop rows with missing salary

## Min-max scale salary to the range [0, 1]
max_s = np.max(df_dummy.salary.values)
min_s = np.min(df_dummy.salary.values)
df_dummy.salary = (df_dummy.salary.values - min_s)/(max_s-min_s)

## Min-max scale seniority to the range [0, 1]
max_se = np.max(df_dummy.seniority.values)
min_se = np.min(df_dummy.seniority.values)
df_dummy.seniority = (df_dummy.seniority.values - min_se)/(max_se-min_se)


df_dummy.head()

	lasting_days	seniority	salary	company_id	dept
0	90	0.000000	0.002109	8	temp_contractor
1	304	0.724138	0.461538	8	design
2	1110	0.310345	0.210918	3	design
3	662	0.689655	0.727047	2	engineer
4	90	0.000000	0.008707	1	temp_contractor
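
The scaling above applies the same formula to both columns: x_scaled = (x - min) / (max - min). A small sketch wraps it in a helper so that it can be applied to several columns at once. The name min_max_scale and the toy values are assumptions for illustration.

```python
import pandas as pd

def min_max_scale(column):
    """Rescale a numeric column to [0, 1]; assumes max > min."""
    lo, hi = column.min(), column.max()
    return (column - lo) / (hi - lo)

# Made-up values; seniority spans 0 to 29 in this toy frame.
toy = pd.DataFrame({
    "salary": [5000.0, 105000.0, 205000.0],
    "seniority": [0, 9, 29],
})

# DataFrame.apply calls the helper once per column.
scaled = toy.apply(min_max_scale)
print(scaled)
```

In this toy frame a seniority of 9 on a 0 to 29 range maps to 9/29, about 0.310345. A constant column would make the denominator zero, so it needs separate handling.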

# Using Cox Proportional Hazards model
for i in range(12):
    data_test = df_dummy[df_dummy.company_id == i+1]
    data_test = data_test.drop(columns = ['company_id','dept'])   
    cph = CoxPHFitter()   ## Instantiate a CoxPHFitter object
    cph.fit(data_test, 'lasting_days', event_col='event')   ## Fit the model to this company's data
    cph.print_summary()    ## Inspect the significance of each feature

<lifelines.CoxPHFitter: fitted with 9422 observations, 5596 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 9422
  number of events = 3826
partial log-likelihood = -31068.96
  time fit was run = 2019-07-19 12:58:06 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.02      0.98      0.07           -0.16            0.13                0.86                1.13
salary     0.12      1.13      0.09           -0.06            0.31                0.94                1.36

              z    p  -log2(p)
seniority -0.21 0.83      0.27
salary     1.31 0.19      2.39
---
Concordance = 0.51
Log-likelihood ratio test = 2.27 on 2 df, -log2(p)=1.64
<lifelines.CoxPHFitter: fitted with 5178 observations, 3179 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 5178
  number of events = 1999
partial log-likelihood = -14881.37
  time fit was run = 2019-07-19 12:58:07 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.19      1.21      0.10           -0.01            0.39                0.99                1.47
salary     0.04      1.04      0.13           -0.21            0.29                0.81                1.34

             z    p  -log2(p)
seniority 1.89 0.06      4.08
salary    0.32 0.75      0.42
---
Concordance = 0.53
Log-likelihood ratio test = 7.12 on 2 df, -log2(p)=5.13
<lifelines.CoxPHFitter: fitted with 3748 observations, 2543 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 3748
  number of events = 1205
partial log-likelihood = -8498.81
  time fit was run = 2019-07-19 12:58:07 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.19      1.21      0.13           -0.07            0.45                0.93                1.57
salary    -0.07      0.93      0.22           -0.49            0.36                0.61                1.43

              z    p  -log2(p)
seniority  1.44 0.15      2.73
salary    -0.32 0.75      0.41
---
Concordance = 0.54
Log-likelihood ratio test = 2.64 on 2 df, -log2(p)=1.91
<lifelines.CoxPHFitter: fitted with 3046 observations, 2146 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 3046
  number of events = 900
partial log-likelihood = -6067.60
  time fit was run = 2019-07-19 12:58:07 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.51      1.67      0.15            0.22            0.81                1.25                2.25
salary    -0.29      0.75      0.25           -0.77            0.19                0.46                1.21

              z      p  -log2(p)
seniority  3.43 <0.005     10.67
salary    -1.18   0.24      2.07
---
Concordance = 0.54
Log-likelihood ratio test = 12.81 on 2 df, -log2(p)=9.24
<lifelines.CoxPHFitter: fitted with 2734 observations, 1970 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 2734
  number of events = 764
partial log-likelihood = -5033.96
  time fit was run = 2019-07-19 12:58:07 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.14      1.15      0.17           -0.18            0.47                0.83                1.60
salary    -0.26      0.77      0.28           -0.81            0.28                0.45                1.33

              z    p  -log2(p)
seniority  0.86 0.39      1.35
salary    -0.94 0.35      1.53
---
Concordance = 0.51
Log-likelihood ratio test = 0.99 on 2 df, -log2(p)=0.71
<lifelines.CoxPHFitter: fitted with 2243 observations, 1670 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 2243
  number of events = 573
partial log-likelihood = -3607.36
  time fit was run = 2019-07-19 12:58:08 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.26      1.30      0.18           -0.09            0.62                0.91                1.85
salary    -0.04      0.96      0.30           -0.63            0.55                0.53                1.74

              z    p  -log2(p)
seniority  1.46 0.14      2.80
salary    -0.12 0.90      0.15
---
Concordance = 0.58
Log-likelihood ratio test = 3.08 on 2 df, -log2(p)=2.22
<lifelines.CoxPHFitter: fitted with 2170 observations, 1644 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 2170
  number of events = 526
partial log-likelihood = -3264.59
  time fit was run = 2019-07-19 12:58:08 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.32      1.37      0.20           -0.07            0.70                0.93                2.02
salary     0.18      1.19      0.32           -0.44            0.80                0.64                2.22

             z    p  -log2(p)
seniority 1.60 0.11      3.20
salary    0.56 0.58      0.79
---
Concordance = 0.58
Log-likelihood ratio test = 6.29 on 2 df, -log2(p)=4.54
<lifelines.CoxPHFitter: fitted with 2011 observations, 1547 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 2011
  number of events = 464
partial log-likelihood = -2813.20
  time fit was run = 2019-07-19 12:58:08 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.32      1.38      0.21           -0.09            0.73                0.92                2.07
salary     0.33      1.39      0.34           -0.34            1.00                0.71                2.71

             z    p  -log2(p)
seniority 1.54 0.12      3.03
salary    0.96 0.34      1.57
---
Concordance = 0.58
Log-likelihood ratio test = 8.36 on 2 df, -log2(p)=6.03
<lifelines.CoxPHFitter: fitted with 1998 observations, 1568 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 1998
  number of events = 430
partial log-likelihood = -2617.25
  time fit was run = 2019-07-19 12:58:08 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.14      1.15      0.22           -0.29            0.57                0.75                1.77
salary     0.47      1.61      0.35           -0.22            1.17                0.80                3.21

             z    p  -log2(p)
seniority 0.63 0.53      0.92
salary    1.34 0.18      2.48
---
Concordance = 0.60
Log-likelihood ratio test = 5.85 on 2 df, -log2(p)=4.22
<lifelines.CoxPHFitter: fitted with 1872 observations, 1490 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 1872
  number of events = 382
partial log-likelihood = -2269.34
  time fit was run = 2019-07-19 12:58:08 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.31      1.36      0.24           -0.17            0.78                0.85                2.18
salary    -0.01      0.99      0.39           -0.77            0.75                0.46                2.12

              z    p  -log2(p)
seniority  1.28 0.20      2.31
salary    -0.02 0.98      0.02
---
Concordance = 0.59
Log-likelihood ratio test = 2.74 on 2 df, -log2(p)=1.97
<lifelines.CoxPHFitter: fitted with 16 observations, 12 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 16
  number of events = 4
partial log-likelihood = -0.00
  time fit was run = 2019-07-19 12:58:09 UTC

---
             coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority   86.63  4.20e+37    659.33        -1205.64         1378.90                0.00                 inf
salary    -257.87      0.00   1823.05        -3830.98         3315.24                0.00                 inf

              z    p  -log2(p)
seniority  0.13 0.90      0.16
salary    -0.14 0.89      0.17
---
Concordance = 1.00
Log-likelihood ratio test = 10.48 on 2 df, -log2(p)=7.56
<lifelines.CoxPHFitter: fitted with 23 observations, 12 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 23
  number of events = 11
partial log-likelihood = -20.07
  time fit was run = 2019-07-19 12:58:09 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -2.03      0.13      3.29           -8.48            4.42                0.00               82.88
salary     3.75     42.71      6.65           -9.28           16.79                0.00            1.96e+07

              z    p  -log2(p)
seniority -0.62 0.54      0.90
salary     0.56 0.57      0.80
---
Concordance = 0.55
Log-likelihood ratio test = 0.39 on 2 df, -log2(p)=0.28


/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.7/site-packages/lifelines/fitters/coxph_fitter.py:561: ConvergenceWarning: Newton-Rhapson failed to converge sufficiently in 50 steps.
  warnings.warn("Newton-Rhapson failed to converge sufficiently in %d steps." % max_steps, ConvergenceWarning)
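
The columns in each summary are linked by simple formulas: exp(coef) is the hazard ratio, z is coef divided by se(coef), and the 95% interval is coef plus or minus 1.96 standard errors. The sketch below recomputes them for one row of the output, the seniority coefficient of company 4. The helper name summarize_coef is an assumption, and because the inputs are the rounded values printed above, the results only approximate the printed ones.

```python
import math

def summarize_coef(coef, se, z_crit=1.96):
    """Derive hazard ratio, z statistic and a normal-approximation 95% CI."""
    return {
        "hazard_ratio": math.exp(coef),
        "z": coef / se,
        "ci_lower": coef - z_crit * se,
        "ci_upper": coef + z_crit * se,
    }

# Rounded values from the company 4 summary: coef = 0.51, se(coef) = 0.15.
row = summarize_coef(coef=0.51, se=0.15)
for name, value in row.items():
    print(name, round(value, 3))
```

Since salary and seniority were min-max scaled to [0, 1], each coefficient is the log hazard ratio between the largest and smallest observed value of that covariate.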

depts = data.dept.unique()

depts

array(['temp_contractor', 'design', 'engineer', 'marketing',
       'customer_service', 'data_science', 'sales'], dtype=object)

data_test = df_dummy[df_dummy.dept == 'temp_contractor']
data_test = data_test.drop(columns = ['company_id','dept'])

data_test.head()

	lasting_days	salary
0	90	0.002109
4	90	0.008707
17	90	0.006819
18	90	0.006015
20	90	0.011623

# Fit a separate Cox proportional hazards model for each department
for i, de in enumerate(depts):
    if i > 1:  # skip the first two entries of depts (temp_contractor and design)
        data_test = df_dummy[df_dummy.dept == de]
        data_test = data_test.drop(columns=['company_id', 'dept'])
        cph = CoxPHFitter()  # instantiate the fitter
        cph.fit(data_test, 'lasting_days', event_col='event')  # fit on the duration and event columns
        print(de)
        print('*' * 20)
        cph.print_summary()  # inspect the significance of each covariate

engineer
********************
<lifelines.CoxPHFitter: fitted with 4568 observations, 2338 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 4568
  number of events = 2230
partial log-likelihood = -16747.83
  time fit was run = 2019-07-19 01:23:11 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.10      0.91      0.13           -0.35            0.15                0.71                1.16
salary     0.03      1.03      0.18           -0.32            0.38                0.72                1.46

              z    p  -log2(p)
seniority -0.77 0.44      1.17
salary     0.16 0.87      0.19
---
Concordance = 0.51
Log-likelihood ratio test = 1.15 on 2 df, -log2(p)=0.83
marketing
********************
<lifelines.CoxPHFitter: fitted with 3132 observations, 1765 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 3132
  number of events = 1367
partial log-likelihood = -9762.54
  time fit was run = 2019-07-19 01:23:11 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.03      1.03      0.17           -0.30            0.37                0.74                1.44
salary    -0.17      0.84      0.36           -0.87            0.53                0.42                1.70

              z    p  -log2(p)
seniority  0.18 0.85      0.23
salary    -0.48 0.63      0.67
---
Concordance = 0.51
Log-likelihood ratio test = 0.38 on 2 df, -log2(p)=0.27
customer_service
********************
<lifelines.CoxPHFitter: fitted with 9089 observations, 5043 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 9089
  number of events = 4046
partial log-likelihood = -33184.68
  time fit was run = 2019-07-19 01:23:12 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.20      1.23      0.10            0.01            0.40                1.01                1.49
salary    -0.77      0.46      0.35           -1.46           -0.09                0.23                0.92

              z    p  -log2(p)
seniority  2.07 0.04      4.71
salary    -2.21 0.03      5.22
---
Concordance = 0.51
Log-likelihood ratio test = 5.11 on 2 df, -log2(p)=3.68
data_science
********************
<lifelines.CoxPHFitter: fitted with 3157 observations, 1663 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 3157
  number of events = 1494
partial log-likelihood = -10679.77
  time fit was run = 2019-07-19 01:23:12 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.03      1.03      0.16           -0.28            0.33                0.75                1.40
salary     0.06      1.06      0.22           -0.37            0.49                0.69                1.63

             z    p  -log2(p)
seniority 0.17 0.87      0.21
salary    0.26 0.79      0.34
---
Concordance = 0.51
Log-likelihood ratio test = 0.47 on 2 df, -log2(p)=0.34
sales
********************
<lifelines.CoxPHFitter: fitted with 3148 observations, 1798 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 3148
  number of events = 1350
partial log-likelihood = -9595.81
  time fit was run = 2019-07-19 01:23:13 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.16      0.85      0.17           -0.49            0.17                0.61                1.19
salary     0.54      1.71      0.35           -0.16            1.23                0.85                3.43

              z    p  -log2(p)
seniority -0.95 0.34      1.54
salary     1.51 0.13      2.94
---
Concordance = 0.52
Log-likelihood ratio test = 2.55 on 2 df, -log2(p)=1.84
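The columns in each summary follow from coef and se(coef) by simple formulas:

- exp(coef) is the hazard ratio.
- The 95% interval is exp(coef ± 1.96 × se(coef)).
- z is coef / se(coef).
- p is the two-sided normal tail probability of z.

As a rough sketch, the block below recomputes the customer_service salary row from its printed coef and se. The function name `summarize_coef` is illustrative. Because the inputs are already rounded to two decimals, the results only agree with the table to about two decimals.

```python
import math


def summarize_coef(coef, se, z_crit=1.96):
    """Recompute the derived columns of a Cox summary row from coef and se(coef)."""
    z = coef / se
    # Two-sided p-value under a standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return {
        "exp(coef)": math.exp(coef),
        "exp(coef) lower 95%": math.exp(coef - z_crit * se),
        "exp(coef) upper 95%": math.exp(coef + z_crit * se),
        "z": z,
        "p": p,
        "-log2(p)": -math.log2(p),
    }


# customer_service salary row: coef = -0.77, se(coef) = 0.35 (values as printed above)
row = summarize_coef(coef=-0.77, se=0.35)
for key, value in row.items():
    print(f"{key}: {value:.2f}")
```

For this row the interval lies entirely below 1, which is why salary comes out as significant for customer_service. Note that the salary values in data_test.head() look rescaled (for example 0.002109), so the hazard ratio refers to one unit of the rescaled variable and not to one dollar.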

# Fit a separate Cox proportional hazards model for each company and department
for i in range(10):
    df_dummy2 = df_dummy[df_dummy.company_id == i + 1]
    df_dummy2 = df_dummy2.drop(columns=['company_id'])

    # Within each company, fit one model per department
    for j, de in enumerate(depts):
        if j > 1:  # skip the first two entries of depts (temp_contractor and design)
            data_test = df_dummy2[df_dummy2.dept == de]
            data_test = data_test.drop(columns=['dept'])
            cph = CoxPHFitter()  # instantiate the fitter
            cph.fit(data_test, 'lasting_days', event_col='event')  # fit on the duration and event columns
            print("company", i + 1, de)
            print('*' * 20)
            cph.print_summary()  # inspect the significance of each covariate

company 1 engineer
********************
<lifelines.CoxPHFitter: fitted with 1552 observations, 738 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 1552
  number of events = 814
partial log-likelihood = -5205.78
  time fit was run = 2019-07-19 01:24:34 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.29      1.34      0.24           -0.18            0.76                0.84                2.14
salary    -0.53      0.59      0.32           -1.15            0.10                0.32                1.10

              z    p  -log2(p)
seniority  1.22 0.22      2.18
salary    -1.66 0.10      3.35
---
Concordance = 0.53
Log-likelihood ratio test = 2.89 on 2 df, -log2(p)=2.08
company 1 marketing
********************
<lifelines.CoxPHFitter: fitted with 1074 observations, 613 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 1074
  number of events = 461
partial log-likelihood = -2791.80
  time fit was run = 2019-07-19 01:24:34 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.13      0.88      0.31           -0.74            0.47                0.48                1.61
salary     0.14      1.15      0.63           -1.10            1.38                0.33                3.97

              z    p  -log2(p)
seniority -0.43 0.67      0.58
salary     0.22 0.83      0.27
---
Concordance = 0.51
Log-likelihood ratio test = 0.26 on 2 df, -log2(p)=0.19
company 1 customer_service
********************
<lifelines.CoxPHFitter: fitted with 3129 observations, 1791 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 3129
  number of events = 1338
partial log-likelihood = -9540.69
  time fit was run = 2019-07-19 01:24:34 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.00      1.00      0.19           -0.38            0.38                0.69                1.46
salary    -0.21      0.81      0.64           -1.46            1.05                0.23                2.86

              z    p  -log2(p)
seniority -0.00 1.00      0.00
salary    -0.32 0.75      0.42
---
Concordance = 0.51
Log-likelihood ratio test = 0.40 on 2 df, -log2(p)=0.29
company 1 data_science
********************
<lifelines.CoxPHFitter: fitted with 1070 observations, 562 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 1070
  number of events = 508
partial log-likelihood = -3089.21
  time fit was run = 2019-07-19 01:24:34 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.36      1.44      0.29           -0.21            0.93                0.81                2.54
salary    -0.01      0.99      0.41           -0.81            0.79                0.44                2.19

              z    p  -log2(p)
seniority  1.25 0.21      2.25
salary    -0.03 0.97      0.04
---
Concordance = 0.54
Log-likelihood ratio test = 5.10 on 2 df, -log2(p)=3.68
company 1 sales
********************
<lifelines.CoxPHFitter: fitted with 1091 observations, 612 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 1091
  number of events = 479
partial log-likelihood = -2914.55
  time fit was run = 2019-07-19 01:24:35 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.69      0.50      0.32           -1.32           -0.05                0.27                0.95
salary     1.20      3.33      0.63           -0.04            2.45                0.96               11.53

              z    p  -log2(p)
seniority -2.12 0.03      4.87
salary     1.90 0.06      4.12
---
Concordance = 0.53
Log-likelihood ratio test = 4.54 on 2 df, -log2(p)=3.28
company 2 engineer
********************
<lifelines.CoxPHFitter: fitted with 822 observations, 380 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 822
  number of events = 442
partial log-likelihood = -2599.58
  time fit was run = 2019-07-19 01:24:35 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.02      0.98      0.31           -0.62            0.58                0.54                1.79
salary     0.11      1.12      0.41           -0.70            0.92                0.50                2.52

              z    p  -log2(p)
seniority -0.06 0.95      0.07
salary     0.27 0.79      0.34
---
Concordance = 0.51
Log-likelihood ratio test = 0.16 on 2 df, -log2(p)=0.12
company 2 marketing
********************
<lifelines.CoxPHFitter: fitted with 535 observations, 291 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 535
  number of events = 244
partial log-likelihood = -1324.46
  time fit was run = 2019-07-19 01:24:35 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.15      1.16      0.47           -0.77            1.06                0.46                2.90
salary    -0.52      0.59      0.92           -2.32            1.28                0.10                3.60

              z    p  -log2(p)
seniority  0.31 0.76      0.41
salary    -0.57 0.57      0.81
---
Concordance = 0.51
Log-likelihood ratio test = 0.45 on 2 df, -log2(p)=0.32
company 2 customer_service
********************
<lifelines.CoxPHFitter: fitted with 1530 observations, 828 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 1530
  number of events = 702
partial log-likelihood = -4467.66
  time fit was run = 2019-07-19 01:24:35 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.23      1.26      0.25           -0.26            0.73                0.77                2.07
salary    -0.91      0.40      0.85           -2.58            0.77                0.08                2.15

              z    p  -log2(p)
seniority  0.92 0.36      1.48
salary    -1.06 0.29      1.80
---
Concordance = 0.51
Log-likelihood ratio test = 1.13 on 2 df, -log2(p)=0.82
company 2 data_science
********************
<lifelines.CoxPHFitter: fitted with 562 observations, 265 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 562
  number of events = 297
partial log-likelihood = -1617.21
  time fit was run = 2019-07-19 01:24:35 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.40      1.49      0.41           -0.41            1.21                0.66                3.34
salary    -0.55      0.58      0.57           -1.66            0.57                0.19                1.76

              z    p  -log2(p)
seniority  0.97 0.33      1.58
salary    -0.96 0.34      1.57
---
Concordance = 0.50
Log-likelihood ratio test = 1.00 on 2 df, -log2(p)=0.72
company 2 sales
********************
<lifelines.CoxPHFitter: fitted with 508 observations, 291 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 508
  number of events = 217
partial log-likelihood = -1144.75
  time fit was run = 2019-07-19 01:24:35 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.11      0.89      0.42           -0.94            0.72                0.39                2.05
salary     0.66      1.93      0.90           -1.10            2.41                0.33               11.16

              z    p  -log2(p)
seniority -0.27 0.79      0.34
salary     0.73 0.46      1.11
---
Concordance = 0.52
Log-likelihood ratio test = 0.95 on 2 df, -log2(p)=0.69
company 3 engineer
********************
<lifelines.CoxPHFitter: fitted with 512 observations, 291 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 512
  number of events = 221
partial log-likelihood = -1176.37
  time fit was run = 2019-07-19 01:24:35 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.37      1.45      0.48           -0.57            1.31                0.57                3.72
salary    -1.21      0.30      0.81           -2.79            0.37                0.06                1.45

              z    p  -log2(p)
seniority  0.78 0.44      1.20
salary    -1.50 0.13      2.90
---
Concordance = 0.53
Log-likelihood ratio test = 3.29 on 2 df, -log2(p)=2.37
company 3 marketing
********************
<lifelines.CoxPHFitter: fitted with 367 observations, 213 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 367
  number of events = 154
partial log-likelihood = -771.02
  time fit was run = 2019-07-19 01:24:35 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.36      0.70      0.59           -1.51            0.79                0.22                2.20
salary     0.26      1.29      1.42           -2.53            3.04                0.08               20.95

              z    p  -log2(p)
seniority -0.62 0.54      0.89
salary     0.18 0.86      0.22
---
Concordance = 0.51
Log-likelihood ratio test = 0.95 on 2 df, -log2(p)=0.69
company 3 customer_service
********************
<lifelines.CoxPHFitter: fitted with 1000 observations, 538 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 1000
  number of events = 462
partial log-likelihood = -2798.21
  time fit was run = 2019-07-19 01:24:36 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.56      1.75      0.34           -0.11            1.23                0.90                3.41
salary    -3.27      0.04      1.38           -5.98           -0.56                0.00                0.57

              z    p  -log2(p)
seniority  1.65 0.10      3.33
salary    -2.37 0.02      5.79
---
Concordance = 0.52
Log-likelihood ratio test = 6.34 on 2 df, -log2(p)=4.57
company 3 data_science
********************
<lifelines.CoxPHFitter: fitted with 345 observations, 193 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 345
  number of events = 152
partial log-likelihood = -756.00
  time fit was run = 2019-07-19 01:24:36 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.18      0.83      0.57           -1.31            0.94                0.27                2.56
salary     0.48      1.62      0.94           -1.36            2.33                0.26               10.27

              z    p  -log2(p)
seniority -0.32 0.75      0.42
salary     0.51 0.61      0.72
---
Concordance = 0.48
Log-likelihood ratio test = 0.30 on 2 df, -log2(p)=0.22
company 3 sales
********************
<lifelines.CoxPHFitter: fitted with 359 observations, 203 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 359
  number of events = 156
partial log-likelihood = -762.34
  time fit was run = 2019-07-19 01:24:36 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.91      2.50      0.54           -0.15            1.98                0.86                7.25
salary    -0.42      0.66      1.33           -3.03            2.19                0.05                8.92

              z    p  -log2(p)
seniority  1.68 0.09      3.44
salary    -0.32 0.75      0.41
---
Concordance = 0.56
Log-likelihood ratio test = 6.86 on 2 df, -log2(p)=4.95
company 4 engineer
********************
<lifelines.CoxPHFitter: fitted with 375 observations, 208 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 375
  number of events = 167
partial log-likelihood = -845.44
  time fit was run = 2019-07-19 01:24:36 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.29      1.34      0.50           -0.70            1.28                0.50                3.59
salary    -0.27      0.76      0.90           -2.04            1.49                0.13                4.46

              z    p  -log2(p)
seniority  0.58 0.56      0.82
salary    -0.30 0.76      0.39
---
Concordance = 0.50
Log-likelihood ratio test = 0.42 on 2 df, -log2(p)=0.30
company 4 marketing
********************
<lifelines.CoxPHFitter: fitted with 263 observations, 153 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 263
  number of events = 110
partial log-likelihood = -513.20
  time fit was run = 2019-07-19 01:24:36 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.99      2.69      0.67           -0.31            2.29                0.73                9.92
salary    -2.55      0.08      1.70           -5.89            0.79                0.00                2.21

              z    p  -log2(p)
seniority  1.49 0.14      2.88
salary    -1.50 0.13      2.89
---
Concordance = 0.54
Log-likelihood ratio test = 2.42 on 2 df, -log2(p)=1.75
company 4 customer_service
********************
<lifelines.CoxPHFitter: fitted with 769 observations, 412 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 769
  number of events = 357
partial log-likelihood = -2049.08
  time fit was run = 2019-07-19 01:24:36 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.70      2.02      0.35            0.01            1.40                1.01                4.04
salary    -1.75      0.17      1.48           -4.65            1.15                0.01                3.15

              z    p  -log2(p)
seniority  1.99 0.05      4.41
salary    -1.18 0.24      2.08
---
Concordance = 0.51
Log-likelihood ratio test = 4.57 on 2 df, -log2(p)=3.30
company 4 data_science
********************
<lifelines.CoxPHFitter: fitted with 277 observations, 159 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 277
  number of events = 118
partial log-likelihood = -546.75
  time fit was run = 2019-07-19 01:24:36 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.24      1.27      0.67           -1.06            1.55                0.34                4.70
salary    -0.08      0.93      1.12           -2.27            2.11                0.10                8.26

              z    p  -log2(p)
seniority  0.36 0.72      0.48
salary    -0.07 0.95      0.08
---
Concordance = 0.50
Log-likelihood ratio test = 0.31 on 2 df, -log2(p)=0.22
company 4 sales
********************
<lifelines.CoxPHFitter: fitted with 252 observations, 150 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 252
  number of events = 102
partial log-likelihood = -467.89
  time fit was run = 2019-07-19 01:24:36 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.87      0.42      0.69           -2.22            0.47                0.11                1.61
salary     2.31     10.04      1.76           -1.14            5.75                0.32              314.44

              z    p  -log2(p)
seniority -1.27 0.20      2.29
salary     1.31 0.19      2.40
---
Concordance = 0.54
Log-likelihood ratio test = 1.80 on 2 df, -log2(p)=1.30
company 5 engineer
********************
<lifelines.CoxPHFitter: fitted with 311 observations, 178 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 311
  number of events = 133
partial log-likelihood = -645.66
  time fit was run = 2019-07-19 01:24:36 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  1.04      2.84      0.61           -0.15            2.23                0.86                9.34
salary    -2.80      0.06      1.08           -4.92           -0.67                0.01                0.51

              z    p  -log2(p)
seniority  1.72 0.09      3.55
salary    -2.58 0.01      6.67
---
Concordance = 0.55
Log-likelihood ratio test = 7.65 on 2 df, -log2(p)=5.52
company 5 marketing
********************
<lifelines.CoxPHFitter: fitted with 224 observations, 114 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 224
  number of events = 110
partial log-likelihood = -496.74
  time fit was run = 2019-07-19 01:24:36 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.87      0.42      0.72           -2.27            0.53                0.10                1.71
salary     1.84      6.29      1.66           -1.42            5.10                0.24              163.90

              z    p  -log2(p)
seniority -1.21 0.23      2.15
salary     1.11 0.27      1.89
---
Concordance = 0.54
Log-likelihood ratio test = 1.50 on 2 df, -log2(p)=1.08
company 5 customer_service
********************
<lifelines.CoxPHFitter: fitted with 631 observations, 353 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 631
  number of events = 278
partial log-likelihood = -1545.96
  time fit was run = 2019-07-19 01:24:36 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.07      1.07      0.42           -0.77            0.90                0.47                2.45
salary    -1.14      0.32      1.76           -4.59            2.31                0.01               10.09

              z    p  -log2(p)
seniority  0.15 0.88      0.19
salary    -0.65 0.52      0.95
---
Concordance = 0.51
Log-likelihood ratio test = 1.03 on 2 df, -log2(p)=0.74
company 5 data_science
********************
<lifelines.CoxPHFitter: fitted with 213 observations, 113 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 213
  number of events = 100
partial log-likelihood = -440.53
  time fit was run = 2019-07-19 01:24:37 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.51      0.60      0.69           -1.85            0.84                0.16                2.31
salary     0.21      1.24      1.16           -2.06            2.48                0.13               11.92

              z    p  -log2(p)
seniority -0.74 0.46      1.12
salary     0.18 0.86      0.23
---
Concordance = 0.50
Log-likelihood ratio test = 1.10 on 2 df, -log2(p)=0.79
company 5 sales
********************
<lifelines.CoxPHFitter: fitted with 254 observations, 148 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 254
  number of events = 106
partial log-likelihood = -483.46
  time fit was run = 2019-07-19 01:24:37 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.14      1.15      0.63           -1.09            1.38                0.33                3.96
salary    -0.42      0.66      1.62           -3.59            2.75                0.03               15.60

              z    p  -log2(p)
seniority  0.22 0.82      0.28
salary    -0.26 0.79      0.33
---
Concordance = 0.51
Log-likelihood ratio test = 0.07 on 2 df, -log2(p)=0.05
company 6 engineer
********************
<lifelines.CoxPHFitter: fitted with 218 observations, 115 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 218
  number of events = 103
partial log-likelihood = -453.19
  time fit was run = 2019-07-19 01:24:37 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.04      1.04      0.67           -1.27            1.35                0.28                3.86
salary    -0.00      1.00      1.10           -2.16            2.15                0.12                8.59

              z    p  -log2(p)
seniority  0.06 0.95      0.07
salary    -0.00 1.00      0.00
---
Concordance = 0.50
Log-likelihood ratio test = 0.01 on 2 df, -log2(p)=0.01
company 6 marketing
********************
<lifelines.CoxPHFitter: fitted with 174 observations, 117 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 174
  number of events = 57
partial log-likelihood = -252.19
  time fit was run = 2019-07-19 01:24:37 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.64      1.90      0.90           -1.12            2.41                0.33               11.09
salary    -1.18      0.31      2.28           -5.65            3.28                0.00               26.66

              z    p  -log2(p)
seniority  0.72 0.47      1.08
salary    -0.52 0.60      0.73
---
Concordance = 0.57
Log-likelihood ratio test = 0.54 on 2 df, -log2(p)=0.39
company 6 customer_service
********************
<lifelines.CoxPHFitter: fitted with 495 observations, 257 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 495
  number of events = 238
partial log-likelihood = -1260.17
  time fit was run = 2019-07-19 01:24:37 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.13      1.14      0.44           -0.73            1.00                0.48                2.73
salary     0.22      1.24      1.90           -3.52            3.95                0.03               51.94

             z    p  -log2(p)
seniority 0.30 0.76      0.39
salary    0.11 0.91      0.14
---
Concordance = 0.53
Log-likelihood ratio test = 0.63 on 2 df, -log2(p)=0.45
company 6 data_science
********************
<lifelines.CoxPHFitter: fitted with 151 observations, 84 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 151
  number of events = 67
partial log-likelihood = -271.53
  time fit was run = 2019-07-19 01:24:37 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -1.33      0.26      0.82           -2.94            0.28                0.05                1.32
salary     0.79      2.20      1.39           -1.94            3.51                0.14               33.46

              z    p  -log2(p)
seniority -1.62 0.10      3.26
salary     0.57 0.57      0.81
---
Concordance = 0.57
Log-likelihood ratio test = 4.91 on 2 df, -log2(p)=3.54
company 6 sales
********************
<lifelines.CoxPHFitter: fitted with 161 observations, 85 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 161
  number of events = 76
partial log-likelihood = -318.48
  time fit was run = 2019-07-19 01:24:37 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -1.72      0.18      0.79           -3.26           -0.18                0.04                0.84
salary     3.83     46.18      1.93            0.05            7.61                1.05             2025.76

              z    p  -log2(p)
seniority -2.18 0.03      5.11
salary     1.99 0.05      4.41
---
Concordance = 0.59
Log-likelihood ratio test = 4.99 on 2 df, -log2(p)=3.60
company 7 engineer
********************
<lifelines.CoxPHFitter: fitted with 223 observations, 123 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 223
  number of events = 100
partial log-likelihood = -445.24
  time fit was run = 2019-07-19 01:24:37 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.48      0.62      0.69           -1.83            0.88                0.16                2.40
salary     1.23      3.41      1.11           -0.95            3.40                0.39               30.01

              z    p  -log2(p)
seniority -0.69 0.49      1.03
salary     1.10 0.27      1.89
---
Concordance = 0.54
Log-likelihood ratio test = 1.39 on 2 df, -log2(p)=1.00
company 7 marketing
********************
<lifelines.CoxPHFitter: fitted with 140 observations, 77 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 140
  number of events = 63
partial log-likelihood = -243.90
  time fit was run = 2019-07-19 01:24:37 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.54      1.72      0.91           -1.25            2.33                0.29               10.29
salary    -0.92      0.40      2.05           -4.94            3.10                0.01               22.23

              z    p  -log2(p)
seniority  0.59 0.55      0.86
salary    -0.45 0.65      0.61
---
Concordance = 0.54
Log-likelihood ratio test = 0.35 on 2 df, -log2(p)=0.25
company 7 customer_service
********************
<lifelines.CoxPHFitter: fitted with 466 observations, 266 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 466
  number of events = 200
partial log-likelihood = -1045.25
  time fit was run = 2019-07-19 01:24:37 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.07      0.93      0.47           -0.99            0.84                0.37                2.32
salary     0.67      1.96      1.94           -3.12            4.47                0.04               87.26

              z    p  -log2(p)
seniority -0.16 0.87      0.19
salary     0.35 0.73      0.46
---
Concordance = 0.52
Log-likelihood ratio test = 0.17 on 2 df, -log2(p)=0.13
company 7 data_science
********************
<lifelines.CoxPHFitter: fitted with 149 observations, 84 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 149
  number of events = 65
partial log-likelihood = -270.08
  time fit was run = 2019-07-19 01:24:37 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.57      1.77      0.91           -1.21            2.35                0.30               10.52
salary    -1.51      0.22      1.47           -4.39            1.37                0.01                3.95

              z    p  -log2(p)
seniority  0.63 0.53      0.92
salary    -1.03 0.30      1.72
---
Concordance = 0.53
Log-likelihood ratio test = 1.34 on 2 df, -log2(p)=0.96
company 7 sales
********************
<lifelines.CoxPHFitter: fitted with 162 observations, 96 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 162
  number of events = 66
partial log-likelihood = -269.08
  time fit was run = 2019-07-19 01:24:37 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  1.29      3.64      0.85           -0.37            2.95                0.69               19.10
salary    -3.24      0.04      2.09           -7.33            0.84                0.00                2.32

              z    p  -log2(p)
seniority  1.53 0.13      2.98
salary    -1.56 0.12      3.06
---
Concordance = 0.53
Log-likelihood ratio test = 2.57 on 2 df, -log2(p)=1.85
company 8 engineer
********************
<lifelines.CoxPHFitter: fitted with 190 observations, 102 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 190
  number of events = 88
partial log-likelihood = -379.28
  time fit was run = 2019-07-19 01:24:37 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.88      0.41      0.73           -2.32            0.55                0.10                1.73
salary     2.37     10.67      1.32           -0.23            4.96                0.80              143.00

              z    p  -log2(p)
seniority -1.21 0.23      2.14
salary     1.79 0.07      3.76
---
Concordance = 0.55
Log-likelihood ratio test = 3.41 on 2 df, -log2(p)=2.46
company 8 marketing
********************
<lifelines.CoxPHFitter: fitted with 132 observations, 67 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 132
  number of events = 65
partial log-likelihood = -249.50
  time fit was run = 2019-07-19 01:24:37 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.28      1.33      0.96           -1.60            2.16                0.20                8.70
salary     1.67      5.32      2.22           -2.67            6.02                0.07              409.69

             z    p  -log2(p)
seniority 0.30 0.77      0.38
salary    0.75 0.45      1.15
---
Concordance = 0.54
Log-likelihood ratio test = 3.83 on 2 df, -log2(p)=2.76
company 8 customer_service
********************
<lifelines.CoxPHFitter: fitted with 378 observations, 213 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 378
  number of events = 165
partial log-likelihood = -833.58
  time fit was run = 2019-07-19 01:24:37 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.07      1.07      0.51           -0.92            1.06                0.40                2.88
salary     0.42      1.52      2.26           -4.01            4.84                0.02              127.00

             z    p  -log2(p)
seniority 0.13 0.89      0.16
salary    0.19 0.85      0.23
---
Concordance = 0.50
Log-likelihood ratio test = 0.34 on 2 df, -log2(p)=0.24
company 8 data_science
********************
<lifelines.CoxPHFitter: fitted with 143 observations, 78 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 143
  number of events = 65
partial log-likelihood = -261.75
  time fit was run = 2019-07-19 01:24:37 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.49      0.61      0.92           -2.30            1.31                0.10                3.69
salary     1.52      4.55      1.51           -1.44            4.47                0.24               87.63

              z    p  -log2(p)
seniority -0.54 0.59      0.76
salary     1.00 0.32      1.66
---
Concordance = 0.52
Log-likelihood ratio test = 1.37 on 2 df, -log2(p)=0.99
company 8 sales
********************
<lifelines.CoxPHFitter: fitted with 136 observations, 84 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 136
  number of events = 52
partial log-likelihood = -201.66
  time fit was run = 2019-07-19 01:24:38 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.65      1.92      1.12           -1.55            2.85                0.21               17.37
salary    -2.00      0.14      2.51           -6.92            2.92                0.00               18.53

              z    p  -log2(p)
seniority  0.58 0.56      0.84
salary    -0.80 0.43      1.23
---
Concordance = 0.50
Log-likelihood ratio test = 0.69 on 2 df, -log2(p)=0.50
company 9 engineer
********************
<lifelines.CoxPHFitter: fitted with 185 observations, 104 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 185
  number of events = 81
partial log-likelihood = -355.88
  time fit was run = 2019-07-19 01:24:38 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -1.25      0.29      0.77           -2.76            0.25                0.06                1.29
salary     0.92      2.52      1.27           -1.57            3.42                0.21               30.53

              z    p  -log2(p)
seniority -1.63 0.10      3.28
salary     0.72 0.47      1.09
---
Concordance = 0.55
Log-likelihood ratio test = 4.53 on 2 df, -log2(p)=3.27
company 9 marketing
********************
<lifelines.CoxPHFitter: fitted with 124 observations, 62 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 124
  number of events = 62
partial log-likelihood = -248.72
  time fit was run = 2019-07-19 01:24:38 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.57      0.56      0.85           -2.24            1.10                0.11                3.01
salary     1.02      2.76      2.08           -3.06            5.09                0.05              162.65

              z    p  -log2(p)
seniority -0.67 0.50      0.99
salary     0.49 0.63      0.68
---
Concordance = 0.56
Log-likelihood ratio test = 0.48 on 2 df, -log2(p)=0.34
company 9 customer_service
********************
<lifelines.CoxPHFitter: fitted with 341 observations, 186 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 341
  number of events = 155
partial log-likelihood = -773.19
  time fit was run = 2019-07-19 01:24:38 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.09      1.09      0.61           -1.11            1.29                0.33                3.63
salary     0.00      1.00      2.44           -4.78            4.78                0.01              119.59

             z    p  -log2(p)
seniority 0.15 0.88      0.18
salary    0.00 1.00      0.00
---
Concordance = 0.50
Log-likelihood ratio test = 0.10 on 2 df, -log2(p)=0.07
company 9 data_science
********************
<lifelines.CoxPHFitter: fitted with 133 observations, 70 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 133
  number of events = 63
partial log-likelihood = -253.93
  time fit was run = 2019-07-19 01:24:38 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -0.07      0.93      0.84           -1.72            1.57                0.18                4.80
salary     0.89      2.43      1.27           -1.60            3.38                0.20               29.37

              z    p  -log2(p)
seniority -0.09 0.93      0.11
salary     0.70 0.48      1.04
---
Concordance = 0.57
Log-likelihood ratio test = 1.22 on 2 df, -log2(p)=0.88
company 9 sales
********************
<lifelines.CoxPHFitter: fitted with 113 observations, 63 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 113
  number of events = 50
partial log-likelihood = -192.57
  time fit was run = 2019-07-19 01:24:38 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  1.18      3.26      1.00           -0.77            3.14                0.46               23.05
salary    -2.20      0.11      2.66           -7.41            3.02                0.00               20.50

              z    p  -log2(p)
seniority  1.18 0.24      2.08
salary    -0.83 0.41      1.29
---
Concordance = 0.52
Log-likelihood ratio test = 1.60 on 2 df, -log2(p)=1.16
company 10 engineer
********************
<lifelines.CoxPHFitter: fitted with 170 observations, 93 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 170
  number of events = 77
partial log-likelihood = -318.59
  time fit was run = 2019-07-19 01:24:38 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.45      1.57      0.81           -1.14            2.03                0.32                7.64
salary    -1.97      0.14      1.42           -4.76            0.82                0.01                2.28

              z    p  -log2(p)
seniority  0.56 0.58      0.79
salary    -1.38 0.17      2.58
---
Concordance = 0.54
Log-likelihood ratio test = 3.39 on 2 df, -log2(p)=2.45
company 10 marketing
********************
<lifelines.CoxPHFitter: fitted with 96 observations, 56 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 96
  number of events = 40
partial log-likelihood = -151.41
  time fit was run = 2019-07-19 01:24:38 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.25      1.29      1.00           -1.70            2.21                0.18                9.09
salary     0.50      1.65      2.53           -4.45            5.45                0.01              233.73

             z    p  -log2(p)
seniority 0.26 0.80      0.33
salary    0.20 0.84      0.25
---
Concordance = 0.58
Log-likelihood ratio test = 0.53 on 2 df, -log2(p)=0.39
company 10 customer_service
********************
<lifelines.CoxPHFitter: fitted with 333 observations, 189 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 333
  number of events = 144
partial log-likelihood = -711.79
  time fit was run = 2019-07-19 01:24:38 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.97      2.65      0.60           -0.20            2.15                0.82                8.55
salary    -2.35      0.10      2.46           -7.17            2.47                0.00               11.87

              z    p  -log2(p)
seniority  1.63 0.10      3.27
salary    -0.95 0.34      1.56
---
Concordance = 0.54
Log-likelihood ratio test = 3.15 on 2 df, -log2(p)=2.27
company 10 data_science
********************
<lifelines.CoxPHFitter: fitted with 108 observations, 51 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 108
  number of events = 57
partial log-likelihood = -216.03
  time fit was run = 2019-07-19 01:24:38 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority -2.80      0.06      1.03           -4.81           -0.78                0.01                0.46
salary     3.97     53.04      1.59            0.85            7.10                2.33             1206.08

              z    p  -log2(p)
seniority -2.72 0.01      7.27
salary     2.49 0.01      6.30
---
Concordance = 0.60
Log-likelihood ratio test = 7.73 on 2 df, -log2(p)=5.57
company 10 sales
********************
<lifelines.CoxPHFitter: fitted with 110 observations, 64 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 110
  number of events = 46
partial log-likelihood = -171.25
  time fit was run = 2019-07-19 01:24:38 UTC

---
           coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
seniority  0.98      2.68      1.03           -1.04            3.01                0.35               20.20
salary    -4.30      0.01      2.54           -9.29            0.68                0.00                1.98

              z    p  -log2(p)
seniority  0.95 0.34      1.56
salary    -1.69 0.09      3.46
---
Concordance = 0.63
Log-likelihood ratio test = 3.83 on 2 df, -log2(p)=2.77
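
The columns in each summary above are related by simple arithmetic. The sketch below reproduces the seniority row of the company 10 data_science fit from its rounded `coef` and `se(coef)`, assuming the Wald (normal-approximation) test and 95% interval that the printed numbers are consistent with. `wald_summary` is a hypothetical helper written for this illustration, not part of lifelines, and because the inputs are rounded to two decimals the results match the printed table only approximately.

```python
import math

Z_CRIT_95 = 1.959963984540054  # 97.5th percentile of the standard normal


def wald_summary(coef, se):
    """Rebuild the derived columns of a Cox summary row from coef and se(coef).

    Hypothetical helper for illustration; assumes a Wald z-test and a
    normal-approximation 95% confidence interval.
    """
    z = coef / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) written with the complementary error function.
    p = math.erfc(abs(z) / math.sqrt(2))
    lower = coef - Z_CRIT_95 * se
    upper = coef + Z_CRIT_95 * se
    return {
        "exp(coef)": math.exp(coef),           # hazard ratio per unit of the covariate
        "coef lower 95%": lower,
        "coef upper 95%": upper,
        "exp(coef) lower 95%": math.exp(lower),
        "exp(coef) upper 95%": math.exp(upper),
        "z": z,
        "p": p,
        "-log2(p)": -math.log2(p),
    }


# Rounded values copied from the "company 10 data_science" summary, seniority row.
row = wald_summary(coef=-2.80, se=1.03)
for name, value in row.items():
    print(f"{name:>20s} = {value:.2f}")
```

A `coef` interval that excludes 0 (equivalently, an `exp(coef)` interval that excludes 1) corresponds to p < 0.05. Among the fits printed above for companies 7 through 10, only company 10 data_science meets that bar.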

# Using Cox Proportional Hazards model: one fit per company and department,
# this time with salary as the only covariate (seniority is dropped)
from lifelines import CoxPHFitter

for i in range(10):
    # Select one company, then drop the id column and seniority
    df_dummy2 = df_dummy[df_dummy.company_id == i+1]
    df_dummy2 = df_dummy2.drop(columns = ['company_id','seniority'])

    for j,de in enumerate(depts):
        if j > 1:   ## Skip the first two entries of depts
            data_test = df_dummy2[df_dummy2.dept == de]
            data_test = data_test.drop(columns = ['dept'])
            cph = CoxPHFitter()   ## Instantiate the class to create a cph object
            cph.fit(data_test, 'lasting_days', event_col='event')   ## Fit the data to train the model
            print("company", i+1, de)
            print('*'*20)
            cph.print_summary()    ## Have a look at the significance of the features
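
Each summary printed by `cph.print_summary()` includes a `Concordance` value. As a rough guide, 0.5 means the model orders employees no better than chance and 1.0 means it orders every comparable pair correctly. The sketch below is a minimal pure-Python version of that idea (Harrell's C-index) on made-up toy data. `concordance_index_sketch` is a hypothetical helper for illustration; it skips pairs with tied durations for simplicity and is not the lifelines implementation.

```python
from itertools import combinations


def concordance_index_sketch(durations, risk_scores, events):
    """Fraction of comparable pairs in which the higher-risk subject leaves first.

    Hypothetical helper: a higher risk score is taken to mean an earlier
    expected event. Pairs with tied durations are skipped (a simplification).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(durations)), 2):
        if durations[i] == durations[j]:
            continue  # simplification: ignore tied durations
        # `first` is the subject with the shorter observed duration
        first, second = (i, j) if durations[i] < durations[j] else (j, i)
        if not events[first]:
            continue  # shorter duration is censored, so the true order is unknown
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up toy data: days employed and whether the employee quit (1) or is censored (0).
toy_days = [100, 200, 300, 400]
toy_quit = [1, 1, 0, 1]

print(concordance_index_sketch(toy_days, [4, 3, 2, 1], toy_quit))  # perfect ordering
print(concordance_index_sketch(toy_days, [1, 1, 1, 1], toy_quit))  # uninformative scores
print(concordance_index_sketch(toy_days, [1, 2, 3, 4], toy_quit))  # reversed ordering
```

Values close to 0.5, as in most of the fits in this section, suggest that salary alone does little to rank who leaves first.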

company 1 engineer
********************
<lifelines.CoxPHFitter: fitted with 1552 observations, 738 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 1552
  number of events = 814
partial log-likelihood = -5206.53
  time fit was run = 2019-07-19 12:59:17 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.19      0.82      0.16           -0.52            0.13                0.60                1.14

           z    p  -log2(p)
salary -1.19 0.24      2.08
---
Concordance = 0.52
Log-likelihood ratio test = 1.39 on 1 df, -log2(p)=2.07
company 1 marketing
********************
<lifelines.CoxPHFitter: fitted with 1074 observations, 613 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 1074
  number of events = 461
partial log-likelihood = -2791.89
  time fit was run = 2019-07-19 12:59:17 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.09      0.91      0.33           -0.74            0.56                0.48                1.74

           z    p  -log2(p)
salary -0.28 0.78      0.36
---
Concordance = 0.51
Log-likelihood ratio test = 0.08 on 1 df, -log2(p)=0.36
company 1 customer_service
********************
<lifelines.CoxPHFitter: fitted with 3129 observations, 1791 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 3129
  number of events = 1338
partial log-likelihood = -9540.69
  time fit was run = 2019-07-19 12:59:17 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.21      0.81      0.33           -0.84            0.43                0.43                1.54

           z    p  -log2(p)
salary -0.63 0.53      0.92
---
Concordance = 0.51
Log-likelihood ratio test = 0.40 on 1 df, -log2(p)=0.92
company 1 data_science
********************
<lifelines.CoxPHFitter: fitted with 1070 observations, 562 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 1070
  number of events = 508
partial log-likelihood = -3089.99
  time fit was run = 2019-07-19 12:59:18 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.41      1.51      0.22           -0.02            0.85                0.98                2.33

          z    p  -log2(p)
salary 1.86 0.06      4.00
---
Concordance = 0.53
Log-likelihood ratio test = 3.54 on 1 df, -log2(p)=4.06
company 1 sales
********************
<lifelines.CoxPHFitter: fitted with 1091 observations, 612 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 1091
  number of events = 479
partial log-likelihood = -2916.82
  time fit was run = 2019-07-19 12:59:18 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.04      1.04      0.33           -0.60            0.69                0.55                1.99

          z    p  -log2(p)
salary 0.13 0.90      0.16
---
Concordance = 0.51
Log-likelihood ratio test = 0.02 on 1 df, -log2(p)=0.16
company 2 engineer
********************
<lifelines.CoxPHFitter: fitted with 822 observations, 380 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 822
  number of events = 442
partial log-likelihood = -2599.59
  time fit was run = 2019-07-19 12:59:18 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.09      1.10      0.23           -0.36            0.54                0.70                1.72

          z    p  -log2(p)
salary 0.40 0.69      0.53
---
Concordance = 0.51
Log-likelihood ratio test = 0.16 on 1 df, -log2(p)=0.53
company 2 marketing
********************
<lifelines.CoxPHFitter: fitted with 535 observations, 291 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 535
  number of events = 244
partial log-likelihood = -1324.51
  time fit was run = 2019-07-19 12:59:18 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.27      0.76      0.46           -1.18            0.63                0.31                1.88

           z    p  -log2(p)
salary -0.59 0.55      0.85
---
Concordance = 0.52
Log-likelihood ratio test = 0.35 on 1 df, -log2(p)=0.85
company 2 customer_service
********************
<lifelines.CoxPHFitter: fitted with 1530 observations, 828 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 1530
  number of events = 702
partial log-likelihood = -4468.09
  time fit was run = 2019-07-19 12:59:18 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.25      0.78      0.46           -1.15            0.65                0.32                1.92

           z    p  -log2(p)
salary -0.54 0.59      0.76
---
Concordance = 0.51
Log-likelihood ratio test = 0.29 on 1 df, -log2(p)=0.76
company 2 data_science
********************
<lifelines.CoxPHFitter: fitted with 562 observations, 265 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 562
  number of events = 297
partial log-likelihood = -1617.68
  time fit was run = 2019-07-19 12:59:18 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.08      0.92      0.30           -0.66            0.50                0.52                1.65

           z    p  -log2(p)
salary -0.27 0.79      0.34
---
Concordance = 0.50
Log-likelihood ratio test = 0.07 on 1 df, -log2(p)=0.34
company 2 sales
********************
<lifelines.CoxPHFitter: fitted with 508 observations, 291 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 508
  number of events = 217
partial log-likelihood = -1144.78
  time fit was run = 2019-07-19 12:59:18 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.46      1.58      0.49           -0.50            1.41                0.61                4.11

          z    p  -log2(p)
salary 0.93 0.35      1.51
---
Concordance = 0.52
Log-likelihood ratio test = 0.88 on 1 df, -log2(p)=1.52
company 3 engineer
********************
<lifelines.CoxPHFitter: fitted with 512 observations, 291 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 512
  number of events = 221
partial log-likelihood = -1176.67
  time fit was run = 2019-07-19 12:59:18 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.67      0.51      0.40           -1.46            0.12                0.23                1.13

           z    p  -log2(p)
salary -1.66 0.10      3.35
---
Concordance = 0.52
Log-likelihood ratio test = 2.68 on 1 df, -log2(p)=3.30
company 3 marketing
********************
<lifelines.CoxPHFitter: fitted with 367 observations, 213 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 367
  number of events = 154
partial log-likelihood = -771.21
  time fit was run = 2019-07-19 12:59:18 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.52      0.60      0.68           -1.85            0.82                0.16                2.26

           z    p  -log2(p)
salary -0.76 0.45      1.16
---
Concordance = 0.51
Log-likelihood ratio test = 0.58 on 1 df, -log2(p)=1.16
company 3 customer_service
********************
<lifelines.CoxPHFitter: fitted with 1000 observations, 538 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 1000
  number of events = 462
partial log-likelihood = -2799.56
  time fit was run = 2019-07-19 12:59:19 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -1.30      0.27      0.68           -2.64            0.03                0.07                1.03

           z    p  -log2(p)
salary -1.92 0.06      4.18
---
Concordance = 0.51
Log-likelihood ratio test = 3.64 on 1 df, -log2(p)=4.15
company 3 data_science
********************
<lifelines.CoxPHFitter: fitted with 345 observations, 193 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 345
  number of events = 152
partial log-likelihood = -756.05
  time fit was run = 2019-07-19 12:59:19 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.23      1.26      0.52           -0.78            1.25                0.46                3.48

          z    p  -log2(p)
salary 0.45 0.65      0.61
---
Concordance = 0.50
Log-likelihood ratio test = 0.20 on 1 df, -log2(p)=0.61
company 3 sales
********************
<lifelines.CoxPHFitter: fitted with 359 observations, 203 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 359
  number of events = 156
partial log-likelihood = -763.74
  time fit was run = 2019-07-19 12:59:19 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  1.44      4.21      0.72            0.03            2.85                1.03               17.23

          z    p  -log2(p)
salary 2.00 0.05      4.45
---
Concordance = 0.56
Log-likelihood ratio test = 4.07 on 1 df, -log2(p)=4.52
company 4 engineer
********************
<lifelines.CoxPHFitter: fitted with 375 observations, 208 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 375
  number of events = 167
partial log-likelihood = -845.60
  time fit was run = 2019-07-19 12:59:19 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.15      1.16      0.51           -0.85            1.15                0.43                3.17

          z    p  -log2(p)
salary 0.30 0.77      0.38
---
Concordance = 0.50
Log-likelihood ratio test = 0.09 on 1 df, -log2(p)=0.38
company 4 marketing
********************
<lifelines.CoxPHFitter: fitted with 263 observations, 153 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 263
  number of events = 110
partial log-likelihood = -514.29
  time fit was run = 2019-07-19 12:59:19 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.42      0.66      0.88           -2.15            1.31                0.12                3.69

           z    p  -log2(p)
salary -0.48 0.63      0.66
---
Concordance = 0.52
Log-likelihood ratio test = 0.23 on 1 df, -log2(p)=0.66
company 4 customer_service
********************
<lifelines.CoxPHFitter: fitted with 769 observations, 412 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 769
  number of events = 357
partial log-likelihood = -2051.02
  time fit was run = 2019-07-19 12:59:19 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.67      1.95      0.81           -0.92            2.26                0.40                9.57

          z    p  -log2(p)
salary 0.83 0.41      1.29
---
Concordance = 0.51
Log-likelihood ratio test = 0.69 on 1 df, -log2(p)=1.30
company 4 data_science
********************
<lifelines.CoxPHFitter: fitted with 277 observations, 159 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 277
  number of events = 118
partial log-likelihood = -546.81
  time fit was run = 2019-07-19 12:59:19 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.26      1.30      0.61           -0.94            1.46                0.39                4.31

          z    p  -log2(p)
salary 0.42 0.67      0.57
---
Concordance = 0.52
Log-likelihood ratio test = 0.18 on 1 df, -log2(p)=0.58
company 4 sales
********************
<lifelines.CoxPHFitter: fitted with 252 observations, 150 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 252
  number of events = 102
partial log-likelihood = -468.70
  time fit was run = 2019-07-19 12:59:19 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.40      1.49      0.93           -1.42            2.23                0.24                9.28

          z    p  -log2(p)
salary 0.43 0.67      0.59
---
Concordance = 0.51
Log-likelihood ratio test = 0.19 on 1 df, -log2(p)=0.59
company 5 engineer
********************
<lifelines.CoxPHFitter: fitted with 311 observations, 178 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 311
  number of events = 133
partial log-likelihood = -647.13
  time fit was run = 2019-07-19 12:59:19 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -1.22      0.30      0.55           -2.30           -0.13                0.10                0.87

           z    p  -log2(p)
salary -2.20 0.03      5.18
---
Concordance = 0.52
Log-likelihood ratio test = 4.71 on 1 df, -log2(p)=5.06
company 5 marketing
********************
<lifelines.CoxPHFitter: fitted with 224 observations, 114 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 224
  number of events = 110
partial log-likelihood = -497.49
  time fit was run = 2019-07-19 12:59:19 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.05      1.05      0.81           -1.53            1.63                0.22                5.12

          z    p  -log2(p)
salary 0.06 0.95      0.07
---
Concordance = 0.51
Log-likelihood ratio test = 0.00 on 1 df, -log2(p)=0.07
company 5 customer_service
********************
<lifelines.CoxPHFitter: fitted with 631 observations, 353 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 631
  number of events = 278
partial log-likelihood = -1545.97
  time fit was run = 2019-07-19 12:59:19 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.91      0.40      0.90           -2.68            0.86                0.07                2.37

           z    p  -log2(p)
salary -1.01 0.31      1.67
---
Concordance = 0.51
Log-likelihood ratio test = 1.00 on 1 df, -log2(p)=1.66
company 5 data_science
********************
<lifelines.CoxPHFitter: fitted with 213 observations, 113 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 213
  number of events = 100
partial log-likelihood = -440.81
  time fit was run = 2019-07-19 12:59:19 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.50      0.61      0.66           -1.80            0.81                0.17                2.24

           z    p  -log2(p)
salary -0.75 0.46      1.13
---
Concordance = 0.52
Log-likelihood ratio test = 0.55 on 1 df, -log2(p)=1.12
company 5 sales
********************
<lifelines.CoxPHFitter: fitted with 254 observations, 148 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 254
  number of events = 106
partial log-likelihood = -483.49
  time fit was run = 2019-07-19 12:59:19 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.12      0.89      0.88           -1.83            1.60                0.16                4.94

           z    p  -log2(p)
salary -0.14 0.89      0.16
---
Concordance = 0.48
Log-likelihood ratio test = 0.02 on 1 df, -log2(p)=0.16
company 6 engineer
********************
<lifelines.CoxPHFitter: fitted with 218 observations, 115 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 218
  number of events = 103
partial log-likelihood = -453.19
  time fit was run = 2019-07-19 12:59:20 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.05      1.05      0.60           -1.12            1.22                0.33                3.40

          z    p  -log2(p)
salary 0.09 0.93      0.10
---
Concordance = 0.51
Log-likelihood ratio test = 0.01 on 1 df, -log2(p)=0.10
company 6 marketing
********************
<lifelines.CoxPHFitter: fitted with 174 observations, 117 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 174
  number of events = 57
partial log-likelihood = -252.45
  time fit was run = 2019-07-19 12:59:20 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.20      1.22      1.17           -2.10            2.50                0.12               12.19

          z    p  -log2(p)
salary 0.17 0.86      0.21
---
Concordance = 0.50
Log-likelihood ratio test = 0.03 on 1 df, -log2(p)=0.21
company 6 customer_service
********************
<lifelines.CoxPHFitter: fitted with 495 observations, 257 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 495
  number of events = 238
partial log-likelihood = -1260.22
  time fit was run = 2019-07-19 12:59:20 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.71      2.04      0.98           -1.20            2.63                0.30               13.86

          z    p  -log2(p)
salary 0.73 0.46      1.10
---
Concordance = 0.53
Log-likelihood ratio test = 0.54 on 1 df, -log2(p)=1.11
company 6 data_science
********************
<lifelines.CoxPHFitter: fitted with 151 observations, 84 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 151
  number of events = 67
partial log-likelihood = -272.89
  time fit was run = 2019-07-19 12:59:20 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -1.13      0.32      0.76           -2.61            0.35                0.07                1.42

           z    p  -log2(p)
salary -1.50 0.13      2.90
---
Concordance = 0.57
Log-likelihood ratio test = 2.18 on 1 df, -log2(p)=2.84
company 6 sales
********************
<lifelines.CoxPHFitter: fitted with 161 observations, 85 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 161
  number of events = 76
partial log-likelihood = -320.96
  time fit was run = 2019-07-19 12:59:20 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.22      1.24      1.06           -1.85            2.29                0.16                9.85

          z    p  -log2(p)
salary 0.21 0.84      0.26
---
Concordance = 0.49
Log-likelihood ratio test = 0.04 on 1 df, -log2(p)=0.26
company 7 engineer
********************
<lifelines.CoxPHFitter: fitted with 223 observations, 123 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 223
  number of events = 100
partial log-likelihood = -445.48
  time fit was run = 2019-07-19 12:59:20 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.59      1.80      0.62           -0.63            1.81                0.53                6.09

          z    p  -log2(p)
salary 0.94 0.35      1.53
---
Concordance = 0.54
Log-likelihood ratio test = 0.90 on 1 df, -log2(p)=1.55
company 7 marketing
********************
<lifelines.CoxPHFitter: fitted with 140 observations, 77 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 140
  number of events = 63
partial log-likelihood = -244.08
  time fit was run = 2019-07-19 12:59:20 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.09      1.09      1.13           -2.12            2.29                0.12                9.92

          z    p  -log2(p)
salary 0.08 0.94      0.09
---
Concordance = 0.57
Log-likelihood ratio test = 0.01 on 1 df, -log2(p)=0.09
company 7 customer_service
********************
<lifelines.CoxPHFitter: fitted with 466 observations, 266 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 466
  number of events = 200
partial log-likelihood = -1045.27
  time fit was run = 2019-07-19 12:59:20 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.42      1.52      1.09           -1.72            2.57                0.18               13.00

          z    p  -log2(p)
salary 0.39 0.70      0.51
---
Concordance = 0.51
Log-likelihood ratio test = 0.15 on 1 df, -log2(p)=0.52
company 7 data_science
********************
<lifelines.CoxPHFitter: fitted with 149 observations, 84 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 149
  number of events = 65
partial log-likelihood = -270.28
  time fit was run = 2019-07-19 12:59:20 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.71      0.49      0.73           -2.14            0.72                0.12                2.05

           z    p  -log2(p)
salary -0.98 0.33      1.60
---
Concordance = 0.52
Log-likelihood ratio test = 0.94 on 1 df, -log2(p)=1.59
company 7 sales
********************
<lifelines.CoxPHFitter: fitted with 162 observations, 96 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 162
  number of events = 66
partial log-likelihood = -270.22
  time fit was run = 2019-07-19 12:59:20 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.57      0.57      1.07           -2.66            1.52                0.07                4.57

           z    p  -log2(p)
salary -0.53 0.59      0.75
---
Concordance = 0.50
Log-likelihood ratio test = 0.28 on 1 df, -log2(p)=0.75
company 8 engineer
********************
<lifelines.CoxPHFitter: fitted with 190 observations, 102 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 190
  number of events = 88
partial log-likelihood = -380.02
  time fit was run = 2019-07-19 12:59:20 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  1.07      2.90      0.78           -0.47            2.60                0.63               13.48

          z    p  -log2(p)
salary 1.36 0.17      2.53
---
Concordance = 0.53
Log-likelihood ratio test = 1.92 on 1 df, -log2(p)=2.59
company 8 marketing
********************
<lifelines.CoxPHFitter: fitted with 132 observations, 67 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 132
  number of events = 65
partial log-likelihood = -249.54
  time fit was run = 2019-07-19 12:59:20 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  2.23      9.28      1.16           -0.04            4.50                0.96               90.02

          z    p  -log2(p)
salary 1.92 0.05      4.19
---
Concordance = 0.55
Log-likelihood ratio test = 3.74 on 1 df, -log2(p)=4.24
company 8 customer_service
********************
<lifelines.CoxPHFitter: fitted with 378 observations, 213 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 378
  number of events = 165
partial log-likelihood = -833.59
  time fit was run = 2019-07-19 12:59:20 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.68      1.96      1.20           -1.67            3.02                0.19               20.52

          z    p  -log2(p)
salary 0.56 0.57      0.80
---
Concordance = 0.51
Log-likelihood ratio test = 0.32 on 1 df, -log2(p)=0.81
company 8 data_science
********************
<lifelines.CoxPHFitter: fitted with 143 observations, 78 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 143
  number of events = 65
partial log-likelihood = -261.90
  time fit was run = 2019-07-19 12:59:20 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.82      2.28      0.80           -0.75            2.40                0.47               11.01

          z    p  -log2(p)
salary 1.02 0.31      1.70
---
Concordance = 0.53
Log-likelihood ratio test = 1.08 on 1 df, -log2(p)=1.74
company 8 sales
********************
<lifelines.CoxPHFitter: fitted with 136 observations, 84 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 136
  number of events = 52
partial log-likelihood = -201.83
  time fit was run = 2019-07-19 12:59:20 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.74      0.48      1.25           -3.19            1.70                0.04                5.50

           z    p  -log2(p)
salary -0.60 0.55      0.86
---
Concordance = 0.49
Log-likelihood ratio test = 0.36 on 1 df, -log2(p)=0.86
company 9 engineer
********************
<lifelines.CoxPHFitter: fitted with 185 observations, 104 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 185
  number of events = 81
partial log-likelihood = -357.23
  time fit was run = 2019-07-19 12:59:20 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.89      0.41      0.65           -2.18            0.39                0.11                1.48

           z    p  -log2(p)
salary -1.37 0.17      2.54
---
Concordance = 0.51
Log-likelihood ratio test = 1.83 on 1 df, -log2(p)=2.50
company 9 marketing
********************
<lifelines.CoxPHFitter: fitted with 124 observations, 62 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 124
  number of events = 62
partial log-likelihood = -248.95
  time fit was run = 2019-07-19 12:59:21 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.18      0.84      1.11           -2.35            2.00                0.10                7.35

           z    p  -log2(p)
salary -0.16 0.87      0.19
---
Concordance = 0.50
Log-likelihood ratio test = 0.03 on 1 df, -log2(p)=0.19
company 9 customer_service
********************
<lifelines.CoxPHFitter: fitted with 341 observations, 186 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 341
  number of events = 155
partial log-likelihood = -773.20
  time fit was run = 2019-07-19 12:59:21 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.32      1.37      1.14           -1.91            2.54                0.15               12.72

          z    p  -log2(p)
salary 0.28 0.78      0.36
---
Concordance = 0.50
Log-likelihood ratio test = 0.08 on 1 df, -log2(p)=0.36
company 9 data_science
********************
<lifelines.CoxPHFitter: fitted with 133 observations, 70 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 133
  number of events = 63
partial log-likelihood = -253.94
  time fit was run = 2019-07-19 12:59:21 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.80      2.22      0.73           -0.64            2.23                0.53                9.33

          z    p  -log2(p)
salary 1.09 0.28      1.85
---
Concordance = 0.57
Log-likelihood ratio test = 1.21 on 1 df, -log2(p)=1.88
company 9 sales
********************
<lifelines.CoxPHFitter: fitted with 113 observations, 63 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 113
  number of events = 50
partial log-likelihood = -193.26
  time fit was run = 2019-07-19 12:59:21 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.57      1.76      1.22           -1.83            2.97                0.16               19.41

          z    p  -log2(p)
salary 0.46 0.64      0.64
---
Concordance = 0.48
Log-likelihood ratio test = 0.22 on 1 df, -log2(p)=0.64
company 10 engineer
********************
<lifelines.CoxPHFitter: fitted with 170 observations, 93 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 170
  number of events = 77
partial log-likelihood = -318.74
  time fit was run = 2019-07-19 12:59:21 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -1.29      0.28      0.72           -2.71            0.13                0.07                1.13

           z    p  -log2(p)
salary -1.79 0.07      3.75
---
Concordance = 0.55
Log-likelihood ratio test = 3.08 on 1 df, -log2(p)=3.66
company 10 marketing
********************
<lifelines.CoxPHFitter: fitted with 96 observations, 56 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 96
  number of events = 40
partial log-likelihood = -151.44
  time fit was run = 2019-07-19 12:59:21 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  1.02      2.77      1.50           -1.92            3.96                0.15               52.37

          z    p  -log2(p)
salary 0.68 0.50      1.01
---
Concordance = 0.58
Log-likelihood ratio test = 0.47 on 1 df, -log2(p)=1.02
company 10 customer_service
********************
<lifelines.CoxPHFitter: fitted with 333 observations, 189 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 333
  number of events = 144
partial log-likelihood = -713.10
  time fit was run = 2019-07-19 12:59:21 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  0.97      2.64      1.33           -1.64            3.58                0.19               35.85

          z    p  -log2(p)
salary 0.73 0.47      1.10
---
Concordance = 0.54
Log-likelihood ratio test = 0.54 on 1 df, -log2(p)=1.11
company 10 data_science
********************
<lifelines.CoxPHFitter: fitted with 108 observations, 51 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 108
  number of events = 57
partial log-likelihood = -219.88
  time fit was run = 2019-07-19 12:59:21 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -0.09      0.91      0.71           -1.48            1.29                0.23                3.64

           z    p  -log2(p)
salary -0.13 0.89      0.16
---
Concordance = 0.50
Log-likelihood ratio test = 0.02 on 1 df, -log2(p)=0.16
company 10 sales
********************
<lifelines.CoxPHFitter: fitted with 110 observations, 64 censored>
      duration col = 'lasting_days'
         event col = 'event'
number of subjects = 110
  number of events = 46
partial log-likelihood = -171.71
  time fit was run = 2019-07-19 12:59:21 UTC

---
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -2.27      0.10      1.31           -4.84            0.30                0.01                1.35

           z    p  -log2(p)
salary -1.73 0.08      3.58
---
Concordance = 0.64
Log-likelihood ratio test = 2.93 on 1 df, -log2(p)=3.52
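
Reading a summary row: the columns of each block above are linked by simple formulas, so a row can be sanity-checked by hand. The sketch below recomputes the derived columns for the `company 10 sales` fit from its printed `coef` and `se(coef)`. The helper name `check_summary_row` is hypothetical and not part of the notebook, and because the inputs are the two-decimal values printed above, the results only agree with the table up to rounding.

```python
import math

def check_summary_row(coef, se):
    """Recompute the derived columns of a Cox summary row from coef and se(coef)."""
    z = coef / se                                  # Wald statistic
    p = math.erfc(abs(z) / math.sqrt(2))           # two-sided normal p-value
    half_width = 1.959964 * se                     # 97.5th percentile of N(0, 1) times se
    return {
        "exp(coef)": math.exp(coef),               # hazard ratio per unit of the covariate
        "coef lower 95%": coef - half_width,
        "coef upper 95%": coef + half_width,
        "exp(coef) lower 95%": math.exp(coef - half_width),
        "exp(coef) upper 95%": math.exp(coef + half_width),
        "z": z,
        "p": p,
        "-log2(p)": -math.log2(p),                 # larger means stronger evidence
    }

# Values printed above for company 10, sales: coef = -2.27, se(coef) = 1.31
row = check_summary_row(coef=-2.27, se=1.31)
for name, value in row.items():
    print(f"{name:>20} = {value:.2f}")
```

An `exp(coef)` below 1 means a higher `salary` value goes with a lower quit hazard. Here the 95% interval for `exp(coef)` still contains 1, so the effect cannot be separated from zero at the 5% level.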

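Comparing the fits: with one block per company and department, the printed output is hard to scan. A minimal sketch for collecting the blocks into one row per fit is shown below, assuming the text has exactly the layout printed above. The names `parse_cox_output` and `SAMPLE` are hypothetical, and `SAMPLE` is an abridged copy of two of the blocks above, not new results.

```python
import re

HEADER = re.compile(r"^company (\d+) (\w+)$")
FITTED = re.compile(r"fitted with (\d+) observations, (\d+) censored")
CONCORDANCE = re.compile(r"^Concordance = ([\d.]+)")

def parse_cox_output(text):
    """Return one dict per 'company N dept' block found in the printed output."""
    rows, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if m := HEADER.match(line):
            current = {"company": int(m.group(1)), "dept": m.group(2)}
            rows.append(current)
        elif current is None:
            continue  # ignore anything before the first block header
        elif m := FITTED.search(line):
            current["n"] = int(m.group(1))
            current["events"] = int(m.group(1)) - int(m.group(2))
        elif line.startswith("salary"):
            values = [float(v) for v in line.split()[1:]]
            if len(values) == 7:    # coef, exp(coef), se(coef), four interval bounds
                current["coef"], current["hazard_ratio"], current["se"] = values[:3]
            elif len(values) == 3:  # z, p, -log2(p)
                current["p"] = values[1]
        elif m := CONCORDANCE.match(line):
            current["concordance"] = float(m.group(1))
    return rows

# Abridged copy of two blocks from the output above.
SAMPLE = """\
company 8 marketing
********************
<lifelines.CoxPHFitter: fitted with 132 observations, 67 censored>
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary  2.23      9.28      1.16           -0.04            4.50                0.96               90.02
          z    p  -log2(p)
salary 1.92 0.05      4.19
Concordance = 0.55
company 10 sales
********************
<lifelines.CoxPHFitter: fitted with 110 observations, 64 censored>
        coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
salary -2.27      0.10      1.31           -4.84            0.30                0.01                1.35
           z    p  -log2(p)
salary -1.73 0.08      3.58
Concordance = 0.64
"""

rows = parse_cox_output(SAMPLE)
# Smallest p-values first, so the strongest salary effects are listed at the top.
for r in sorted(rows, key=lambda r: r["p"]):
    print(r["company"], r["dept"], r["coef"], r["p"], r["concordance"])
```

Among the fits printed above for companies 5 to 10, the smallest p-values belong to company 8 marketing (0.05), company 10 engineer (0.07) and company 10 sales (0.08), and most concordance values sit close to 0.5, so salary alone ranks quit times only slightly better than chance within a single company and department.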
