by A. Zayer
Employee retention, especially in large companies, is and will remain a hot topic. Considerable amounts of money and time are spent on hiring and training, so the ability to understand and forecast future resignations is of prime interest: it could help decision makers and hiring managers prevent such situations by taking the appropriate measures.
An exploratory and predictive analysis is carried out in order to understand what impels employees to resign. A Kaggle dataset, released under the CC BY-SA 4.0 license, is used for this purpose. The dataset has 14,999 rows and 10 columns with the following names:
| Column | Description |
|---|---|
| Satisfaction_level | Level of satisfaction (0-1) |
| Last_evaluation | Evaluation of employee performance (0-1) |
| Number_project | Number of projects completed while at work |
| Average_monthly_hours | Average monthly hours at workplace |
| Time_spend_company | Number of years spent in the company |
| Work_accident | Whether the employee had a workplace accident |
| Left | Whether the employee left the workplace or not (1 or 0) |
| Promotion_last_5years | Whether the employee was promoted in the last five years |
| Departments | Department in which they work |
| Salary | Relative level of salary (low, medium, high) |
In [1]:
%matplotlib inline
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier as rfc
from sklearn import metrics
from sklearn.ensemble import GradientBoostingClassifier
from palettable import colorbrewer as cb
from IPython.display import display
from jupyterthemes import get_themes
from jupyterthemes.stylefx import set_nb_theme
themes = get_themes()
set_nb_theme(themes[0])
sns.set_style('whitegrid')
sns.set_context("notebook", font_scale=1.5, rc={"lines.linewidth": 2.5,'figure.figsize': (12, 9)})
pd.set_option('display.float_format', lambda x: '%.2f' % x)
The CSV data file is loaded into memory as a pandas DataFrame.
In [2]:
df = pd.read_csv('./data/hr.csv',sep=";") #Import data
In [3]:
df.shape # check how many rows and columns in the data
Out[3]:
Let's have a quick look at the first rows
In [4]:
df.head(10)
Out[4]:
and the last rows
In [5]:
df.tail(10)
Out[5]:
Rows can be seen as instances of a class called 'employee', where columns represent its attributes.
In [6]:
for key in df.columns.values:
    print(key)
Data is a mixture of numerical and categorical values. The column 'Departments' is a nominal variable with the following categories:
In [7]:
print(df.Departments.value_counts())
The 'Salary' variable has 3 ordinal categories
In [8]:
print(df.Salary.value_counts())
The 'Salary' and 'Departments' columns need to be encoded as integers before they can be handled by the learning algorithms; this is done in the predictive section of this study. We also have two binary variables, namely 'Left' and 'Promotion_last_5years'.
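As an aside, the ordinal 'Salary' column could also be encoded with a simple value mapping. The snippet below is only an illustrative sketch (the category names 'low', 'medium' and 'high' are an assumption to be checked against the value_counts() output above); the predictive section instead uses one-hot encoding via pd.get_dummies.

# Illustrative sketch only: ordinal encoding of 'Salary' (low < medium < high).
# Category names are assumed; the analysis below uses pd.get_dummies instead.
salary_map = {'low': 0, 'medium': 1, 'high': 2}
print(df['Salary'].map(salary_map).value_counts())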
In [9]:
print(df['Left'].value_counts())  # check the occurrence of the binary values
In [10]:
print(df['Promotion_last_5years'].value_counts())  # check the occurrence of the binary values
Check for missing or incomplete data
In [11]:
np.count_nonzero(df.isnull())
Out[11]:
In [12]:
df.describe().T
Out[12]:
The average workload is around 200 hours/month, which is roughly 10 hours/day assuming five business days per week.
In the past five years, around 24% of the employees left the company. This number is quite alarming and raises questions about the company's retention policy.
The 'satisfaction level' and 'last evaluation' averages are 0.61 and 0.72 respectively. These variables are not always objective, but given the size of the dataset, the noise associated with each individual judgment should be minimal when averaging over large numbers.
The average employee life cycle is about 3.5 years, which means an average employee could repeat this cycle ten or more times over a full career if they keep looking for work in companies similar to this one.
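These headline figures can be recomputed directly from the raw columns; a minimal sketch (column names follow the code cells below, e.g. 'Average_montly_hours' as spelled in the CSV):

print("Share of leavers: {:.1%}".format(df['Left'].mean()))
print("Average monthly hours: {:.0f}".format(df['Average_montly_hours'].mean()))
print("Average tenure (years): {:.2f}".format(df['Time_spend_company'].mean()))
print("Mean satisfaction / last evaluation: {:.2f} / {:.2f}".format(
    df['Satisfaction_level'].mean(), df['Last_evaluation'].mean()))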
In [13]:
colors = cb.qualitative.Set3_12.hex_colors
colors1=cb.qualitative.Paired_11.hex_colors
colorz=['#EA8E83','#FFFFB3','#B3DE69','#FDB462']
colorz2=['#96B68D','#807885','#D1D3D4','#C7B5A7','#B5C2C9','#F2CF9A','#C58083']
colorz3=['#F6D3E5','#EA8E83']
labelz =["Satisfaction level","Last evaluation","Number of projects","Average monthly hours",
"Time spent company","Work accident","Left","Promotion last 5 years"]
left_labels =["Stayers","Leavers"]
salez = ["Satisfaction level",
"Last evaluation",
"Number of projects",
"Average monthly hours",
"Time spent company",
"Work accident" ,
"Left",
"Promotion last 5 years",
"Departments",
"Salary"]
In [14]:
ax = sns.countplot(
x='Salary',
data=df,
hue='Departments',
hue_order=df['Departments'].value_counts().index,
palette= colors)
_ = ax.set_xlabel('Salary')
_ = ax.set_ylabel('Number of employees')
_ = ax.set_title('Salaries distribution')
_ = plt.legend(bbox_to_anchor=(1.02, 1.0), loc=2, borderaxespad=0.)
In [15]:
ax = sns.boxplot(
y='Departments',#Column to split upon
x='Average_montly_hours',# Column to plot
data=df,
hue='Left',
width=0.35,
fliersize=5,
palette=colorz3,
flierprops={
'marker': '.'})
_ = ax.set_xlabel('Average monthly hours')
_ = ax.set_ylabel('Departments')
handles, labels = ax.get_legend_handles_labels()
l = plt.legend(handles[0:2], left_labels[0:2], bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
Overworked employees tend to leave
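A quick numeric check of this claim, as a sketch alongside the boxplot:

# Compare the workload of stayers (Left == 0) and leavers (Left == 1)
print(df.groupby('Left')['Average_montly_hours'].agg(['mean', 'median']))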
In [16]:
ax = sns.countplot(
x='Salary',
hue='Left',
data=df,
palette=colorz3)
_ = ax.set_xlabel('Salary')
_ = ax.set_ylabel('Employees')
_ = plt.legend(bbox_to_anchor=(1.02, 1.0), loc=2, borderaxespad=0., labels=["Stayers", "Leavers"])
especially if they are not well paid
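The salary effect can also be expressed as a leave rate per salary band; a minimal sketch:

# Share of leavers within each salary band
print(pd.crosstab(df['Salary'], df['Left'], normalize='index'))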
In [17]:
# satisfaction level among the leavers and the Stayers in different departments
from numpy import median
ax = sns.barplot(
y='Departments',
x='Satisfaction_level',
data=df,
ci=None,
hue='Left',
estimator=median,
palette=colorz3)
_ = ax.set_ylabel('Departments')
_ = ax.set_xlabel('Satisfaction_level')
_ = plt.legend(bbox_to_anchor=(1.02, 1.0), loc=2, borderaxespad=0., labels=["Stayers", "Leavers"])
and yes they are not happy
In [18]:
sns.factorplot(x="Satisfaction_level",
y="Departments",
hue="Left",
row="Salary",
data=df[df.Departments.notnull()],
kind="box",
aspect=3,
palette=colorz3,
legend=False);
l = plt.legend(handles[0:2],
left_labels[0:2],
bbox_to_anchor=(1.05, 1),
loc=2,
borderaxespad=0.)
'Satisfaction level' is a strong indicator of whether an employee will leave or stay.
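To put a number on this, the leave rate can be computed per satisfaction band; the bin edges below are an illustrative assumption:

# Leave rate by satisfaction band (bin edges chosen for illustration only)
sat_bands = pd.cut(df['Satisfaction_level'], bins=[0, 0.2, 0.4, 0.6, 0.8, 1.0])
print(df.groupby(sat_bands)['Left'].mean())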
In [19]:
l_sat = df.loc[df["Left"] == 1]["Satisfaction_level"]
s_sat = df.loc[df["Left"] == 0]["Satisfaction_level"]
l_ev = df.loc[df["Left"] == 1]["Last_evaluation"]
s_ev = df.loc[df["Left"] == 0]["Last_evaluation"]
plt.figure(figsize=(14,10))
plt.xlabel("Satisfaction level")
plt.ylabel("Last evaluation")
scat_s = plt.scatter(s_sat, s_ev, color=colorz3[0])
scat_l = plt.scatter(l_sat, l_ev, color=colorz3[1])
l = plt.legend(handles[0:2],
left_labels[0:2],
bbox_to_anchor=(1.05, 1),
loc=2,
borderaxespad=0.)
The three separate clusters formed by the leavers show that there is no guarantee an employee will stay based on the sole fact that their superiors are happy with their work rate.
Employers and employees judge each other from different perspectives: the 'last evaluation' metric measures the time, effort and work rate of the employee, while the 'satisfaction level' metric measures how fair the salary and benefits, if any, are relative to the effort and time spent in the company.
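The three leaver clusters visible in the scatter plot can be made explicit with k-means; this is a sketch, not part of the original analysis, and n_clusters=3 simply mirrors the visual impression:

from sklearn.cluster import KMeans

# Cluster the leavers in the (satisfaction, evaluation) plane
leavers = df.loc[df['Left'] == 1, ['Satisfaction_level', 'Last_evaluation']]
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(leavers)
print(pd.DataFrame(km.cluster_centers_, columns=leavers.columns))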
In [20]:
correlation = df.corr()
g=sns.heatmap(correlation, vmax=1, square=True,annot=True,cmap='plasma')
zz1=np.transpose(labelz)
zz2=np.transpose(labelz[::-1])
g.set(xticklabels=zz1);
g.set(yticklabels=zz2);
The correlation plot confirms the observations mentioned earlier. The 'last evaluation' metric is highly correlated with 'average monthly hours' and 'number of projects': the more projects you take on and the more hours you work, the happier your boss is.
The 'Left' feature is negatively correlated with 'satisfaction level': the lower the satisfaction level, the higher the probability of leaving.
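The same information can be read off numerically by ranking the correlations with 'Left'; a short sketch using the correlation matrix computed above:

# Features most (anti-)correlated with leaving
print(correlation['Left'].drop('Left').sort_values())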
Goal: Extraction and classification of main factors causing people to leave the company
In [21]:
df.columns
Out[21]:
In order to build the feature matrix, we have to convert the categorical values into binary indicators, which creates extra columns marking the presence or absence of each category with a value of 1 or 0, respectively.
In [22]:
df_clf = pd.get_dummies(df)
df_clf.head()
Out[22]:
In [23]:
labels =df_clf.columns
labels
Out[23]:
The 'Departments' and 'Salary' columns have been expanded into one binary column per category (three of them for 'Salary').
Create the features matrix X and the target vector y.
In [24]:
y = df_clf['Left'].values
df_clf = df_clf.drop(['Left'],axis=1)
X = df_clf.values
In [25]:
from sklearn.preprocessing import StandardScaler
X= StandardScaler().fit_transform(X)
In [26]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
In [27]:
def separator():
    print(" ")
    print('*********************************************************************')
    print(" ")
In [28]:
# function to print classification metrics
def printz(predict_train, predict_test):
    print('Performance on Training Data')
    # training accuracy
    print("Accuracy: {0:.4f}".format(metrics.accuracy_score(y_train, predict_train)))
    separator()
    print('Performance on Testing Data')
    # testing accuracy
    print("Accuracy: {0:.4f}".format(metrics.accuracy_score(y_test, predict_test)))
    separator()
    print("Confusion Matrix")
    # labels=[1, 0] puts 1=True in the upper left and 0=False in the lower right
    print("{0}".format(metrics.confusion_matrix(y_test, predict_test, labels=[1, 0])))
    separator()
    print("Classification Report")
    print(metrics.classification_report(y_test, predict_test, labels=[1, 0]))
In [29]:
# model fitting and evaluation
def modelz(model):
    model.fit(X_train, y_train.ravel())
    # predict values using the training data
    predict_train = model.predict(X_train)
    # predict values using the testing data
    predict_test = model.predict(X_test)
    # print classification metrics
    printz(predict_train, predict_test)
In [30]:
def classifierz(typz):
    # Instantiate the chosen model and set its parameters
    if typz == 1:
        # create a Gaussian Naive Bayes model object
        model = GaussianNB()
        modelz(model)
    elif typz == 2:
        # create a Logistic Regression model object
        model = LogisticRegression(C=0.7, random_state=42)
        modelz(model)
    elif typz == 3:
        # create a Random Forest classifier model object
        model = rfc(random_state=42)
        modelz(model)
    elif typz == 4:
        # create a Gradient Boosting classifier and plot its feature importances
        model = GradientBoostingClassifier(random_state=0, learning_rate=0.1, max_depth=6)
        modelz(model)
        importances = pd.DataFrame({'feature': df_clf.columns,
                                    'importance': model.feature_importances_})
        importances = importances.sort_values('importance', ascending=False).set_index('feature')
        importances = importances[importances.importance >= 0.1]
        importances.index = [x.strip().replace('_', ' ') for x in importances.index]
        print(importances)
        importances.plot(kind='barh', figsize=(10, 4),
                         color='#d65f5f', legend=False, title="Importance factors")
In [31]:
clf_id = 1 # 1 to run the Naive Bayes classifier
classifierz(clf_id)
In [32]:
clf_id = 2 # 2 to run the logistic regression classifier
classifierz(clf_id)
The logistic regression classifier performed better than naive Bayes, but with an accuracy still under 0.80. Neither model captured well the non-linearities in the features arising from the heterogeneity of the data.
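One hedged way to test the non-linearity explanation is to feed the logistic regression quadratic and interaction terms; the degree and parameters below are illustrative assumptions, not part of the original study.

from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Logistic regression on degree-2 polynomial/interaction features (illustrative)
poly_lr = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                        LogisticRegression(C=0.7, random_state=42, max_iter=1000))
poly_lr.fit(X_train, y_train)
print("Test accuracy with interaction terms: {:.4f}".format(poly_lr.score(X_test, y_test)))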
In [33]:
clf_id = 3 # 3 to run the random forest classifier
classifierz(clf_id)
In [34]:
clf_id = 4 # 4 to run the gradient boosted regression trees
classifierz(clf_id)
Random forest and gradient boosting are both ensemble learning algorithms that combine multiple decision trees to create more powerful models. They performed exceptionally well on this dataset: not only the accuracy but also the precision and recall metrics are quite high. The three most important features, extracted from the data with the gradient boosted model, are the ones listed and plotted in the output above (those with an importance of at least 0.1).
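As a sanity check that these scores are not an artefact of one particular train/test split, here is a hedged 5-fold cross-validation sketch with the same gradient boosting parameters as above:

from sklearn.model_selection import cross_val_score

gbt = GradientBoostingClassifier(random_state=0, learning_rate=0.1, max_depth=6)
cv_scores = cross_val_score(gbt, X, y, cv=5)
print("5-fold CV accuracy: {:.4f} (+/- {:.4f})".format(cv_scores.mean(), cv_scores.std()))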
In [35]:
# Estimate, for every employee, the probability of leaving using out-of-fold predictions
rf_model = rfc(n_estimators=10)
Mr_x = cross_val_predict(rf_model, X, y, cv=5,
                         method='predict_proba')
Mr_x = pd.DataFrame(Mr_x[:, 1])
Mr_x.columns = ['prob_leaving']
In [36]:
Mr_x_prob = pd.concat([df, Mr_x], axis=1)
# keep current employees (Left == 0) whose estimated probability of leaving is roughly 50% or more
Mr_x_prob = Mr_x_prob[Mr_x_prob["Left"] == 0]
Mr_x_prob = Mr_x_prob[Mr_x_prob["prob_leaving"] >= 0.49]
Mr_x_prob.sort_values(by='prob_leaving', ascending=False, inplace=True)
In [37]:
wl = Mr_x_prob[['Number_project','Average_montly_hours',
'Time_spend_company', 'Work_accident',
'Salary','prob_leaving']]
In [38]:
wl.style.bar(subset=['prob_leaving'], color='#d65f5f')
Out[38]:
By using machine learning algorithms, we were able to extract the main factors related to the intention to leave and the decision to resign, and to estimate the probability of leaving for current employees. This model could still be improved by using chronological data and by adding more features such as age, gender, family status, distance to work, etc.