Customer Churn Model

Python Machine Learning

Customer Churn with Logistic Regression

Purpose: To predict when customers of a telecommunications company will leave for a competitor, enabling the company to take action to retain customers.


Data Description


import pandas as pd
import numpy as np
import pylab as pl
import matplotlib.pyplot as plt
# Jupyter magic: render plots inline (remove when running as a plain script)
%matplotlib inline
import scipy.optimize as opt
from sklearn import preprocessing

Methodology

  1. Data Loading and Preparation:
    • Libraries for data manipulation (pandas, numpy) and visualization (matplotlib) are imported.
    • The churn dataset is loaded into a pandas DataFrame for analysis.
  2. Exploratory Data Analysis:
    • A preliminary exploration (for example, inspecting the first rows with head()) is conducted to understand the dataset's structure and summarize its main characteristics.
  3. Model Development:
    • Logistic regression is used to create a predictive model.
    • The model is trained to identify patterns that indicate potential churn.
  4. Model Evaluation:
    • A confusion matrix is used to evaluate the model's performance.
    • Classification metrics (precision, recall, and F1-score, plus support, the number of true samples in each class) are calculated for each class.
    • The evaluation provides insights into the model's ability to predict churn accurately.
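The modeling and evaluation steps above can be sketched end to end with scikit-learn. Because the churn data is not reproduced here, this minimal sketch substitutes a synthetic, imbalanced dataset from `make_classification` with nine features (as many as the predictor columns selected in `churn_df`). The sample size, class weights, `test_size=0.2`, default `LogisticRegression` settings, and `random_state` values are illustrative assumptions, not the project's actual settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for the churn data (9 features, ~25% churners).
X, y = make_classification(n_samples=400, n_features=9, n_informative=5,
                           weights=[0.75, 0.25], random_state=4)

# Hold out a test set before any fitting so evaluation uses unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4)

# Standardize using statistics computed on the training split only.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Fit the logistic regression classifier.
model = LogisticRegression()
model.fit(X_train_s, y_train)

# Predicted labels and predicted churn probabilities for the held-out rows.
y_pred = model.predict(X_test_s)
y_prob = model.predict_proba(X_test_s)[:, 1]

# Rows are true labels, columns are predicted labels, both ordered [0, 1].
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)
print(classification_report(y_test, y_pred))
```

Fitting the scaler on the training split alone keeps test-set statistics out of preprocessing, so the reported metrics reflect genuinely unseen data.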

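# `data` is the raw churn DataFrame loaded earlier; keep the predictors and the target.
# .copy() makes churn_df independent, so the assignment below cannot raise SettingWithCopyWarning.
churn_df = data[['tenure', 'age', 'address', 'income', 'ed', 'employ', 'equip', 'callcard', 'wireless', 'churn']].copy()
churn_df['churn'] = churn_df['churn'].astype('int')
churn_df.head()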
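Beyond `head()`, a quick exploration of the selected frame typically checks its shape, summary statistics, the class balance, and how a feature differs between churners and non-churners. This minimal sketch runs those checks on a small hypothetical `data` frame whose values are invented purely so the code executes; only the column names come from the project.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real dataset (values are invented).
data = pd.DataFrame({
    'tenure':   [12, 40, 5, 60, 24, 3],
    'age':      [30, 45, 22, 58, 37, 26],
    'address':  [4, 10, 1, 20, 6, 2],
    'income':   [50.0, 90.0, 20.0, 150.0, 65.0, 25.0],
    'ed':       [3, 2, 1, 4, 2, 1],
    'employ':   [5, 15, 0, 30, 8, 1],
    'equip':    [0.0, 1.0, 1.0, 0.0, 0.0, 1.0],
    'callcard': [1.0, 1.0, 0.0, 1.0, 1.0, 0.0],
    'wireless': [0.0, 1.0, 1.0, 0.0, 0.0, 1.0],
    'churn':    [0.0, 0.0, 1.0, 0.0, 0.0, 1.0],
})

cols = ['tenure', 'age', 'address', 'income', 'ed', 'employ',
        'equip', 'callcard', 'wireless', 'churn']
churn_df = data[cols].copy()
churn_df['churn'] = churn_df['churn'].astype('int')

print(churn_df.shape)
print(churn_df.describe())

# Share of each class; an imbalance here affects how precision and recall read.
churn_rate = churn_df['churn'].value_counts(normalize=True)
print(churn_rate)

# Mean tenure per class (0 = stays, 1 = churns).
tenure_by_class = churn_df.groupby('churn')['tenure'].mean()
print(tenure_by_class)
```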
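# The data is numeric, so no encoding (converting categorical to numeric) is needed.
# Let's standardize the features.
from sklearn.preprocessing import StandardScaler

# Assumption: X holds every selected column except the 'churn' target.
X = churn_df.drop(columns='churn').values
y = churn_df['churn'].values

# Standardize X (zero mean, unit variance per column).
scalerX = StandardScaler()
scalerX.fit(X)
X = scalerX.transform(X)
print(X[0:5])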
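Standardization rescales each column to z = (x - mean) / std. The sketch below, on a small hypothetical feature matrix, checks that `StandardScaler` produces the same result as the formula computed directly with NumPy (using the population standard deviation, `ddof=0`), and that every scaled column ends up with mean 0 and standard deviation 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small hypothetical feature matrix (rows = customers, columns = features).
X = np.array([[11.0, 33.0, 136.0],
              [33.0, 33.0,  33.0],
              [23.0, 30.0,  30.0],
              [38.0, 35.0,  76.0]])

# StandardScaler: z = (x - column mean) / column std, with std using ddof=0.
X_sk = StandardScaler().fit_transform(X)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_sk, X_manual))  # True
print(X_sk.mean(axis=0))            # each column mean is (numerically) 0
print(X_sk.std(axis=0))             # each column std is 1
```

Because the scaler learns one mean and one standard deviation per column, fitting it on the full dataset before splitting lets test rows influence those statistics; fitting on the training split and only transforming the test split avoids that.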

Evaluation Results


The F1 score and log loss are also used as performance metrics. The F1 score is the harmonic mean of precision and recall, and log loss measures the performance of a classifier whose output is a probability between 0 and 1, penalizing confident wrong predictions most heavily.
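Both metrics can be computed by hand and checked against scikit-learn. In this sketch the labels and predicted probabilities are hypothetical, and hard labels come from an assumed 0.5 threshold. The formulas are F1 = 2PR / (P + R) and log loss = -(1/N) Σ [y log p + (1 - y) log(1 - p)].

```python
import numpy as np
from sklearn.metrics import f1_score, log_loss, precision_score, recall_score

# Hypothetical true labels and predicted churn probabilities.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3])
# Assumed 0.5 decision threshold to turn probabilities into labels.
y_pred = (y_prob >= 0.5).astype(int)

# F1: harmonic mean of precision P and recall R.
p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)
f1_manual = 2 * p * r / (p + r)

# Log loss: average negative log-likelihood of the true labels.
ll_manual = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# 3 TP, 1 FP, 1 FN here, so P = R = F1 = 0.75.
print(f1_manual, f1_score(y_true, y_pred))
print(ll_manual, log_loss(y_true, y_prob))
```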


import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# y_test: held-out true labels; y_pred: the model's predictions for those rows.
cm = confusion_matrix(y_test, y_pred)

# Plot the confusion matrix and save the figure.
plt.figure(figsize=(8, 6))
plt.title('Confusion Matrix for Logistic Regression Model')
plt.imshow(cm, cmap='Blues')
plt.xticks([0, 1], [0, 1])
plt.yticks([0, 1], [0, 1])
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.savefig('confusion_matrix.png')
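The per-class precision and recall discussed in the conclusion can be read directly from the four cells of a binary confusion matrix. This sketch uses hypothetical labels (1 = churn, 0 = stays) chosen only for illustration, unpacks the matrix, and derives the metrics for both classes.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels, for illustration only (1 = churn, 0 = stays).
y_test = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])

# With labels=[0, 1] the layout is [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

# Churn class (1): precision = TP/(TP+FP), recall = TP/(TP+FN).
precision_churn = tp / (tp + fp)
recall_churn = tp / (tp + fn)
# Non-churn class (0): same formulas with the roles of the classes swapped.
precision_stay = tn / (tn + fn)
recall_stay = tn / (tn + fp)

print(cm)
print(f"churn: precision={precision_churn:.2f}, recall={recall_churn:.2f}")
print(f"stay:  precision={precision_stay:.2f}, recall={recall_stay:.2f}")
```

In this toy case the churn class has recall 0.75 but precision 0.60, because two non-churners are flagged as churners; that is the same kind of pattern the conclusion describes.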

Conclusion

The logistic regression model shows reasonable accuracy in predicting customer churn. The model's strengths and limitations are reflected in the precision and recall scores for both classes. The high precision in predicting non-churn customers and the good recall in predicting churn customers suggest the model is useful for the company's objective to retain customers. However, the lower precision for churn predictions indicates room for improvement, possibly through more detailed feature engineering or alternative modeling techniques.

