Customer Segmentation using KNN by Raji Kudus Adewale

K Nearest Neighbor

Objective: To build a KNN classifier that predicts the customer group (custcat) of previously unseen customers of a telecommunications provider.

Data Description

Segmentation: The customer base is segmented into four groups based on service usage patterns.
Target Variable: The custcat field, which includes four values corresponding to the customer groups:
- Basic Service
- E-Service
- Plus Service
- Total Service
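
As a rough illustration of how the target can be inspected, the sketch below maps category codes to the four group names and counts customers per group. The integer codes 1 to 4 and the sample values are assumptions made for this example; they are not taken from the dataset.

```python
import pandas as pd

# Hypothetical coding: the integers 1-4 are an assumption, not confirmed by the source.
CUSTCAT_LABELS = {
    1: "Basic Service",
    2: "E-Service",
    3: "Plus Service",
    4: "Total Service",
}

# Tiny made-up sample standing in for the real custcat column.
df = pd.DataFrame({"custcat": [1, 4, 3, 1, 2, 3, 1]})

# Attach readable names and count customers per segment.
df["custcat_label"] = df["custcat"].map(CUSTCAT_LABELS)
segment_counts = df["custcat_label"].value_counts()
print(segment_counts)
```

Checking the class balance this way helps when reading accuracy figures later: with four groups, a classifier that guesses at random would be right about a quarter of the time if the groups are of similar size.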


import itertools
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import preprocessing
%matplotlib inline

Methodology

Data Preparation:
- Import relevant libraries for data manipulation (pandas, numpy) and visualization (matplotlib).
- Load the customer data from a CSV file into a pandas DataFrame.
- Perform initial data exploration to understand the dataset's structure.
Exploratory Data Analysis:
- Use statistical summaries and visualizations to explore the customer data and see how customers are distributed across the four segments.
Model Development:
- Implement the KNN algorithm to create a predictive model.
- Configure the model to identify the nearest neighbors and classify the customers accordingly.
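
The steps above can be sketched end to end as follows. This is a minimal sketch, not the notebook's actual pipeline: the data is randomly generated in place of the CSV file (whose name is not given here), the feature columns `age`, `income`, and `tenure` are hypothetical, and `n_neighbors=4` is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer CSV; every column except custcat is hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 80, size=n),
    "income": rng.normal(60, 20, size=n),
    "tenure": rng.integers(1, 72, size=n),
    "custcat": rng.integers(1, 5, size=n),  # four segments, coded 1-4 by assumption
})

# Initial exploration: dataset size and class balance.
print(df.shape)
print(df["custcat"].value_counts().sort_index())

# Separate the features from the target.
X = df[["age", "income", "tenure"]].to_numpy(dtype=float)
y = df["custcat"].to_numpy()

# KNN is distance-based, so put the features on a comparable scale first.
X_scaled = StandardScaler().fit_transform(X)

# Fit the classifier and predict the segment of the first five customers.
knn = KNeighborsClassifier(n_neighbors=4).fit(X_scaled, y)
preds = knn.predict(X_scaled[:5])
print(preds)
```

Scaling matters here because a feature with a large numeric range would otherwise dominate the distance calculation. When a train/test split is used, the scaler is normally fitted on the training set only and then applied to the test set.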


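from sklearn.model_selection import train_test_split

# X (feature matrix) and y (custcat labels) are assumed to be defined earlier in the notebook.
# Hold out 20% of the rows for testing; random_state fixes the shuffle for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)
print('Train set:', X_train.shape, y_train.shape)
print('Test set:', X_test.shape, y_test.shape)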


# Calculate the test accuracy for each K from 1 to 10
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

accuracies = []

for k in range(1, 11):
    # Fit a KNN classifier with k neighbors on the training set
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    y_pred = knn.predict(X_test)

    # Compute the accuracy on the test set and store it
    accuracies.append(accuracy_score(y_test, y_pred))

print(accuracies)
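
Once the accuracies are collected, one simple way to choose K is to take the value with the highest test accuracy. The sketch below shows that selection step. The accuracy values are made up for illustration and are not results from this notebook.

```python
import numpy as np

# Made-up accuracies for K = 1..10, used only to illustrate the selection step.
accuracies = [0.30, 0.29, 0.315, 0.32, 0.315, 0.31, 0.335, 0.325, 0.34, 0.33]

# argmax returns a 0-based index, while K starts at 1.
best_index = int(np.argmax(accuracies))
best_k = best_index + 1
print("Best K:", best_k, "accuracy:", accuracies[best_index])
```

Picking K on the test set tends to give an optimistic accuracy estimate, so a separate validation set or cross-validation is a common alternative.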