Objective: Build a k-nearest neighbors (KNN) classifier that predicts the service category of unseen customers of a telecommunications provider.
Data Description
Segmentation: The customer base is segmented into four groups based on service usage patterns.
Target Variable: The custcat field, which includes four values corresponding to the customer groups:
Basic Service
E-Service
Plus Service
Total Service
import itertools
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import preprocessing
%matplotlib inline
Methodology
Data Preparation:
Import relevant libraries for data manipulation (pandas, numpy) and visualization (matplotlib).
Load the customer data from a CSV file into a pandas DataFrame.
Perform initial data exploration to understand the dataset's structure.
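The preparation steps above can be sketched as follows. The path of the real CSV file is not given here, so this example reads a small in-memory CSV instead. Apart from custcat, the column names (tenure, age, income) and all values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the customer CSV; only `custcat` is named in this document.
csv_text = """tenure,age,income,custcat
13,44,64,1
11,33,136,4
68,52,116,3
33,33,33,1
23,30,30,3
"""

# With a real file this would be pd.read_csv("<path to the customer CSV>").
df = pd.read_csv(io.StringIO(csv_text))

# Initial exploration: dimensions, column types, and the first rows.
print(df.shape)
print(df.dtypes)
print(df.head())
```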
Exploratory Data Analysis:
Use statistical summaries and visualizations to examine the customer data and see how customers are distributed across the four segments.
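A minimal sketch of this kind of exploration, assuming the data is already in a pandas DataFrame. The small DataFrame below is invented for the example, and the feature columns age and income are assumptions; only custcat comes from the data description.

```python
import pandas as pd

# Made-up sample; in the notebook this would be the loaded customer DataFrame.
df = pd.DataFrame({
    "age": [44, 33, 52, 33, 30, 39],
    "income": [64, 136, 116, 33, 30, 78],
    "custcat": [1, 4, 3, 1, 3, 2],
})

# Number of customers in each segment.
segment_counts = df["custcat"].value_counts().sort_index()
print(segment_counts)

# Summary statistics for the numeric features.
print(df[["age", "income"]].describe())

# Mean of each feature within each segment.
segment_means = df.groupby("custcat")[["age", "income"]].mean()
print(segment_means)
```

A strongly uneven segment_counts would suggest looking at per-class metrics in addition to overall accuracy.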
Model Development:
Implement the KNN algorithm to create a predictive model.
Configure the model to identify the nearest neighbors and classify the customers accordingly.
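As a rough illustration of the two steps above, the sketch below fits a KNN model on a handful of invented points and classifies one new case. The feature values, the choice of k = 3, and the standardization step are assumptions made for the example, not settings taken from the notebook.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Toy feature matrix (two made-up features) and segment labels.
X = np.array(
    [[25, 30], [27, 35], [45, 120], [50, 130], [35, 70], [33, 65]],
    dtype=float,
)
y = np.array([1, 1, 4, 4, 2, 2])

# KNN is distance-based, so features are put on a common scale first (assumed step).
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# k is the number of nearest neighbors that vote on each prediction.
k = 3
knn = KNeighborsClassifier(n_neighbors=k).fit(X_scaled, y)

# Classify a new, unseen customer using the same scaling.
new_customer = scaler.transform(np.array([[26.0, 32.0]]))
predicted = knn.predict(new_customer)
print(predicted)
```

The scaler is fitted once and reused for the new case so that distances are measured in the same units as the training data.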
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)
print('Train set:', X_train.shape, y_train.shape)
print('Test set:', X_test.shape, y_test.shape)
# Calculate the test accuracy for each k from 1 to 10
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
accuracies = []
for k in range(1, 11):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    # Compute the accuracy and store it
    accuracies.append(accuracy_score(y_test, y_pred))
print(accuracies)
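Once the accuracies are collected, the best-performing k can be read off the list. The sketch below shows one way to do that. It uses synthetic four-class data from make_classification as a stand-in for the customer features, so the printed numbers say nothing about the real dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic four-class data standing in for the customer features and custcat labels.
X, y = make_classification(
    n_samples=400,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    n_classes=4,
    random_state=4,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4
)

k_values = list(range(1, 11))
accuracies = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    accuracies.append(accuracy_score(y_test, knn.predict(X_test)))

# np.argmax returns the index of the first maximum, so ties go to the smaller k.
best_index = int(np.argmax(accuracies))
best_k = k_values[best_index]
print(f"Best k: {best_k} (test accuracy {accuracies[best_index]:.3f})")
```

Choosing k on the test set makes the reported accuracy optimistic; a separate validation split or cross-validation would give a cleaner estimate.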