Degree Type

Thesis

Date of Award

2016

Degree Name

Master of Science

Department

Electrical and Computer Engineering

Major

Computer Engineering

First Advisor

Arun R. Somani

Abstract

Customer churn, the event of a customer abandoning an established relationship with a business, is an important problem that has been well researched in both academic and commercial settings. In this work, we propose an improved prediction model that emphasizes an effective data collection pipeline, drawing on varied channels to capture explicit and implicit customer footprints. Our goal is to demonstrate how feature selection algorithms can improve classifier efficiency. We also rank the prominent features that play a vital role in customer churn. Our contributions can be broadly grouped into three areas. First, we show how popular data mining tools in the Hadoop stack help extract several implicit customer interaction metrics, including sales and clickstream logs generated by customer interactions. Second, using feature engineering techniques, we verify that some of the new features we propose have a definite impact on customer churn. Finally, we establish through comprehensive cross-validation that Regularized Logistic Regression, SVM, and Gradient Boost Random Forests are the best-performing models for predicting customer churn.

DOI

https://doi.org/10.31274/etd-180810-5650

Copyright Owner

Karthik B. Subramanya

Language

en

File Format

application/pdf

File Size

57 pages
