Degree Type

Creative Component

Semester of Graduation

Fall 2019


Theses & dissertations (College of Business)

First Major Professor

Antony M Townsend


Master of Science (MS)


Information Systems


The e-commerce market is growing rapidly, and shopping is shifting from retail stores to online applications. People increasingly prefer to buy products from home, from electronics and clothing to groceries, rather than visit malls or shopping centers. Alongside new products, the growth of online shopping has made it easier than ever for users to buy and sell used products. One major issue in the used-product sector, compared with the new-product sector, is pricing: first-time sellers in online marketplaces find it difficult to price and sell their products. The growth of online marketplaces has therefore triggered interest in algorithms that suggest prices to sellers, and advances in machine learning and big data for predictive modeling have made these techniques increasingly relevant to pricing used products. A solution that predicts price from product features would make selling easier for B2C and C2C online retailers and could enlarge the buying and selling community of such user-based marketplaces. Accurate pricing decision support could also be a competitive advantage for companies and individual sellers. This motivated me to build a price prediction model. For this project, I used the dataset provided by Mercari for its price suggestion challenge. It contains 1.5 million records with product name, brand name, condition, shipping status, and description. I performed exploratory data analysis to understand the data and used techniques such as label binarization and feature extraction to cleanse and prepare the data for modeling. I built a LightGBM regression model to predict the price of a product from its categorical and textual features. I evaluated the model using RMSLE (root mean squared logarithmic error) and R-squared, and achieved a best RMSLE of 0.45 with a LightGBM model using a learning rate of 0.75.
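The two evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration that assumes the standard definitions (RMSLE computed on log(1 + price), and R-squared as one minus the ratio of residual to total sum of squares); it is not taken from the project's own code, and the function names and sample prices are hypothetical.

```python
import math


def rmsle(actual, predicted):
    """Root mean squared logarithmic error over two equal-length sequences.

    Assumes all values are non-negative, as prices are.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # log1p(x) = log(1 + x), which keeps a price of 0 well defined.
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical prices, made up purely for illustration.
    true_prices = [10.0, 25.0, 40.0, 8.0]
    predicted_prices = [12.0, 20.0, 45.0, 9.0]
    print("RMSLE:", rmsle(true_prices, predicted_prices))
    print("R-squared:", r_squared(true_prices, predicted_prices))
```

Because RMSLE compares logarithms, it penalizes relative rather than absolute error, so a 5-unit miss on a 10-unit item counts for more than the same miss on a 500-unit item. That property suits marketplace data in which prices span several orders of magnitude.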

Copyright Owner

Chada, Akshay Reddy

File Format