


Original Research

TO BUY OR NOT TO BUY: COMPARISON OF MACHINE LEARNING TECHNIQUES TO PREDICT ONLINE SHOPPING PREFERENCE OF CUSTOMERS

DAKSH KAPOOR¹, ACHIRANGSHU CHAKRABORTY², and SUNITA DANIEL³

Vol 18, No 02 (2023)   |   DOI: 10.17605/OSF.IO/8CKA3   |   Author Affiliation: Lal Bahadur Shastri Institute of Management, Plot 11/7, Dwarka, Sector-11, New Delhi, India¹˒²˒³   |   Licensing: CC 4.0   |   Pg no: 1141-1150   |   Published on: 20-02-2023

Abstract

In the last decade, and especially since 2020, online purchasing has become ubiquitous. Most customers prefer shopping online to visiting physical stores because it is more convenient to shop from the comfort of their own home or workplace. On the other hand, physical stores have an undeniable advantage: products can be handled or tried on before purchase. While the preferences and intentions of consumers who visit physical stores may be relatively easy to determine, the intentions and behavioral patterns of online shoppers are harder to decipher, especially in large marketplaces that bring together a wide variety of products and sellers. This study uses machine learning techniques to classify customers according to whether they complete a purchase, based on various browsing parameters and other features. The analysis was carried out on secondary data obtained from the Kaggle Machine Learning Repository. Bagging and boosting algorithms were used to predict the purchasing intention of online shoppers. Because the dataset was highly imbalanced, multiple resampling techniques were used to balance it. May had the highest revenue and the largest number of customers making repeat visits to the website; it also had the most special days. Without resampling, the Gradient Boosting algorithm gave the highest prediction accuracy; with upsampling, LightGBM gave the highest accuracy, and with downsampling, Random Forest did.
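The abstract does not say how the two balancing strategies were configured; the keywords suggest SMOTE for upsampling and the NearMiss algorithm for downsampling. The following is a minimal pure-Python sketch of both ideas, assuming numeric feature vectors and binary labels (1 = purchase, 0 = no purchase). The helper names `smote`, `near_miss_1` and `balance`, the neighbour counts, the choice of NearMiss version 1 and the toy data are all illustrative assumptions, not the authors' code; in practice a library such as imbalanced-learn would typically be used.

```python
import math
import random


def _dist(a, b):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling (hypothetical helper).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: _dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def near_miss_1(majority, minority, n_keep, k=3):
    """NearMiss-1 undersampling (hypothetical helper): keep the n_keep
    majority samples with the smallest mean distance to their k nearest
    minority samples."""
    k = min(k, len(minority))

    def score(p):
        nearest = sorted(_dist(p, q) for q in minority)[:k]
        return sum(nearest) / k

    return sorted(majority, key=score)[:n_keep]


def balance(X, y, method):
    """Return a class-balanced copy of (X, y) for binary labels 0/1.

    method="up" oversamples the minority class with SMOTE;
    method="down" undersamples the majority class with NearMiss-1.
    """
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    min_label = 1 if minority is pos else 0
    if method == "up":
        minority = minority + smote(minority, len(majority) - len(minority))
    elif method == "down":
        majority = near_miss_1(majority, minority, len(minority))
    else:
        raise ValueError("method must be 'up' or 'down'")
    X_bal = minority + majority
    y_bal = [min_label] * len(minority) + [1 - min_label] * len(majority)
    return X_bal, y_bal


# Toy, made-up data: 3 "purchase" rows (label 1) vs 7 "no purchase" rows
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
     [1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [1.2, 1.1],
     [0.5, 0.5], [2.0, 2.0], [1.5, 1.4]]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

Xu, yu = balance(X, y, "up")
print(yu.count(0), yu.count(1))  # 7 7
Xd, yd = balance(X, y, "down")
print(yd.count(0), yd.count(1))  # 3 3
```

Upsampling keeps every original row and adds synthetic minority rows, while downsampling discards majority rows, so the two strategies can favour different classifiers, which is consistent with the different winners reported above. Resampling is normally applied to the training split only, so that accuracy is still measured on the original class distribution.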


Keywords

Machine Learning, Exploratory Data Analysis, SMOTE, Near Miss Algorithm, Bagging, Boosting