Self-Supervised Customer Representation Learning for Segmentation and Next-Purchase Prediction on UCI Online Retail

Authors

  • Qi Xin, University of Pittsburgh

DOI:

https://doi.org/10.32664/j-intech.v14i01.2229

Keywords:

Customer Representation, Customer Segmentation, Next-Purchase Prediction, RFM, Self-Supervised Learning

Abstract

Customer analytics in financial retail, payments, and bank marketing frequently relies on segmentation and propensity prediction, but transactional logs are sparse, high-dimensional, and only weakly labeled. This paper presents a fast and reproducible self-supervised learning pipeline that converts raw e-commerce transactions into customer representations and evaluates them on two downstream tasks: customer segmentation and next-purchase prediction. We conduct a full experimental evaluation on the UCI Online Retail dataset (541,909 invoice-line transactions from 2010-12-01 to 2011-12-09). After deterministic cleaning (removing cancellations and non-positive prices/quantities), 397,884 valid line items remain, spanning 4,338 customers, 18,532 invoices, 3,665 products, and 37 countries. For each customer we construct an ordered invoice sequence and define a canonical item per invoice (the item with the largest aggregated quantity). For each invoice transition we build a dual-view customer state vector that concatenates a lifetime purchase-count view and a recent-window view (30 days), then learn embeddings via TF-IDF reweighting and truncated SVD. To increase robustness we introduce a denoising ridge projection (DRP) objective: a linear denoising model trained to map corrupted TF-IDF state vectors back to clean SVD embeddings without using labels, which yields denoised customer embeddings for downstream models. Our main contribution is an applied, computationally light integration of TF-IDF+SVD embeddings with a denoising linear projection for reuse across segmentation and next-purchase prediction, rather than a fundamentally new learning paradigm. In next-purchase prediction restricted to the 200 most frequent target items, a multinomial logistic model trained on the dual-view DRP (DualDRP) embeddings achieves Hit@20=0.587, outperforming MostPopular (Hit@20=0.327) and Markov (Hit@20=0.291).
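The embedding-plus-DRP step described above can be sketched with scikit-learn. This is a minimal illustration on synthetic count data, not the paper's implementation: the matrix sizes, corruption rate, and ridge penalty are assumed for demonstration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic customer-state count matrix: rows = invoice transitions,
# columns = product-count features (lifetime view concatenated with
# a recent-window view).  Sizes are illustrative.
X_counts = rng.poisson(0.3, size=(500, 200)).astype(float)

# TF-IDF reweighting of the raw counts, then truncated SVD embedding.
X_tfidf = TfidfTransformer().fit_transform(X_counts)
svd = TruncatedSVD(n_components=32, random_state=0)
Z = svd.fit_transform(X_tfidf)          # clean SVD embeddings

# Denoising ridge projection (DRP): corrupt the TF-IDF inputs by
# randomly zeroing entries, then fit a linear map from the corrupted
# inputs back to the clean embeddings.  No labels are used.
mask = rng.random(X_tfidf.shape) > 0.2  # keep ~80% of entries
X_noisy = X_tfidf.toarray() * mask
drp = Ridge(alpha=1.0)
drp.fit(X_noisy, Z)

# Denoised embeddings for downstream models.
Z_denoised = drp.predict(X_tfidf.toarray())
print(Z_denoised.shape)  # (500, 32)
```

Because the denoiser is linear and fitted in closed form, this step adds essentially no cost over the SVD itself, which matches the "computationally light" framing above.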
In segmentation we apply k-means clustering and analyze cluster-level RFM statistics and dominant products, showing that the learned embeddings recover actionable segments such as high-value frequent buyers and low-activity long-tail customers. All results, tables, and figures are generated with fixed random seeds and are fully reproducible.
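The segmentation step can be sketched as follows. Random vectors stand in for the learned customer embeddings, and the cluster count is illustrative; a silhouette score gives a quick internal quality check.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
# Stand-in for learned customer embeddings (customers x embedding dims).
emb = rng.normal(size=(300, 32))

# k-means with a fixed seed, as in the paper's reproducibility setup.
km = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = km.fit_predict(emb)

# Silhouette score in [-1, 1]; higher means better-separated clusters.
score = silhouette_score(emb, labels)
print(labels.shape, round(float(score), 3))
```

In practice the cluster labels would then be joined back to per-customer RFM statistics (recency, frequency, monetary value) to name and act on each segment.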

References

[1] D. Chen, S. L. Sain, and K. Guo, “Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining,” Journal of Database Marketing & Customer Strategy Management, vol. 19, no. 3, pp. 197–208, 2012, doi: 10.1057/dbm.2012.17.

[2] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: A review and new perspectives,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1798–1828, 2013, doi: 10.1109/TPAMI.2013.50.

[3] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, “Momentum contrast for unsupervised visual representation learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 9729–9738. doi: 10.1109/CVPR42600.2020.00975.

[4] Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer, vol. 42, no. 8, pp. 30–37, 2009, doi: 10.1109/MC.2009.263.

[5] B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk, “Session-based recommendations with recurrent neural networks,” in International Conference on Learning Representations (ICLR), 2016.

[6] C. C. Aggarwal, Recommender Systems: The Textbook. Springer, 2016. doi: 10.1007/978-3-319-29659-3.

[7] UCI Machine Learning Repository, “Online Retail Dataset.” [Online]. Available: https://archive.ics.uci.edu/dataset/352/online+retail

[8] S. Suhada, S. Bahri, S. B. Nugraha, T. Hidayatulloh, and D. Wintana, “Product Recommendation System Using User-Based Collaborative Screening Methods In Digital Marketing,” J-INTECH, vol. 11, no. 1, 2023, doi: 10.32664/j-intech.v11i1.866.

[9] D. P. Dewi, I. H. Santi, and W. D. Puspitasari, “Perhitungan Penilaian Tingkat Kepuasan Pelanggan Dengan Menerapkan Algoritma K-Means,” J-Intech, vol. 11, no. 2, pp. 257–265, 2023, doi: 10.32664/j-intech.v11i2.981.

[10] G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Inf. Process. Manag., vol. 24, no. 5, pp. 513–523, 1988, doi: 10.1016/0306-4573(88)90021-0.

[11] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” in International Conference on Machine Learning (ICML), 2008, pp. 1096–1103. doi: 10.1145/1390156.1390294.

[12] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme, “BPR: Bayesian personalized ranking from implicit feedback,” in Conference on Uncertainty in Artificial Intelligence (UAI), 2009, pp. 452–461.

[13] A. van den Oord, Y. Li, and O. Vinyals, “Representation learning with contrastive predictive coding,” arXiv preprint arXiv:1807.03748, 2018.

[14] A. E. Hoerl and R. W. Kennard, “Ridge regression: Biased estimation for nonorthogonal problems,” Technometrics, vol. 12, no. 1, pp. 55–67, 1970, doi: 10.1080/00401706.1970.10488634.

[15] B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap. Chapman & Hall, 1993. doi: 10.1007/978-1-4899-4541-9.

[16] J. Urbano, M. Marrero, and D. Martín, “Statistical Significance Testing in Information Retrieval,” arXiv preprint arXiv:1905.11096, 2019.

[17] J. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1967, pp. 281–297.

[18] P. J. Rousseeuw, “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,” J. Comput. Appl. Math., vol. 20, pp. 53–65, 1987, doi: 10.1016/0377-0427(87)90125-7.

[19] C. Hennig, “Cluster-wise assessment of cluster stability,” Comput. Stat. Data Anal., vol. 52, no. 1, pp. 258–271, 2007, doi: 10.1016/j.csda.2006.11.025.

[20] T. Liu, “Stability estimation for unsupervised clustering: A review,” WIREs Computational Statistics, 2022, doi: 10.1002/wics.1575.

[21] M. Chen, Z. Xu, K. Weinberger, and F. Sha, “Marginalized denoising autoencoders for domain adaptation,” in International Conference on Machine Learning (ICML), 2012.

[22] F. Pedregosa et al., “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.

[23] J. D. Hunter, “Matplotlib: A 2D graphics environment,” Comput. Sci. Eng., vol. 9, no. 3, pp. 90–95, 2007, doi: 10.1109/MCSE.2007.55.

Published

2026-04-08