Segmentation and Prediction of Store Performance on the Shopee Marketplace Using a Hybrid Clustering Approach, Spatial Analysis, and Feature Importance
DOI:
https://doi.org/10.32664/j-intech.v14i01.2256
Keywords:
Data Mining, e-commerce, K-Means Clustering, random forest, marketplace analytics
Abstract
Marketplace platforms have become a central component of digital commerce, particularly in Southeast Asia, where Shopee has emerged as one of the dominant e-commerce ecosystems. This study analyzes and predicts the performance of Shopee stores using a hybrid data mining approach that integrates clustering, spatial analysis, and classification. The dataset consists of 655 Shopee stores collected on February 18, 2026. K-Means clustering is applied to segment store performance, while spatial analysis examines geographic distribution patterns. A Random Forest classifier is then used to predict performance categories and identify influential features. The clustering results reveal three distinct store performance groups with a Silhouette Score of 0.6704, indicating a good cluster structure. Although K = 2 produced a higher score (0.7891), K = 3 was selected because it yields a more meaningful segmentation (low, medium, and high performance). The Random Forest model achieved an accuracy of 79%, with precision, recall, and F1-score indicating reliable predictive performance across all classes. Feature importance analysis shows that promotional activity, chat responsiveness, and follower count are the most influential features in store performance classification. Spatial analysis indicates that provinces such as West Java and Jakarta dominate the high-performance cluster. The findings contribute to hybrid data mining frameworks and provide practical insights for improving seller competitiveness in digital commerce ecosystems.
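The Silhouette Score used above to compare K = 2 and K = 3 can be illustrated with a minimal sketch. This is not the authors' implementation: the function `silhouette_score`, the two-feature toy points, and both label assignments are hypothetical and chosen only to show how the metric is computed for a given segmentation. Euclidean distance is assumed.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance from p to itself is 0, so dividing by len - 1 is correct)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(p, q) for q in other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)


# Hypothetical, already-scaled store features (illustrative values only).
stores = [
    (0.1, 0.1), (0.2, 0.1), (0.1, 0.2),   # low-performing group
    (0.5, 0.5), (0.6, 0.5), (0.5, 0.6),   # medium-performing group
    (0.9, 0.9), (1.0, 0.9), (0.9, 1.0),   # high-performing group
]
labels_k3 = [0, 0, 0, 1, 1, 1, 2, 2, 2]
labels_k2 = [0, 0, 0, 0, 0, 0, 1, 1, 1]   # low and medium merged

score_k3 = silhouette_score(stores, labels_k3)
score_k2 = silhouette_score(stores, labels_k2)
print(f"K=3: {score_k3:.4f}  K=2: {score_k2:.4f}")
```

Values close to 1 indicate compact, well-separated clusters. The metric alone does not settle the choice of K: as in the study, a segmentation with a lower score can still be preferred when it is easier to interpret.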
License
Copyright (c) 2026 J-INTECH

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

