Article Info
The Random Forest Algorithm for Modelling the Overspending Behaviour of Malaysian Households Income Class
Liyana Ihkwani Abdul Latif, Azuraliza Abu Bakar, Zulaiha Ali Othman, Mohd Suhaidi Abdul Rais, Mazniha Berahim
Abstract
Overspending is a common financial behaviour that can affect individuals across all income levels, but it tends to affect those with lower incomes more severely. Research has shown that low-income individuals are more likely to experience financial hardship as a result of overspending. Previous studies in socio-economic analytics have demonstrated the potential of machine learning for predictive modelling. This study proposes the use of the Random Forest method to build a predictive model of overspending behaviour among Malaysian households in the B40, M40, and T20 income groups. The model was developed using household income and expenditure data from the survey conducted by the Department of Statistics Malaysia (DOSM) in 2016. The original dataset comprises three databases containing 1.5 million records of household heads and household members. These databases were integrated into a single dataset with 14,551 household records and 25 parameters, including 13 demographic factors and 12 categories of household expenditure. The Random Forest algorithm achieved the highest accuracy among the well-known machine learning methods compared. Its predictive attributes were compared with the household expenditure reports from DOSM for 2016, 2019 and 2022. The overspending attributes identified from the 2016 data were consistent with expenditure patterns in 2019 and 2022, suggesting that the proposed model can effectively predict future spending items. This study provides valuable insights into household spending and overspending behaviour and highlights the potential for further research in socio-economic analytics.
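As a rough illustration of the modelling approach summarised above, the following Python sketch trains a Random Forest classifier on a table with 13 demographic and 12 expenditure columns and ranks the attributes by importance. It is not the authors' implementation: the data are synthetic, the column names (`demo_1`, `exp_1`, and so on) are hypothetical, the overspending label is assumed to mean total expenditure exceeding income, and scikit-learn is assumed as the library.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_households = 1000  # small synthetic sample, not the 14,551 DOSM records

# Synthetic stand-ins for 13 demographic factors and 12 expenditure categories.
demographics = rng.normal(size=(n_households, 13))
expenditure = rng.gamma(shape=2.0, scale=150.0, size=(n_households, 12))
income = rng.gamma(shape=3.0, scale=1200.0, size=n_households)

# Assumed label: a household overspends when total expenditure exceeds income.
overspending = (expenditure.sum(axis=1) > income).astype(int)

# 25 input parameters, mirroring the attribute count reported in the abstract.
X = np.hstack([demographics, expenditure])
feature_names = [f"demo_{i}" for i in range(1, 14)] + [f"exp_{i}" for i in range(1, 13)]

X_train, X_test, y_train, y_test = train_test_split(
    X, overspending, test_size=0.3, random_state=0, stratify=overspending
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

# Rank attributes by impurity-based importance, highest first.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.3f}")
print(f"held-out accuracy: {accuracy:.3f}")
```

On real survey data, the ranked attributes would be the candidates to compare against published expenditure reports from later years, in the spirit of the comparison described in the abstract.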
Keywords
financial behaviour, feature selection, random forest, classification, overspending
Area
Data Mining and Optimization

