SMS Fraud Detection

AIMS AND OBJECTIVES

To clean, improve, and organise an existing subset of Chichewa SMS messages for use in building classification models.
To develop and evaluate machine learning classification algorithms to determine their effectiveness in classifying Chichewa SMS messages as fraudulent or normal using the prepared dataset.
To collect a larger SMS dataset to support further experimentation and analysis.

LARGE DATA COLLECTION (Feb - Mar 2024)

An additional SMS data collection exercise was conducted using two methods: an online survey and face-to-face questionnaires.

A structured online survey, built with Google Forms, was distributed through various digital platforms at MUBAS via the Students' Union president. Posters calling for participation were displayed on the MUBAS campus. A total of 102 people participated in the online survey.
Face-to-face data collection was conducted with individuals selected through random sampling techniques around the campus. A total of 86 students and members of staff participated in the face-to-face data collection.

KEY FINDINGS FROM COLLECTED DATA

These findings come from a small sample, namely the MUBAS student community and a few individuals from outside MUBAS, and may therefore not be representative of the wider population.

Awareness of fraudulent SMSs: In the online survey, 96.1% of respondents reported being aware that fraudulent SMSs exist and 3.9% reported not being aware. In the face-to-face data collection, all 86 participants (100%) reported being aware of fraudulent SMSs.
Prevalence of fraudulent SMSs: In the online survey, 2% of respondents reported receiving more than 10 fraudulent SMS messages a month, 16.4% reported receiving 5 to 10 a month, and 81.4% reported encountering such messages only occasionally. In the face-to-face interviews, participants reported varying degrees of exposure: 12% reported frequent encounters and 88% reported rare occurrences.
Types of fraudulent SMS scams: Both the online survey and the face-to-face data collection highlighted the same common types of scam: phishing attempts, prize/sweepstakes scams, fake investment opportunities, and impersonation scams. Participants in both groups described similar experiences, indicating that the same kinds of fraudulent SMS messages are encountered across the two samples.
Response strategies: In the online survey, 72% of respondents said they ignored fraudulent SMS messages and 28% said they deleted them immediately. In the face-to-face interviews, participants mentioned a wider range of responses, including ignoring or deleting the messages, reporting them to the relevant authorities, and engaging with the sender to gather more information.
Impact of fraudulent SMSs: Both data collection methods highlighted the potential financial and emotional impact of falling victim to SMS fraud, with respondents expressing concerns about identity theft, financial loss, and compromise of personal data. 90.2% reported that they or a friend had fallen victim at some point, while 9.8% reported that they had not.

LARGE DATASET

Some participants in the study consented to share sample SMS messages and submitted them. From these we compiled a dataset of 15,299 SMS messages, categorised as FRAUD, NORMAL, or SPAM: 1,370 are fraudulent, 1,826 are spam, and 12,033 are normal. Once the dataset has been cleaned and analysed, machine learning models will be trained on it and their SMS classification performance evaluated. The results are expected to provide further insights.
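
The balance between the three categories can be worked out from the counts given above. The short Python sketch below is illustrative only: the function name class_shares is hypothetical, and the shares are computed over the sum of the three category counts.

```python
# Category counts as reported above for the large dataset.
counts = {"FRAUD": 1370, "SPAM": 1826, "NORMAL": 12033}


def class_shares(category_counts):
    """Return each category's share of the categorised messages as a fraction."""
    # The denominator is the sum of the category counts passed in.
    total = sum(category_counts.values())
    return {label: n / total for label, n in category_counts.items()}


shares = class_shares(counts)
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

Under these counts roughly four in five categorised messages are NORMAL, so per-class measures such as precision and recall for the FRAUD category may be worth reporting alongside overall accuracy.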

RESULTS OF ML EXPERIMENTS PERFORMED ON THE SUBSET

A subset of 646 SMS messages, both fraudulent and normal, was used to experiment with machine learning algorithms for building a classifier. On this subset, the Random Forest model achieved an accuracy of 97% and Logistic Regression achieved 96%. The subset was also translated into English, both by human translators and by a machine translator (Google Translate). On the translated data, Logistic Regression achieved the highest accuracy at 96%, with Random Forest second at 95%. Both models performed worse on the machine-translated dataset. Overall, the results are promising: they show that machine learning can potentially be used to flag fraudulent SMSs in the local language.
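
An experiment of the kind described above could be sketched in Python as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn, TF-IDF word features, and 3-fold cross-validation, none of which are stated in this summary. The function name compare_models is hypothetical, and the toy messages are invented English placeholders, not samples from the dataset.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Invented placeholder messages (assumption: NOT taken from the project data).
TOY_TEXTS = [
    "You have won a prize, send your PIN to claim it",
    "Your account is blocked, reply with your password",
    "Send money to this number to receive your reward",
    "Congratulations, claim your cash prize by sending your PIN",
    "Urgent: confirm your password to unblock your account",
    "You won a reward, send a fee to this number now",
    "Are we still meeting for lunch tomorrow",
    "Please call me when you get home",
    "The lecture has moved to room five",
    "Happy birthday, see you at the party tonight",
    "Can you bring my notebook to class",
    "I will be late, start the meeting without me",
]
TOY_LABELS = ["fraud"] * 6 + ["normal"] * 6


def compare_models(texts, labels, folds=3, seed=0):
    """Return the mean cross-validated accuracy of each classifier."""
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        # TF-IDF is fitted inside each fold, so test messages never leak
        # into the vocabulary used for training.
        pipeline = make_pipeline(TfidfVectorizer(), model)
        fold_scores = cross_val_score(
            pipeline, texts, labels, cv=folds, scoring="accuracy"
        )
        scores[name] = float(fold_scores.mean())
    return scores


if __name__ == "__main__":
    for name, accuracy in compare_models(TOY_TEXTS, TOY_LABELS).items():
        print(f"{name}: {accuracy:.2f}")
```

The same function could be run on the original Chichewa text and on each translated version to compare accuracies side by side. Scores on a toy set this small say nothing about real-world performance.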

SMS FRAUD AWARENESS DAY

The event was held at the Malawi University of Business and Applied Sciences (MUBAS), in the ODeL building, on 10th May 2024 from 14:00 to 16:00. It aimed to raise awareness of the existence of SMS fraud, of how people can identify fraudulent SMS messages to protect themselves, and of how machine learning can help combat SMS fraud in Malawi. On the role of machine learning, we highlighted recent findings from our research project titled “SMS Fraud Detection Using Machine Learning in Malawi”, which investigated the potential of using machine learning to classify Chichewa SMS messages as fraudulent or non-fraudulent. In attendance were the MUBAS students who were involved in the SMS data collection, members of staff, the MSU Director of Publicity and Publications, and invited guests from TNM and Inq.