ArXiv TLDR

A Synthetic Conversational Smishing Dataset for Social Engineering Detection

arXiv: 2604.11752

Carl Lochstampfor, Ayan Roy

cs.CR

TLDR

This paper introduces a synthetic dataset of 3,201 conversational smishing attacks and establishes detection baselines, showing TF-IDF models outperform transformers.

Key contributions

  • Created a synthetic dataset of 3,201 labeled multi-round conversations for conversational smishing detection.
  • Dataset captures diverse attacker strategies and victim responses across multi-stage social engineering interactions.
  • Established detection baselines using 8 models (ML & transformers) with TF-IDF and engineered conversational features.
  • Showed TF-IDF-based models, particularly XGBoost, outperform transformer models for conversational smishing detection.
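The TF-IDF baseline described above can be sketched roughly as follows. This is a minimal illustration on invented toy conversations, not the paper's data or code; Logistic Regression (one of the paper's baseline models) stands in here for XGBoost, and each multi-turn conversation is assumed to be flattened into a single text document before vectorization:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Toy stand-ins: each "document" is a multi-turn conversation flattened to text.
conversations = [
    "Hi this is your bank please verify your account at this link",
    "Hey are we still on for lunch tomorrow",
    "Your package is held pay the customs fee here",
    "Can you send me the notes from class",
]
labels = [1, 0, 1, 0]  # 1 = smishing, 0 = benign

# TF-IDF over the flattened conversation text, then a linear classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(conversations, labels)

preds = model.predict(conversations)
print("macro F1:", f1_score(labels, preds, average="macro"))
```

The paper reports macro F1 precisely because the classes are likely imbalanced; `average="macro"` weights each class equally regardless of its frequency.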

Why it matters

This paper addresses a critical gap in cybersecurity by providing a large-scale conversational smishing dataset, a resource that has so far been largely unavailable. It enables research into multi-turn social engineering attacks, which are more realistic and dangerous than single-message threats. The findings also show that simple lexical signals remain strong detection features in this setting.

Original Abstract

Smishing (SMS phishing) has become a serious cybersecurity threat, especially for elderly and cyber-unaware individuals, causing financial loss and undermining user trust. Although prior work has focused on detecting smishing at the level of individual messages, real-world attackers often rely on multi-stage social engineering, gradually manipulating victims through extended conversations before attempting to steal sensitive information. Despite the existence of several datasets for single-message smishing detection, datasets capturing conversational smishing remain largely unavailable, limiting research on multi-turn attack detection. To address this gap, this paper presents a synthetically generated dataset of 3,201 labeled multi-round conversations designed to emulate realistic conversational smishing attacks. The dataset reflects diverse attacker strategies and victim responses across multiple stages of interaction. Using this dataset, we establish baseline performance by evaluating eight models, including traditional machine learning approaches (Logistic Regression, Random Forest, Linear SVM, and XGBoost) and transformer-based architectures (DistilBERT and Longformer), with both engineered conversational features and TF-IDF text representations. Experimental results show that TF-IDF-based models consistently outperform those using engineered features alone. The best-performing model, XGBoost with TF-IDF features, achieves 72.5% accuracy and a macro F1 score of 0.691, surpassing both transformer models. Our analysis suggests that transformer performance is limited primarily by input-length constraints and the relatively small size of the training data. Overall, the results highlight the value of lexical signals in conversational smishing detection and demonstrate the usefulness of the proposed dataset for advancing research on defenses against multi-turn social engineering attacks.
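The abstract mentions "engineered conversational features" alongside TF-IDF but this summary does not enumerate them. A hypothetical sketch of what such per-conversation features could look like (these specific features are illustrative assumptions, not the paper's actual feature set):

```python
import re

def conversation_features(turns):
    """Hypothetical engineered features for a multi-turn SMS conversation.

    `turns` is a list of (sender, text) tuples; the features below are
    illustrative guesses, not those used in the paper.
    """
    attacker_turns = [t for s, t in turns if s == "unknown"]
    all_text = " ".join(t for _, t in turns)
    return {
        "n_turns": len(turns),
        "n_urls": len(re.findall(r"https?://\S+", all_text)),
        "has_urgency": int(bool(re.search(r"\b(urgent|now|immediately)\b",
                                          all_text, re.I))),
        "avg_attacker_len": (sum(len(t) for t in attacker_turns)
                             / len(attacker_turns)) if attacker_turns else 0.0,
    }

turns = [
    ("unknown", "Your account is locked, act now: http://example.com/verify"),
    ("victim", "Who is this?"),
    ("unknown", "Bank security. Please verify immediately."),
]
print(conversation_features(turns))
```

Per the abstract, models using TF-IDF text representations consistently beat models using engineered features like these alone, which suggests the lexical content of the conversation carries most of the discriminative signal.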
