Sentiment and Emotion Classification of Indonesian E-Commerce Reviews via Multi-Task BiLSTM and AutoML Benchmarking

2604.24720

Hermawan Manurung, Ibrahim Al-Kahfi, Ahmad Rizqi, Martin Clinton Tosima Manullang

cs.CL

TLDR

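This paper classifies binary sentiment and five-class emotion in Indonesian e-commerce reviews from the PRDECT-ID dataset using two tracks: a TF-IDF plus PyCaret AutoML sweep and a multi-task BiLSTM, both fed by a preprocessing module that handles slang-heavy informal text.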

Key contributions

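  • Targets Indonesian marketplace reviews that mix standard vocabulary with slang, regional loanwords, numeric shorthands, and emoji.
  • Proposes a two-track pipeline: TF-IDF with a PyCaret AutoML sweep over standard classifiers, and a PyTorch multi-task BiLSTM with a shared encoder and two task-specific heads.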
  • Includes a 14-step preprocessing module with a 140-entry slang dictionary for data cleaning.
  • Benchmarks four configurations (BiLSTM Baseline, BiLSTM Improved, BiLSTM Large, and TextCNN) and deploys both tracks as Gradio apps on Hugging Face Spaces.
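
The preprocessing idea in the list above can be sketched roughly as follows. This is a minimal illustration, not the paper's 14-step module: the slang entries, the regular expressions, and the function name `clean_review` are all assumptions chosen for the example, and dropping emoji outright is a simplification that the paper may well handle differently.

```python
import re

# Hypothetical slang entries for illustration only; these are NOT taken
# from the paper's 140-entry dictionary.
SLANG = {
    "gk": "tidak",
    "ga": "tidak",
    "bgt": "banget",
    "brg": "barang",
    "yg": "yang",
}

URL_RE = re.compile(r"https?://\S+")
REPEAT_RE = re.compile(r"(.)\1{2,}")      # a character repeated 3+ times
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")  # punctuation, emoji, other symbols


def clean_review(text: str) -> str:
    """Apply a few sequential cleaning steps, then expand slang tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    # Crude elongation fix: "bagusss" -> "bagus". It collapses to a single
    # character, so it can damage words that legitimately double a letter.
    text = REPEAT_RE.sub(r"\1", text)
    text = NON_WORD_RE.sub(" ", text)
    tokens = [SLANG.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


print(clean_review("Brg bagusss bgt, gk nyesel!!!"))
# prints: barang bagus banget tidak nyesel
```

Running the dictionary lookup after lowercasing and punctuation removal keeps the dictionary small, since each slang form only needs one lowercase entry.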

Why it matters

Lexicon-based sentiment tools are unreliable on informal Indonesian marketplace text, so a pipeline built around that text is useful for analyzing customer feedback. Both tracks are deployed as Gradio apps on Hugging Face Spaces and the source code is public, which makes the work easy to try and to reproduce.

Original Abstract

Indonesian marketplace reviews mix standard vocabulary with slang, regional loanwords, numeric shorthands, and emoji, making lexicon-based sentiment tools unreliable in practice. This paper describes a two-track classification pipeline applied to the PRDECT-ID dataset, which contains 5,400 product reviews from 29 Indonesian e-commerce categories, each labeled for binary sentiment (Positive/Negative) and five-class emotion (Happy, Sad, Fear, Love, Anger). The first track applies TF-IDF vectorization with a PyCaret AutoML sweep across standard classifiers. The second track is a PyTorch Bidirectional Long Short-Term Memory (BiLSTM) network with a shared encoder and two task-specific output heads. A preprocessing module applies 14 sequential cleaning steps, including a 140-entry slang dictionary assembled from marketplace corpora. Four configurations are benchmarked: BiLSTM Baseline, BiLSTM Improved, BiLSTM Large, and TextCNN. Training uses class-weighted cross-entropy loss, ReduceLROnPlateau scheduling, and early stopping. Both tracks are deployed as Gradio applications on Hugging Face Spaces. Source code is publicly available at https://github.com/ikii-sd/pba2026-crazyrichteam.
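
The abstract mentions class-weighted cross-entropy on two output heads but does not say how the weights are derived or how the two task losses are combined. The sketch below assumes inverse-frequency ("balanced") weights and an unweighted sum of the sentiment and emotion losses. Both choices, and all function names, are assumptions for illustration, written in plain Python rather than the authors' PyTorch code.

```python
import math
from collections import Counter


def class_weights(labels, num_classes):
    """Inverse-frequency weights: n / (k * count_c).

    Assumes every class index 0..num_classes-1 appears at least once.
    """
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_classes * counts[c]) for c in range(num_classes)]


def weighted_cross_entropy(logits, target, weights):
    """Per-example cross-entropy scaled by the target class weight."""
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return -weights[target] * (logits[target] - log_sum)


def multitask_loss(sent_logits, sent_target, sent_w,
                   emo_logits, emo_target, emo_w):
    """Sum of the two head losses (equal weighting is an assumption)."""
    return (weighted_cross_entropy(sent_logits, sent_target, sent_w)
            + weighted_cross_entropy(emo_logits, emo_target, emo_w))


# Toy labels: sentiment has 2 classes, emotion has 5.
sent_w = class_weights([0, 0, 0, 1], num_classes=2)
emo_w = class_weights([0, 1, 2, 3, 4], num_classes=5)
loss = multitask_loss([0.0, 0.0], 1, sent_w, [0.0] * 5, 2, emo_w)
```

With this weighting, a mistake on the rarer sentiment class costs more than one on the frequent class, which is the usual motivation for class weights on imbalanced labels.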