Sentiment and Emotion Classification of Indonesian E-Commerce Reviews via Multi-Task BiLSTM and AutoML Benchmarking

April 27, 2026

arXiv:2604.24720

Hermawan Manurung, Ibrahim Al-Kahfi, Ahmad Rizqi, Martin Clinton Tosima Manullang

cs.CL

TLDR

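This paper classifies binary sentiment and five-class emotion in Indonesian e-commerce reviews from the PRDECT-ID dataset using two tracks: a TF-IDF and PyCaret AutoML sweep, and a multi-task BiLSTM. A dedicated preprocessing module handles the slang, regional loanwords, numeric shorthands, and emoji that make lexicon-based tools unreliable.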

Key contributions

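Targets Indonesian e-commerce reviews that mix standard vocabulary with slang, regional loanwords, numeric shorthands, and emoji.
Proposes a two-track pipeline: TF-IDF features with a PyCaret AutoML sweep over standard classifiers, and a multi-task PyTorch BiLSTM with a shared encoder and two task-specific output heads.
Includes a 14-step preprocessing module with a 140-entry slang dictionary assembled from marketplace corpora.
Benchmarks four neural configurations (BiLSTM Baseline, BiLSTM Improved, BiLSTM Large, and TextCNN) and deploys both tracks as Gradio apps on Hugging Face Spaces.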
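
As an illustration of the kind of cleaning the preprocessing module performs, here is a minimal sketch of three steps: expanding numeric reduplication shorthand, collapsing elongated letters, and dictionary-based slang replacement. The `SLANG` entries and the `normalize` function are hypothetical examples, not the paper's 140-entry dictionary or its actual 14 steps.

```python
import re

# Hypothetical sample entries for illustration only; the paper's
# 140-entry dictionary is not reproduced here.
SLANG = {
    "gk": "tidak",
    "ga": "tidak",
    "bgt": "banget",
    "brg": "barang",
    "yg": "yang",
}


def normalize(text: str) -> str:
    """Apply three illustrative cleaning steps to one review."""
    text = text.lower()
    # Expand the "word2" shorthand for reduplication ("hati2" -> "hati-hati").
    text = re.sub(r"\b([a-z]+)2\b", r"\1-\1", text)
    # Collapse a letter repeated three or more times ("bagusss" -> "bagus").
    text = re.sub(r"([a-z])\1{2,}", r"\1", text)
    # Replace each alphabetic token that appears in the slang dictionary,
    # leaving punctuation and unknown words untouched.
    return re.sub(r"[a-z]+", lambda m: SLANG.get(m.group(), m.group()), text)


print(normalize("Brg bagusss bgt, gk nyesel"))
print(normalize("Hati2 ya"))
```

Matching tokens with a regular expression rather than splitting on whitespace keeps punctuation attached to a word ("bgt,") from blocking the dictionary lookup.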

Why it matters

Lexicon-based sentiment tools handle informal Indonesian marketplace text poorly, so a pipeline built around that text is useful for analyzing customer feedback at scale. Deploying both tracks as Gradio apps on Hugging Face Spaces and releasing the source code makes the models easy to try and reproduce.

Original Abstract

Indonesian marketplace reviews mix standard vocabulary with slang, regional loanwords, numeric shorthands, and emoji, making lexicon-based sentiment tools unreliable in practice. This paper describes a two-track classification pipeline applied to the PRDECT-ID dataset, which contains 5,400 product reviews from 29 Indonesian e-commerce categories, each labeled for binary sentiment (Positive/Negative) and five-class emotion (Happy, Sad, Fear, Love, Anger). The first track applies TF-IDF vectorization with a PyCaret AutoML sweep across standard classifiers. The second track is a PyTorch Bidirectional Long Short-Term Memory (BiLSTM) network with a shared encoder and two task-specific output heads. A preprocessing module applies 14 sequential cleaning steps, including a 140-entry slang dictionary assembled from marketplace corpora. Four configurations are benchmarked: BiLSTM Baseline, BiLSTM Improved, BiLSTM Large, and TextCNN. Training uses class-weighted cross-entropy loss, ReduceLROnPlateau scheduling, and early stopping. Both tracks are deployed as Gradio applications on Hugging Face Spaces. Source code is publicly available at https://github.com/ikii-sd/pba2026-crazyrichteam.
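
The shared-encoder design described in the abstract can be sketched in PyTorch as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the class name `MultiTaskBiLSTM`, the layer sizes, the equal-weighted sum of the two losses, the uniform class weights, and the scheduler settings are all placeholders chosen for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTaskBiLSTM(nn.Module):
    """Shared BiLSTM encoder with one head per task (sizes are assumptions)."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128,
                 n_sentiment=2, n_emotion=5, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.encoder = nn.LSTM(embed_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.sentiment_head = nn.Linear(2 * hidden_dim, n_sentiment)
        self.emotion_head = nn.Linear(2 * hidden_dim, n_emotion)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)
        # h_n has shape (num_directions, batch, hidden) for a single layer.
        # Padded sequences are not packed here, to keep the sketch short.
        _, (h_n, _) = self.encoder(embedded)
        features = torch.cat([h_n[-2], h_n[-1]], dim=1)
        return self.sentiment_head(features), self.emotion_head(features)


def multitask_loss(sent_logits, emo_logits, sent_y, emo_y, sent_w, emo_w):
    # Class-weighted cross-entropy per task; summing the two terms with
    # equal weight is an assumption of this sketch.
    sent_loss = F.cross_entropy(sent_logits, sent_y, weight=sent_w)
    emo_loss = F.cross_entropy(emo_logits, emo_y, weight=emo_w)
    return sent_loss + emo_loss


torch.manual_seed(0)
model = MultiTaskBiLSTM(vocab_size=1000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=2)

# Random stand-in batch: 4 reviews of 20 token ids each.
token_ids = torch.randint(1, 1000, (4, 20))
sent_y = torch.tensor([0, 1, 1, 0])
emo_y = torch.tensor([0, 4, 2, 3])
sent_w = torch.ones(2)  # placeholder class weights
emo_w = torch.ones(5)

sent_logits, emo_logits = model(token_ids)
loss = multitask_loss(sent_logits, emo_logits, sent_y, emo_y, sent_w, emo_w)
optimizer.zero_grad()
loss.backward()
optimizer.step()
# In a real loop this would receive the validation loss once per epoch.
scheduler.step(loss.item())
```

In practice the class weights would be derived from label frequencies in the training split, and early stopping would track the validation loss across epochs.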
