ToxiShield: Promoting Inclusive Developer Communication through Real-Time Toxicity Filtering
MD Awsaf Alam Anindya, Showvik Biswas, Anindya Iqbal, Jaydeb Sarker, Amiangshu Bosu
TLDR
ToxiShield is a GitHub browser extension that uses AI to detect, categorize, and reframe toxic code review comments in real time, fostering inclusive communication.
Key contributions
- ToxiShield is a browser extension for GitHub pull requests, offering real-time toxicity filtering (see the pipeline sketch after this list).
- Utilizes a BERT-based model for 98% accurate binary toxicity detection in code review comments.
- Features a Claude 3.5 Sonnet-powered coach for fine-grained toxicity categorization with explanations.
- Includes a fine-tuned Llama 3.2 Reframer to generate constructive alternatives for toxic text.
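Taken together, these contributions describe a staged pipeline: a lightweight binary filter gates the two heavier LLM stages. The sketch below illustrates that control flow; it is a minimal, hypothetical rendering, and every name in it (ShieldResult, shield, predict, classify, reframe) is illustrative rather than ToxiShield's actual API.

```python
# Hypothetical sketch of ToxiShield's three-stage flow; all names are
# illustrative, not the authors' actual code.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ShieldResult:
    is_toxic: bool             # Toxicity Filter verdict (BERT, binary)
    category: Optional[str]    # Communication Coach label, e.g. "insult"
    explanation: Optional[str] # Coach's just-in-time reasoning
    rewrite: Optional[str]     # Reframer's constructive alternative

def shield(comment: str, filter_model, coach_llm, reframer_llm) -> ShieldResult:
    # Stage 1: fast binary screen; non-toxic comments pass through untouched.
    if not filter_model.predict(comment):
        return ShieldResult(is_toxic=False, category=None,
                            explanation=None, rewrite=None)
    # Stage 2: fine-grained toxicity categorization with an explanation.
    category, explanation = coach_llm.classify(comment)
    # Stage 3: generate a revised, constructive alternative to the toxic text.
    rewrite = reframer_llm.reframe(comment)
    return ShieldResult(True, category, explanation, rewrite)
```

Gating the two LLM calls behind the cheap BERT filter is a plausible reading of the module list; the paper's exact orchestration may differ.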
Why it matters
Toxic interactions undermine teamwork and productivity in software engineering. ToxiShield provides a much-needed real-time solution that promotes healthier, more constructive communication during code reviews, setting a benchmark for inclusivity and collaboration in open-source communities.
Original Abstract
Toxic interactions during code reviews can undermine teamwork and hinder productivity in software engineering (SE) teams. While prior studies have explored toxicity detection and conducted empirical investigations, they lack real-time detoxification tools to support the SE community. To address this gap, we present ToxiShield, a browser extension for GitHub pull requests built from three modules: i) Toxicity Filter -- to identify whether a text is toxic, ii) Communication Coach -- to facilitate just-in-time fine-grained toxicity categorization with explanations, and iii) Reframer -- to generate a revised, constructive alternative to a toxic text. For each module, we trained and evaluated multiple deep learning and Large Language Models (LLMs) to identify the best choice. A BERT-based binary detection model, trained on 38,761 code review samples, achieves 98% accuracy and a 97% F1-score, and was selected for the Toxicity Filter module. For the Communication Coach, prompt-tuned Claude 3.5 Sonnet achieved the best performance, with 39% MCC and 42% F1 in multiclass toxicity classification with detailed reasoning. For the Reframer, we evaluated five LLMs using a fine-tuning strategy on a dataset of 10,120 code review comments. The fine-tuned Llama 3.2 model achieves 95.27% style transfer accuracy, 97.03% fluency, 67.07% content preservation, and an 84% J-score. We further validated ToxiShield through a human evaluation with 10 participants using the Technology Acceptance Model, confirming its perceived usefulness and ease of adoption. ToxiShield sets a benchmark for advancing constructive communication in software engineering, driving inclusivity and healthier collaboration in open-source communities.
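As a point of reference for the Communication Coach numbers (39% MCC, 42% F1), here is a minimal sketch of how such multiclass metrics are computed with scikit-learn. The toy labels and the macro-averaging choice are assumptions for illustration; the abstract does not specify the paper's taxonomy or averaging mode.

```python
# Minimal sketch: multiclass MCC and F1 with scikit-learn.
# The labels are toy examples, not the paper's taxonomy or data.
from sklearn.metrics import matthews_corrcoef, f1_score

y_true = ["insult", "none", "entitled", "insult", "none"]  # gold categories
y_pred = ["insult", "none", "insult",   "insult", "none"]  # Coach predictions

# Matthews correlation coefficient: balanced multiclass agreement in [-1, 1].
mcc = matthews_corrcoef(y_true, y_pred)
# F1 averaged over classes (macro is an assumption; the paper may differ).
f1 = f1_score(y_true, y_pred, average="macro")
print(f"MCC={mcc:.2f}, macro-F1={f1:.2f}")
```

On the reframing side, style transfer accuracy, fluency, and content preservation are standard text style transfer metrics, and the J-score is typically a joint aggregation of those three; the abstract does not state the exact aggregation used.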