ArXiv TLDR

Security Concerns in Generative AI Coding Assistants: Insights from Online Discussions on GitHub Copilot

2604.08352

Nicolás E. Díaz Ferreyra, Monika Swetha Gurupathi, Zadia Codabux, Nalin Arachchilage, Riccardo Scandariato

cs.SE · cs.CR · cs.HC

TLDR

This paper analyzes developer discussions about GitHub Copilot, revealing four key security concerns: data leakage, code licensing, adversarial attacks, and insecure code suggestions.

Key contributions

  • Analyzed developer discussions on GitHub Copilot across Stack Overflow, Reddit, and Hacker News.
  • Identified four major security concerns: data leakage, code licensing, adversarial attacks, and insecure code suggestions.
  • Used BERTopic and thematic analysis to categorize and synthesize developer feedback.
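The clustering step can be illustrated with a short sketch. The paper itself uses BERTopic (transformer embeddings combined with UMAP dimensionality reduction and HDBSCAN clustering); as a dependency-light stand-in that conveys the same idea, the sketch below clusters hypothetical forum posts with TF-IDF vectors and KMeans. The posts and the choice of four clusters are illustrative assumptions, not the paper's actual dataset or pipeline.

```python
# Sketch of topic clustering over developer forum posts.
# NOTE: the paper uses BERTopic; TF-IDF + KMeans is a lightweight stand-in.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hypothetical example posts (not the paper's data), two per concern area.
posts = [
    "Copilot might leak proprietary code from my private repo",
    "Telemetry sends my source snippets to the cloud, a data leakage risk",
    "Does Copilot output violate the GPL licensing of its training data?",
    "License compliance is unclear when suggestions copy open-source code",
    "Prompt injection could make the assistant emit malicious code",
    "Adversarial inputs can poison the model's suggestions",
    "Copilot suggested SQL built by string concatenation, injection-prone",
    "Generated code used MD5 for password hashing, which is insecure",
]

# Vectorize the posts, then partition them into four clusters,
# mirroring the four concern areas identified in the paper.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(posts)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

for label, post in zip(km.labels_, posts):
    print(label, post[:50])
```

In the paper, each resulting cluster is then examined qualitatively (thematic analysis) to name and synthesize the underlying concern; the clustering only groups related discussions.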

Why it matters

This paper provides crucial insights into developer perceptions of security risks in GenAI coding assistants like GitHub Copilot. It highlights key areas for improving built-in security features, informing future development and policy.

Original Abstract

Generative Artificial Intelligence (GenAI) has become a central component of many development tools (e.g., GitHub Copilot) that support software practitioners across multiple programming tasks, including code completion, documentation, and bug detection. However, current research has identified significant limitations and open issues in GenAI, including reliability, non-determinism, bias, and copyright infringement. While prior work has primarily focused on assessing the technical performance of these technologies for code generation, less attention has been paid to emerging concerns of software developers, particularly in the security realm. OBJECTIVE: This work explores security concerns regarding the use of GenAI-based coding assistants by analyzing challenges voiced by developers and software enthusiasts in public online forums. METHOD: We retrieved posts, comments, and discussion threads addressing security issues in GitHub Copilot from three popular platforms, namely Stack Overflow, Reddit, and Hacker News. These discussions were clustered using BERTopic and then synthesized using thematic analysis to identify distinct categories of security concerns. RESULTS: Four major concern areas were identified, including potential data leakage, code licensing, adversarial attacks (e.g., prompt injection), and insecure code suggestions, underscoring critical reflections on the limitations and trade-offs of GenAI in software engineering. IMPLICATIONS: Our findings contribute to a broader understanding of how developers perceive and engage with GenAI-based coding assistants, while highlighting key areas for improving their built-in security features.
