Software Engineering
Papers on code generation, software testing, development tools, and AI for SE.
cs.SE · 497 papers
Securing the Dark Matter: A Semantic-Enhanced Neuro-Symbolic Framework for Supply Chain Analysis of Opaque Industrial Software
This paper introduces a neuro-symbolic framework that analyzes opaque industrial software binaries to detect vulnerabilities and supply chain risks.
SARC: A Governance-by-Architecture Framework for Agentic AI Systems
SARC is a runtime governance framework enforcing hard constraints in agentic AI systems for safer, auditable execution.
The AI-Native Large-Scale Agile Software Development Manifesto
This paper introduces an AI-Native Large-Scale Agile Software Development Manifesto to redefine large-scale agile using AI as a first-class participant.
SafeTune: Search-based Harmfulness Minimisation for Large Language Models
SafeTune is a search-based method that significantly reduces harmfulness and increases relevance in LLM responses through hyperparameter tuning and prompt engineering.
Characterizing and Mitigating False-Positive Bug Reports in the Linux Kernel
This paper characterizes false-positive bug reports in the Linux kernel, showing that such reports waste significant developer effort, and proposes an LLM-based mitigation.
System Test Generation for Virtual Reality Applications using Scenario Models
UltraInstinctVR automates system test generation for VR applications using scenario models, outperforming existing tools in bug detection.
Search-based Robustness Testing of Laptop Refurbishing Robotic Software
PROBE is a search-based method for robustly testing object detection models in laptop refurbishing robots, significantly outperforming random search.
Can LLMs Solve Science or Just Write Code? Evaluating Quantum Solver Generation
Q-SAGE evaluates LLMs for quantum solver generation, showing iterative refinement improves success but reveals numerical accuracy as a key limitation.
MASPrism: Lightweight Failure Attribution for Multi-Agent Systems Using Prefill-Stage Signals
MASPrism uses SLM prefill-stage signals for lightweight, fast, and accurate failure attribution in multi-agent systems, outperforming larger LLMs.
Prompt Engineering Strategies for LLM-based Qualitative Coding of Psychological Safety in Software Engineering Communities: A Controlled Empirical Study
This study evaluates prompt engineering strategies for LLM-based qualitative coding of psychological safety, finding that multi-shot prompting improves Claude Haiku's performance.
Boosting Automatic Java-to-Cangjie Translation with Multi-Stage LLM Training and Error Repair
A multi-stage LLM training framework with iterative error repair significantly improves Java-to-Cangjie code translation, boosting functional equivalence.
Exploring CoCo Challenges in ML Engineering Teams: Insights From the Semiconductor Industry
This paper explores collaboration and communication challenges in ML engineering teams within the semiconductor industry, identifying 16 issues.
To What Extent Does Agent-generated Code Require Maintenance? An Empirical Study
An empirical study finds that agent-generated code requires less frequent maintenance, mostly human-authored feature extensions, whereas human-written code more often needs bug fixes.
Constraint Decay: The Fragility of LLM Agents in Backend Code Generation
LLM agents struggle significantly with structural constraints in backend code generation, showing "constraint decay" as requirements accumulate.
From Agent Loops to Deterministic Graphs: Execution Lineage for Reproducible AI-Native Work
Execution lineage introduces a DAG-based model for AI-native workflows, ensuring reproducible and maintainable work by explicitly managing dependencies and state.
Correct Code, Vulnerable Dependencies: A Large Scale Measurement Study of LLM-Specified Library Versions
LLMs frequently specify vulnerable and incompatible third-party library versions, a systemic issue that external constraints can mitigate.
SiblingRepair: Sibling-Based Multi-Hunk Repair with Large Language Models
SiblingRepair uses LLMs for multi-hunk program repair, outperforming SOTA by improving sibling detection and generating consistent patches across related code.
Teaching LLMs Program Semantics via Symbolic Execution Traces
This paper improves LLM program semantics by training them on symbolic execution traces, boosting bug detection by over 17%.
Modeling Dependency-Propagated Ecosystem Impact of Changes in Maintenance Activities: Evaluating Support Strategies in the PyPI Network
A new model quantifies dependency-propagated ecosystem impact in PyPI to prioritize support, finding 0.1% of packages cause 80% of total impact.
Beyond Accuracy: Policy Invariance as a Reliability Test for LLM Safety Judges
LLM safety judges are unreliable; their verdicts depend on policy wording, not just agent behavior, leading to flawed safety evaluations.