ArXiv TLDR

Yang Liu

11 papers · Latest:

Software Engineering

MASPrism: Lightweight Failure Attribution for Multi-Agent Systems Using Prefill-Stage Signals

MASPrism uses SLM prefill-stage signals for lightweight, fast, and accurate failure attribution in multi-agent systems, outperforming larger LLMs.

2605.07509
Cryptography & Security

EvoPoC: Automated Exploit Synthesis for DeFi Smart Contracts via Hierarchical Knowledge Graphs

EvoPoC automates DeFi smart contract exploit synthesis using hierarchical knowledge graphs and multi-hop reasoning, achieving high detection and exploit success rates.

2605.02868
Software Engineering

PuzzleMark: Implicit Jigsaw Learning for Robust Code Dataset Watermarking in Neural Code Completion Models

PuzzleMark introduces a robust, stealthy watermarking method for code datasets, ensuring intellectual property protection with high verification success.

2604.27677
Software Engineering

RealBench: A Repo-Level Code Generation Benchmark Aligned with Real-World Software Development Practices

RealBench is a new benchmark for repo-level code generation, using structured designs (UML) to better align LLM evaluation with real-world software development.

2604.22659
Computer Vision

UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

UDM-GRPO integrates Uniform Discrete Diffusion Models with RL using novel insights for stable and efficient policy optimization, achieving SOTA results.

2604.18518
Software Engineering

Weaponizing the Commons: A Taxonomy and Detection Framework of Abuse on GitHub

This paper introduces a taxonomy and a high-performance detection framework for various abuse behaviors on GitHub, enhancing software supply chain security.

2604.17909
Cryptography & Security

SafeHarness: Lifecycle-Integrated Security Architecture for LLM-based Agent Deployment

SafeHarness introduces a lifecycle-integrated security architecture for LLM agents, significantly reducing attack success and unsafe behaviors.

2604.13630
Software Engineering

Structural Anchors and Reasoning Fragility: Understanding CoT Robustness in LLM4Code

CoT robustness in LLM4Code is not uniform; its benefits depend on model, task, prompt, and how perturbations affect structural anchors.

2604.12214
Mesoscale & Nanoscale Physics

Giant Domain-Wall Hall Magnetoresistance in Magnetic Topological Semimetal

Researchers discovered giant domain-wall Hall magnetoresistance in the magnetic topological semimetal Co3Sn2S2, linked to Weyl-enhanced anomalous Hall effect.

2604.11452
Software Engineering

GALA: Multimodal Graph Alignment for Bug Localization in Automated Program Repair

GALA uses multimodal graph alignment to precisely localize bugs reported with GUI screenshots for automated program repair, outperforming text-only methods.

2604.08089
Machine Learning

Graph Neural ODE Digital Twins for Control-Oriented Reactor Thermal-Hydraulic Forecasting Under Partial Observability

A GNN-ODE digital twin forecasts reactor thermal-hydraulic states, even with missing sensors, enabling real-time control with high accuracy and speed.

2604.07292
