ArXiv TLDR

David Lo

6 papers · Latest:

Software Engineering

Tail-aware N-version Machine Learning Models for Reliable API Recommendation

NvRec uses N-version ML models to improve the reliability of API recommendations, especially for infrequently used "tail" APIs, by filtering unreliable outputs.

2604.27647
Cryptography & Security

TitanCA: Lessons from Orchestrating LLM Agents to Discover 100+ CVEs

TitanCA orchestrates LLM agents to discover 203 zero-day vulnerabilities and 118 CVEs, significantly improving software security.

2604.17860
Software Engineering

Can LLMs Deobfuscate Binary Code? A Systematic Analysis of Large Language Models into Pseudocode Deobfuscation

LLMs can deobfuscate binary code, but performance relies on reasoning and task-specific fine-tuning, not just model size.

2604.08083
Software Engineering

Evaluating LLM-Based 0-to-1 Software Generation in End-to-End CLI Tool Scenarios

This paper introduces CLI-Tool-Bench, a new benchmark for evaluating LLM-based 0-to-1 software generation, and shows that current models struggle with end-to-end CLI tool creation.

2604.06742
Software Engineering

AgentSZZ: Teaching the LLM Agent to Play Detective with Bug-Inducing Commits

AgentSZZ is an LLM agent framework that significantly improves bug-inducing commit identification, especially for complex cases like cross-file and ghost commits.

2604.02665
Software Engineering

TestDecision: Sequential Test Suite Generation via Greedy Optimization and Reinforcement Learning

TestDecision uses greedy optimization and RL to enable open-source LLMs to generate high-quality, sequential test suites, boosting coverage and bug detection.

2604.01799
