Hao Wu
10 papers · Latest:
Usability as a Weapon: Attacking the Safety of LLM-Based Code Generation via Usability Requirements
This paper introduces UPAttack, demonstrating how usability requirements can force LLMs to generate insecure code, achieving up to 98.1% attack success.
MemCompiler: Compile, Don't Inject -- State-Conditioned Memory for Embodied Agents
MemCompiler dynamically compiles state-conditioned memory for embodied agents, improving performance and efficiency over static memory injection.
RoboAlign-R1: Distilled Multimodal Reward Alignment for Robot Video World Models
RoboAlign-R1 improves robot video world models by using reward-aligned post-training and stabilized long-horizon inference, boosting task consistency and realism.
Observation of attractor transitions in active magnon-polaritons under microwatt drives
Active magnon-polaritons enable low-power observation of attractor transitions, explosive bistability, and chaotic dynamics for new microwave applications.
STARRY: Spatial-Temporal Action-Centric World Modeling for Robotic Manipulation
STARRY is a novel world model for robotic manipulation that aligns spatial-temporal prediction with action generation for improved task success.
AblateCell: A Reproduce-then-Ablate Agent for Virtual Cell Repositories
AblateCell is an AI agent that reproduces baselines and performs systematic ablations on virtual cell repositories to identify critical components.
SAGE: Signal-Amplified Guided Embeddings for LLM-based Vulnerability Detection
SAGE introduces Signal-Amplified Guided Embeddings to overcome "Signal Submersion" in LLM-based vulnerability detection, achieving SOTA performance.
C-ReD: A Comprehensive Chinese Benchmark for AI-Generated Text Detection Derived from Real-World Prompts
C-ReD is a new Chinese benchmark for detecting AI-generated text, improving diversity and generalization over prior datasets.
Stop Wandering: Efficient Vision-Language Navigation via Metacognitive Reasoning
MetaNav improves Vision-Language Navigation efficiency and robustness using metacognitive reasoning, reducing redundant exploration and VLM queries.
Gemini: A Family of Highly Capable Multimodal Models
Gemini is a new family of multimodal AI models excelling in image, audio, video, and text understanding, achieving state-of-the-art results across numerous benchmarks including human-expert level on MMLU.