ArXiv TLDR

Zhen Yang

6 papers · Latest:

Computer Vision

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

GLM-5V-Turbo is a new foundation model integrating multimodal perception natively for enhanced agent reasoning, planning, and tool use across diverse contexts.

2604.26752

Orbital angular momentum radiation and polarization of relativistic electrons in magnetic fields

Relativistic electrons in magnetic fields can have their orbital angular momentum polarized by synchrotron radiation, much faster than spin polarization.

2604.21856

Software Engineering

DebugRepair: Enhancing LLM-Based Automated Program Repair via Self-Directed Debugging

DebugRepair enhances LLM-based automated program repair by using self-directed debugging to collect intermediate runtime evidence, significantly improving bug-fixing performance.

2604.19305

Cryptography & Security

SAGE: Signal-Amplified Guided Embeddings for LLM-based Vulnerability Detection

SAGE introduces Signal-Amplified Guided Embeddings to overcome "Signal Submersion" in LLM-based vulnerability detection, achieving state-of-the-art performance.

2604.19031

Artificial Intelligence

How Do LLMs and VLMs Understand Viewpoint Rotation Without Vision? An Interpretability Study

This paper investigates how LLMs and VLMs understand viewpoint rotation from text, finding they struggle to bind viewpoint to observation, but selective fine-tuning helps.

2604.15294

Natural Language Processing

Gemini: A Family of Highly Capable Multimodal Models

Gemini is a new family of multimodal AI models excelling in image, audio, video, and text understanding, achieving state-of-the-art results across numerous benchmarks, including human-expert-level performance on MMLU.

2312.11805
