Lei Zhang

8 papers · Latest: May 13, 2026

Guide, Think, Act: Interactive Embodied Reasoning in Vision-Language-Action Models

GTA-VLA is an interactive Vision-Language-Action framework that uses user-provided spatial guidance to improve robot reasoning and robustness in embodied tasks.

2605.13632 · May 13, 2026

Validity and Limits of Low Order Hybridization Expansion Approaches for Multi-Orbital Systems

The accuracy of low-order hybridization expansion methods in multi-orbital systems is limited by the least correlated orbital, which suppresses features.

2605.02228 · May 4, 2026

Machine Learning

Beyond Continuity: Simulation-free Reconstruction of Discrete Branching Dynamics from Single-cell Snapshots

Unbalanced Schrödinger Bridge (USB) reconstructs discrete branching cell dynamics from snapshots, integrating stochastic and birth-death events.

2605.00545 · May 1, 2026

Robotics

ALAS: Adaptive Long-Horizon Action Synthesis via Async-pathway Stream Disentanglement

ALAS uses dual-stream disentanglement for long-horizon human-scene interaction tasks, improving success and efficiency across domains.

2604.20721 · Apr 22, 2026

Computer Vision

CreatiParser: Generative Image Parsing of Raster Graphic Designs into Editable Layers

CreatiParser is a new generative model that parses raster graphic designs into editable text, background, and sticker layers.

2604.19632 · Apr 21, 2026

Information Retrieval

MasterSet: A Large-Scale Benchmark for Must-Cite Citation Recommendation in the AI/ML Literature

MasterSet is a new large-scale benchmark for identifying critical 'must-cite' papers in AI/ML literature, addressing a gap in existing citation recommendation systems.

2604.17680 · Apr 20, 2026

Cryptography & Security

Multimodal Reasoning with LLM for Encrypted Traffic Interpretation: A Benchmark

This paper introduces BGTD, a new benchmark, and mmTraffic, an LLM-based multimodal framework for explainable encrypted network traffic interpretation.

2604.08140 · Apr 9, 2026

Artificial Intelligence

The Llama 3 Herd of Models

Llama 3 is a new family of large multilingual foundation models excelling in language, coding, reasoning, and multimodal tasks, rivaling GPT-4 in quality and offering extensive public releases.

2407.21783 · Jul 31, 2024
