Rui Wang

11 papers · Latest: May 11, 2026

BabelDOC: Better Layout-Preserving PDF Translation via Intermediate Representation

BabelDOC is an IR-based framework that accurately translates PDFs while preserving their original visual layout and improving terminology consistency.

2605.10845 · May 11, 2026

Robotics

When to Trust Imagination: Adaptive Action Execution for World Action Models

This paper introduces an adaptive execution method for World Action Models (WAMs) that verifies future predictions against reality, improving robotic manipulation efficiency and robustness.

2605.06222 · May 7, 2026

Cryptography & Security

ReTokSync: Self-Synchronizing Tokenization Disambiguation for Generative Linguistic Steganography

ReTokSync is a self-synchronizing framework that resolves tokenization ambiguity in generative linguistic steganography, achieving high extraction accuracy with minimal overhead.

2604.25486 · Apr 28, 2026

Cryptography & Security

DETOUR: A Practical Backdoor Attack against Object Detection

DETOUR introduces a practical backdoor attack on object detection models using semantic, viewpoint-invariant triggers effective across diverse real-world conditions.

2604.24599 · Apr 27, 2026

Cryptography & Security

A Survey on Split Learning for LLM Fine-Tuning: Models, Systems, and Privacy Optimizations

This survey reviews split learning for LLM fine-tuning, detailing model, system, and privacy optimizations for secure, collaborative adaptation.

2604.24468 · Apr 27, 2026

Natural Language Processing

TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale

TingIS is an enterprise-scale system using LLMs and noise reduction to discover real-time risk events from noisy customer incidents with high accuracy.

2604.21889 · Apr 23, 2026

Quantitative Methods (Biology)

VARIANT: Web Server for Decoding and Analyzing Viral Mutations at Genome and Protein Levels

VARIANT is a web server for comprehensive analysis of viral mutations, including novel patterns and RNA secondary structures, across diverse viral genomes.

2604.20942 · Apr 22, 2026

Software Engineering

Proactive Detection of GUI Defects in Multi-Window Scenarios via Multimodal Reasoning

This paper introduces a proactive framework using multimodal LLMs to detect GUI display defects in multi-window mobile scenarios, outperforming existing methods.

2604.19081 · Apr 21, 2026

Computer Vision

Seedance 2.0: Advancing Video Generation for World Complexity

Seedance 2.0 is a new multi-modal audio-video generation model with a unified architecture, offering advanced capabilities and improved performance.

2604.14148 · Apr 15, 2026

Computer Vision

AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation

AVGen-Bench introduces a new benchmark and multi-granular evaluation for Text-to-Audio-Video generation, revealing gaps in semantic reliability.

2604.08540 · Apr 9, 2026

Artificial Intelligence

The Llama 3 Herd of Models

Llama 3 is a new family of large multilingual foundation models excelling in language, coding, reasoning, and multimodal tasks, rivaling GPT-4 in quality and offering extensive public releases.

2407.21783 · Jul 31, 2024
