ArXiv TLDR

Rui Wang

11 papers · Latest:

Computer Vision

BabelDOC: Better Layout-Preserving PDF Translation via Intermediate Representation

BabelDOC is an IR-based framework that accurately translates PDFs while preserving their original visual layout and improving terminology consistency.

2605.10845
Robotics

When to Trust Imagination: Adaptive Action Execution for World Action Models

This paper introduces an adaptive execution method for World Action Models (WAMs) that verifies future predictions against reality, improving robotic manipulation efficiency and robustness.

2605.06222
Cryptography & Security

ReTokSync: Self-Synchronizing Tokenization Disambiguation for Generative Linguistic Steganography

ReTokSync is a self-synchronizing framework that resolves tokenization ambiguity in generative linguistic steganography, achieving high extraction accuracy with minimal overhead.

2604.25486
Cryptography & Security

DETOUR: A Practical Backdoor Attack against Object Detection

DETOUR introduces a practical backdoor attack on object detection models using semantic, viewpoint-invariant triggers effective across diverse real-world conditions.

2604.24599
Cryptography & Security

A Survey on Split Learning for LLM Fine-Tuning: Models, Systems, and Privacy Optimizations

This survey reviews split learning for LLM fine-tuning, detailing model, system, and privacy optimizations for secure, collaborative adaptation.

2604.24468
Natural Language Processing

TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale

TingIS is an enterprise-scale system using LLMs and noise reduction to discover real-time risk events from noisy customer incidents with high accuracy.

2604.21889
Quantitative Methods (Biology)

VARIANT: Web Server for Decoding and Analyzing Viral Mutations at Genome and Protein Levels

VARIANT is a web server for comprehensive analysis of viral mutations, including novel patterns and RNA secondary structures, across diverse viral genomes.

2604.20942
Software Engineering

Proactive Detection of GUI Defects in Multi-Window Scenarios via Multimodal Reasoning

This paper introduces a proactive framework using multimodal LLMs to detect GUI display defects in multi-window mobile scenarios, outperforming existing methods.

2604.19081
Computer Vision

Seedance 2.0: Advancing Video Generation for World Complexity

Seedance 2.0 is a new multi-modal audio-video generation model with a unified architecture, offering advanced capabilities and improved performance.

2604.14148
Computer Vision

AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation

AVGen-Bench introduces a new benchmark and multi-granular evaluation for Text-to-Audio-Video generation, revealing gaps in semantic reliability.

2604.08540
Artificial Intelligence

The Llama 3 Herd of Models

Llama 3 is a new family of large multilingual foundation models excelling in language, coding, reasoning, and multimodal tasks, rivaling GPT-4 in quality and offering extensive public releases.

2407.21783
