Rui Wang
11 papers · Latest:
BabelDOC: Better Layout-Preserving PDF Translation via Intermediate Representation
BabelDOC is an IR-based framework that accurately translates PDFs while preserving their original visual layout and improving terminology consistency.
When to Trust Imagination: Adaptive Action Execution for World Action Models
This paper introduces an adaptive execution method for World Action Models (WAMs) that verifies future predictions against reality, improving robotic manipulation efficiency and robustness.
ReTokSync: Self-Synchronizing Tokenization Disambiguation for Generative Linguistic Steganography
ReTokSync is a self-synchronizing framework that resolves tokenization ambiguity in generative linguistic steganography, achieving high extraction accuracy with minimal overhead.
DETOUR: A Practical Backdoor Attack against Object Detection
DETOUR introduces a practical backdoor attack on object detection models using semantic, viewpoint-invariant triggers effective across diverse real-world conditions.
A Survey on Split Learning for LLM Fine-Tuning: Models, Systems, and Privacy Optimizations
This survey reviews split learning for LLM fine-tuning, detailing model, system, and privacy optimizations for secure, collaborative adaptation.
TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale
TingIS is an enterprise-scale system using LLMs and noise reduction to discover real-time risk events from noisy customer incidents with high accuracy.
VARIANT: Web Server for Decoding and Analyzing Viral Mutations at Genome and Protein Levels
VARIANT is a web server for comprehensive analysis of viral mutations, including novel patterns and RNA secondary structures, across diverse viral genomes.
Proactive Detection of GUI Defects in Multi-Window Scenarios via Multimodal Reasoning
This paper introduces a proactive framework using multimodal LLMs to detect GUI display defects in multi-window mobile scenarios, outperforming existing methods.
Seedance 2.0: Advancing Video Generation for World Complexity
Seedance 2.0 is a new multi-modal audio-video generation model with a unified architecture, offering advanced capabilities and improved performance.
AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation
AVGen-Bench introduces a new benchmark and multi-granular evaluation for Text-to-Audio-Video generation, revealing gaps in semantic reliability.
The Llama 3 Herd of Models
Llama 3 is a new family of large multilingual foundation models excelling in language, coding, reasoning, and multimodal tasks, rivaling GPT-4 in quality and offering extensive public releases.