Hao Wang
11 papers · Latest:
StepCodeReasoner: Aligning Code Reasoning with Stepwise Execution Traces via Reinforcement Learning
StepCodeReasoner uses RL to align code reasoning with stepwise execution traces, achieving state-of-the-art performance by supervising intermediate states.
BabelDOC: Better Layout-Preserving PDF Translation via Intermediate Representation
BabelDOC is an IR-based framework that accurately translates PDFs while preserving their original visual layout and improving terminology consistency.
VEGA: Visual Encoder Grounding Alignment for Spatially-Aware Vision-Language-Action Models
VEGA enhances VLA models' spatial reasoning by directly aligning their visual encoder outputs with 3D-aware features, improving robotic manipulation.
Understanding DNNs in Feature Interaction Models: A Dimensional Collapse Perspective
This paper shows DNNs in feature interaction models mitigate dimensional collapse, improving representation robustness and clarifying their role.
SOLAR-RL: Semi-Online Long-horizon Assignment Reinforcement Learning
SOLAR-RL bridges offline and online RL for MLLM GUI agents, using simulated online feedback to boost long-horizon task completion and robustness.
AIFIND: Artifact-Aware Interpreting Fine-Grained Alignment for Incremental Face Forgery Detection
AIFIND introduces artifact-aware semantic anchors and attention to stabilize incremental face forgery detection, preventing feature drift and catastrophic forgetting.
Rethinking the Necessity of Adaptive Retrieval-Augmented Generation through the Lens of Adaptive Listwise Ranking
AdaRankLLM rethinks adaptive RAG, proposing a framework that optimizes retrieval for both weak and strong LLMs, significantly reducing context overhead.
Compressing Sequences in the Latent Embedding Space: $K$-Token Merging for Large Language Models
K-Token Merging compresses LLM inputs in the latent embedding space, reducing sequence length by up to 75% with minimal performance loss.
Stochastic Trust-Region Methods for Over-parameterized Models
This paper introduces a stochastic trust-region framework for over-parameterized models, eliminating manual step-size tuning and handling constrained problems.
Visual Preference Optimization with Rubric Rewards
rDPO introduces rubric-based preference optimization for visual tasks, using instance-specific checklists to generate high-quality feedback.
XRZero-G0: Pushing the Frontier of Dexterous Robotic Manipulation with Interfaces, Quality and Ratios
XRZero-G0 is a hardware-software system that enables scalable, high-quality robot-free data collection for dexterous manipulation, reducing costs significantly.