Jie Huang

4 papers · Latest: May 12, 2026

OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation

OmniNFT is a diffusion RL framework for joint audio-video generation that addresses multi-modal challenges such as gradient imbalance.

2605.12480 · May 12, 2026

Computer Vision

SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation

SCOPE is a framework that uses structured decomposition and conditional skill orchestration to maintain semantic commitments for complex text-to-image generation.

2605.08043 · May 8, 2026

Computer Vision

IR-Flow: Bridging Discriminative and Generative Image Restoration via Rectified Flow

IR-Flow unifies discriminative and generative image restoration using Rectified Flow for efficient, high-quality results across various degradations.

2604.19680 · Apr 21, 2026

Statistical Machine Learning

Sharp description of local minima in the loss landscape of high-dimensional two-layer ReLU neural networks

This paper gives a sharp description of local minima in the loss landscape of two-layer ReLU networks, linking them to SGD dynamics and revealing a hierarchical structure.

2604.09412 · Apr 10, 2026
