Xiaoxiao Ma
2 papers · Latest:
Computer Vision
OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation
OmniNFT proposes a diffusion reinforcement learning (RL) framework that improves joint audio-video generation by addressing multi-modal challenges such as gradient imbalance.
2605.12480
Computer Vision
SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation
SCOPE is a framework that uses structured decomposition and conditional skill orchestration to preserve semantic commitments in complex text-to-image generation.
2605.08043