Xiwu Chen
2 papers · Latest:
Computer Vision
HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation
HERMES++ unifies 3D scene understanding and future geometry prediction in a driving world model, outperforming specialist methods.
2604.28196
Computer Vision
When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models
NUMINA improves numerical alignment in text-to-video diffusion models by guiding regeneration, boosting counting accuracy and CLIP alignment.
2604.08546