ArXiv TLDR

Dual-Control Frequency-Aware Diffusion Model for Depth-Dependent Optical Microrobot Microscopy Image Generation

arXiv: 2604.11680

Lan Wei, Zongcai Tan, Kangyi Lu, Jian-Qing Zheng, Dandan Zhang

cs.RO

TLDR

Du-FreqNet is a diffusion model that generates realistic, depth-dependent microscopy images for microrobots, improving 3D perception.

Key contributions

  • Proposes Du-FreqNet, a dual-control, frequency-aware diffusion model for microrobot microscopy images.
  • Uses two ControlNet branches to encode 3D point clouds and depth-specific mesh layers.
  • Introduces an adaptive frequency-domain loss for physically consistent depth-dependent effects.
  • Achieves a 20.7% SSIM improvement over baselines and improves downstream 3D pose and depth estimation for microrobots.

Why it matters

This paper tackles the scarcity of microscopy data for microrobot 3D perception. Du-FreqNet generates physically consistent, depth-dependent images, significantly improving 3D pose and depth estimation. This enables more robust autonomous microrobotic systems and better closed-loop control.

Original Abstract

Optical microrobots actuated by optical tweezers (OT) are important for cell manipulation and microscale assembly, but their autonomous operation depends on accurate 3D perception. Developing such perception systems is challenging because large-scale, high-quality microscopy datasets are scarce, owing to complex fabrication processes and labor-intensive annotation. Although generative AI offers a promising route for data augmentation, existing generative adversarial network (GAN)-based methods struggle to reproduce key optical characteristics, particularly depth-dependent diffraction and defocus effects. To address this limitation, we propose Du-FreqNet, a dual-control, frequency-aware diffusion model for physically consistent microscopy image synthesis. The framework features two independent ControlNet branches to encode microrobot 3D point clouds and depth-specific mesh layers, respectively. We introduce an adaptive frequency-domain loss that dynamically reweights high- and low-frequency components based on the distance to the focal plane. By leveraging differentiable FFT-based supervision, Du-FreqNet captures physically meaningful frequency distributions often missed by pixel-space methods. Trained on a limited dataset (e.g., 80 images per pose), our model achieves controllable, depth-dependent image synthesis, improving SSIM by 20.7% over baselines. Extensive experiments demonstrate that Du-FreqNet generalizes effectively to unseen poses and significantly enhances downstream tasks, including 3D pose and depth estimation, thereby facilitating robust closed-loop control in microrobotic systems.
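The adaptive frequency-domain loss described above can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the function name `freq_loss`, the radial low/high-frequency split, and the linear defocus-dependent reweighting schedule are all assumptions. NumPy's FFT is used here for a self-contained example; the paper's differentiable supervision would use an autograd-capable FFT such as `torch.fft.fft2`.

```python
import numpy as np

def freq_loss(gen, target, z, z_max=10.0, radius_frac=0.25):
    """Illustrative adaptive frequency-domain loss (not the paper's code).

    gen, target : 2D grayscale images of shape (H, W)
    z           : signed distance of the image slice from the focal plane

    Far from focus, defocus blur suppresses high spatial frequencies,
    so the loss shifts weight toward the low-frequency band as |z| grows.
    """
    # Magnitude spectra, zero frequency shifted to the center
    G = np.fft.fftshift(np.fft.fft2(gen))
    T = np.fft.fftshift(np.fft.fft2(target))

    # Radial mask separating the low-frequency disc from the rest
    h, w = gen.shape
    yy, xx = np.ogrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2)
    low = r <= radius_frac * min(h, w)

    err = np.abs(G - T)
    low_err = err[low].mean()
    high_err = err[~low].mean()

    # Assumed linear reweighting: emphasize high frequencies near focus,
    # low frequencies far from it
    a = min(abs(z) / z_max, 1.0)
    return (1.0 - a) * high_err + (1.0 + a) * low_err
```

The specific schedule (linear in `|z|`, clipped at `z_max`) is a placeholder; the paper states only that the weighting is dynamic in the distance to the focal plane, not its functional form.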
