Exact Stiefel Optimization for Probabilistic PLS: Closed-Form Updates, Error Bounds, and Calibrated Uncertainty
TLDR
Introduces an end-to-end framework for Probabilistic PLS using exact Stiefel optimization, offering calibrated uncertainty and improved accuracy.
Key contributions
- Develops an end-to-end PPLS framework combining noise pre-estimation, constrained optimization, and prediction calibration.
- Replaces full-spectrum noise averaging with noise-subspace estimation, achieving minimax optimal rates.
- Utilizes exact Stiefel-manifold optimization for orthogonality constraints, improving stability and accuracy.
- Provides closed-form standard errors and extends to sub-Gaussian settings via optional Gaussianization.
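To make the Stiefel-manifold idea concrete, here is a minimal, hypothetical sketch (not the paper's actual algorithm) of exact optimization under orthogonality constraints: Riemannian gradient descent on a toy trace objective, using tangent-space projection plus a QR retraction so every iterate satisfies W^T W = I exactly. The objective, step size, and iteration count are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
p, r = 8, 3

# Hypothetical objective: f(W) = -tr(W^T A W), whose minimizers over the
# Stiefel manifold {W : W^T W = I} span the top-r eigenspace of A.
# A is built with a deliberate eigengap so the toy problem is well-posed.
Q0, _ = np.linalg.qr(rng.standard_normal((p, p)))
eigvals = np.concatenate([np.array([10.0, 9.0, 8.0]), 0.5 * np.ones(p - r)])
A = Q0 @ np.diag(eigvals) @ Q0.T

def euclid_grad(W):
    # Euclidean gradient of f(W) = -tr(W^T A W)
    return -2.0 * A @ W

def stiefel_step(W, G, lr=0.02):
    # Project the Euclidean gradient onto the tangent space at W
    # (embedded metric: riem = G - W sym(W^T G)) ...
    sym = (W.T @ G + G.T @ W) / 2.0
    riem = G - W @ sym
    # ... then retract the step back onto the manifold via QR,
    # with a sign fix to make the retraction deterministic.
    Q, R = np.linalg.qr(W - lr * riem)
    return Q * np.sign(np.diag(R))

# Riemannian gradient descent from a random orthonormal start
W, _ = np.linalg.qr(rng.standard_normal((p, r)))
for _ in range(500):
    W = stiefel_step(W, euclid_grad(W))

# Orthogonality holds exactly at every iterate (up to floating point)
assert np.allclose(W.T @ W, np.eye(r), atol=1e-8)

# The recovered subspace aligns with the top-r eigenvectors of A:
# all singular values of U^T W are close to 1 when the subspaces match.
U = np.linalg.eigh(A)[1][:, -r:]
assert np.allclose(np.linalg.svd(U.T @ W, compute_uv=False), 1.0, atol=1e-3)
```

In contrast to penalty or interior-point handling, the constraint here is never relaxed: each iterate lies on the manifold by construction, which is the stability argument the contribution above is making.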
Why it matters
This paper introduces an end-to-end framework for Probabilistic PLS that resolves two practical bottlenecks: noise-signal coupling and the handling of orthogonality constraints. It improves uncertainty calibration and prediction accuracy on high-noise multi-omics data, yielding interpretable latent factors with natively calibrated uncertainty.
Original Abstract
Probabilistic partial least squares (PPLS) is a central likelihood-based model for two-view learning when one needs both interpretable latent factors and calibrated uncertainty. Building on the identifiable parameterization of Bouhaddani et al. (2018), existing fitting pipelines still face two practical bottlenecks: noise-signal coupling under joint EM/ECM updates and nontrivial handling of orthogonality constraints. Following the fixed-noise scalar-likelihood line of Hu et al. (2025), we develop an end-to-end framework that combines noise pre-estimation, constrained likelihood optimization, and prediction calibration in one pipeline. Relative to Hu et al. (2025), we replace full-spectrum noise averaging with noise-subspace estimation and replace interior-point penalty handling with exact Stiefel-manifold optimization. The noise-subspace estimator attains a signal-strength-independent leading finite-sample rate and matches a minimax lower bound, while the full-spectrum estimator is shown to be inconsistent under the same model. We further extend the framework to sub-Gaussian settings via optional Gaussianization and provide closed-form standard errors through a block-structured Fisher analysis. Across synthetic high-noise settings and two multi-omics benchmarks (TCGA-BRCA and PBMC CITE-seq), the method achieves near-nominal coverage without post-hoc recalibration, reaches Ridge-level point accuracy on TCGA-BRCA at rank r = 3, matches or exceeds PO2PLS on cross-view prediction while providing native calibrated uncertainty, and improves stability of parameter recovery.