ArXiv TLDR

UX in the Age of AI: Rethinking Evaluation Metrics Through a Statistical Lens

2605.05600

Harish Vijayakumar

cs.HC

TLDR

This paper introduces ADUX-Stat, a novel statistical framework to evaluate user experience in AI systems, addressing limitations of traditional metrics.

Key contributions

  • Introduces ADUX-Stat, a new statistical framework for evaluating UX in AI-mediated systems.
  • Defines Interaction Entropy Index (IEI) to quantify AI response unpredictability from a user perspective.
  • Proposes Temporal Drift Coefficient (TDC) for measuring longitudinal degradation or improvement of perceived usability.
  • Presents Bayesian Usability Confidence Score (BUCS) for credible interval estimates of usability quality.
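The paper does not publish formulas in this summary, so the sketch below is hypothetical: it assumes IEI can be modeled as Shannon entropy over user-perceived response categories, TDC as a least-squares slope of usability scores across sessions, and BUCS as an equal-tailed credible interval under a Beta posterior. Treat it as an illustration of the kinds of statistics involved, not the authors' definitions.

```python
# Hypothetical sketches of the three ADUX-Stat constructs; the concrete
# formulations here are assumptions, not the paper's published formulas.
import math
import random

def interaction_entropy_index(category_counts):
    """IEI sketch: Shannon entropy (bits) over counts of user-perceived
    response categories. Higher values = more unpredictable responses."""
    total = sum(category_counts)
    probs = [c / total for c in category_counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

def temporal_drift_coefficient(session_scores):
    """TDC sketch: least-squares slope of perceived-usability scores
    across interaction sessions. Negative = usability degrades over time."""
    n = len(session_scores)
    mean_x = (n - 1) / 2
    mean_y = sum(session_scores) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in enumerate(session_scores))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var

def bayesian_usability_confidence(successes, failures,
                                  level=0.95, draws=100_000):
    """BUCS sketch: equal-tailed credible interval for the 'usable' rate
    under a Beta(1+s, 1+f) posterior, via Monte Carlo sampling."""
    samples = sorted(random.betavariate(1 + successes, 1 + failures)
                     for _ in range(draws))
    lo = samples[int((1 - level) / 2 * draws)]
    hi = samples[int((1 + level) / 2 * draws) - 1]
    return lo, hi
```

For example, `interaction_entropy_index([40, 30, 20, 10])` scores a system whose responses fall into four perceived categories, and `bayesian_usability_confidence(80, 20)` returns an interval around roughly 0.8 for 80 "usable" out of 100 judgments.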

Why it matters

Traditional UX metrics assume deterministic interfaces and break down for stochastic AI systems, yielding misleading evaluations. ADUX-Stat offers a robust, field-deployable statistical framework for evaluating AI product UX, filling a crucial gap for practitioners and researchers in HCI and AI.

Original Abstract

The rapid proliferation of artificial intelligence (AI) in consumer-facing digital products has disrupted the assumptions underlying classical user experience (UX) evaluation frameworks. Legacy metrics such as the System Usability Scale (SUS), Net Promoter Score (NPS), and task completion rate were engineered for deterministic, rule-based interfaces where identical inputs yield identical outputs. In AI-mediated systems -- spanning conversational agents, generative interfaces, and recommendation engines -- outputs are stochastic, context-sensitive, and temporally variable, rendering these metrics structurally insufficient. This paper introduces the Adaptive Dynamic UX Statistical Framework (ADUX-Stat), a novel evaluation model that reconceptualises usability as a probabilistic signal distribution rather than a static scalar score. ADUX-Stat integrates three original constructs: (1) Interaction Entropy Index (IEI), quantifying the unpredictability of AI responses from a user perception standpoint; (2) Temporal Drift Coefficient (TDC), measuring longitudinal degradation or improvement of perceived usability over interaction sessions; and (3) Bayesian Usability Confidence Score (BUCS), producing credible interval estimates of usability quality under uncertainty. The framework is validated conceptually against five established AI product categories. ADUX-Stat addresses a critical gap at the intersection of HCI research, statistical modelling, and AI product evaluation, offering a reproducible, field-deployable methodology for UX practitioners and researchers alike.
