ArXiv TLDR

Fast and Forgettable: A Controlled Study of Novices' Performance, Learning, Workload, and Emotion in AI-Assisted and Human Pair Programming Paradigms

2604.18538

Nicholas Gardella, James Prather, Juho Leinonen, Paul Denny, Raymond Pettit + 1 more

cs.HC

TLDR

Novices perform significantly better and report lower workload with AI (GitHub Copilot), but have a less positive emotional experience than in human pair programming and show a larger drop in performance when retested a week later.

Key contributions

  • Novices performed significantly better with GitHub Copilot than with a human teammate, and several dimensions of their workload were significantly reduced.
  • Human pair programming resulted in significantly more positive and arousing emotional experiences.
  • AI-assisted programming showed a larger retest performance decrement after one week, hinting at weaker retention, although the absolute retest reduction was not statistically significant.
  • Recommends educators revisit human pair programming alongside embracing AI tools.

Why it matters

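This controlled study of 22 participants provides empirical evidence on AI versus human pair programming for novices. It points to a trade-off: AI boosts immediate performance and lowers workload, while human collaboration yields a more positive emotional experience and may support better retention, although the absolute retest difference was not statistically significant. The authors recommend that educators revisit traditional pair programming in addition to embracing AI tools.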

Original Abstract

Code-generating Artificial Intelligence has gained popularity within both professional and educational programming settings over the past several years. While research and pedagogy are beginning to cope with this change, computing students are left to bear the unforeseen consequences of AI amidst a dearth of empirical evidence about its effects. Though pair programming between students is well studied and known to be beneficial to self-efficacy and academic achievement, it remains underutilized and further threatened by the proposition that AI can replace a human programming partner. In this paper, we present a controlled pair programming study with 22 participants who wrote Python code under time pressure in teams of two and individually with GitHub Copilot for 20 minutes each. They were incentivized by bonus compensation to balance performance with understanding and were retested individually on the programming tasks after a retention interval of one week. Subjective measures of workload and emotion as well as objective measures of performance and learning (retest performance) were collected. Results showed that participants performed significantly better with GitHub Copilot than their human teammate, and several dimensions of their workload were significantly reduced. However, the emotional effect of the human teammate was significantly more positive and arousing as compared to working with Copilot. Furthermore, there was a nonsignificant absolute retest performance reduction in the AI condition and a larger retest performance decrement in the AI condition. We recommend that educators strongly consider revisiting pair programming as an educational tool in addition to embracing modern AI.
