Measuring research data reuse in scholarly publications using generative artificial intelligence: Open Science Indicator development and preliminary results

April 30, 2026
arXiv:2604.28061

Lauren Cadwallader, Iain Hrynaszkiewicz, parth sarin, Tim Vines

cs.DL, cs.CL

TLDR

This paper introduces an LLM-based indicator to measure research data reuse in scholarly publications, finding a 43% reuse rate.

Key contributions

Introduces a novel LLM-based indicator developed by PLOS and DataSeer for measuring research data reuse.
Reports a 43% data reuse rate, which is higher than the rates found with established bibliometric techniques.
Demonstrates the scalability of generative AI for large-scale data reuse measurement.

Why it matters

Understanding the downstream effects of open science, such as data reuse, is crucial. This paper provides a scalable, AI-driven method to quantify data reuse, and its results suggest that the positive effects of research data sharing and reuse may be underestimated.

Original Abstract

Numerous metascience studies and other initiatives have begun to monitor the prevalence of open science practices when it is more important to understand the 'downstream' effects or impacts of open science. PLOS and DataSeer have developed a new LLM-based indicator to measure an important effect of open science: the reuse of research data. Our results show a data reuse rate of 43%, which is higher than established bibliometric techniques. We show that data reuse can be measured at scale using LLMs and generative artificial intelligence. The positive effects of research data sharing and reuse may currently be underestimated.
