ArXiv TLDR

From Threads to Trajectories: A Multi-LLM Pipeline for Community Knowledge Extraction from GitHub Issue Discussions

2604.25880

Nazia Shehnaz Joynab, Soneya Binta Hossain

cs.SE

TLDR

This paper presents a multi-LLM pipeline that converts GitHub issue discussions into structured issue trajectories, together with the resulting dataset, SWE-MIMIC-Bench.

Key contributions

  • Introduces SWE-MIMIC-Bench, a dataset of structured issue trajectories from GitHub discussions.
  • Presents a multi-LLM pipeline that transforms fragmented GitHub discussions into coherent issue trajectories.
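  • Employs five closed-source LLM configurations, one per task: label classification, inline code block and external link summarization, comment analysis, label-specific field classification, and trajectory synthesis.
  • Extracts 734 high-fidelity reasoning trajectories from 800 real-world GitHub issues, a reported 91.7% success rate.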

Why it matters

Resolving complex issues in large OSS projects means reading long, unstructured, fragmented discussion threads. The pipeline condenses each thread into a structured narrative, which can lower the cognitive load on developers. The trajectories can also serve as training data for LLM agents meant to mimic expert problem-solving.

Original Abstract

Resolution of complex post-production issues in large-scale open-source software (OSS) projects requires significant cognitive effort, as developers first need to go through long, unstructured and fragmented issue discussion threads. In this paper, we present SWE-MIMIC-Bench, an issue trajectory dataset generated from raw GitHub discussions using an automated multi-LLM pipeline. Unlike simple summarization, this pipeline utilizes a group of closed-source LLMs to perform granular tasks: analyzing individual comments with awareness of externally-linked resources, classifying comment analyses into label-specific fields (e.g., root cause, solution plan, implementation progress), and synthesizing label-aware trajectories which capture a structured and coherent narrative of the entire discussion thread. Our pipeline uses five closed-source LLM configurations for distinct purposes: label classification, inline code block and external link summarization, comment analysis, label-specific field classification and trajectory synthesis. By generating concise and reliable trajectories from complex conversation threads, this system can assist developers and researchers in the broader software engineering community to understand the experience-driven collaborative approach for issue diagnosis. Furthermore, the generated trajectories can be used to train modern LLM agents to think and act like an expert developer. We evaluated our system on 800 real-world GitHub issues drawn from the SWE-Bench-Pro, SWE-Bench-Multilingual and SWE-Bench-Verified datasets, achieving a 91.7% success rate in extracting 734 high-fidelity reasoning trajectories.
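
The five stages named in the abstract can be pictured as a simple chain of LLM calls. The sketch below is only an illustration under stated assumptions: the `LLM` callable, the `Comment` fields, the stage identifiers, the prompt shapes, and the ordering of the stages are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical LLM interface: (stage name, prompt text) -> completion text.
# A real system would route each stage to its own model configuration.
LLM = Callable[[str, str], str]

# Stage names follow the abstract; the identifiers themselves are assumptions.
STAGES = (
    "label_classification",
    "code_and_link_summarization",
    "comment_analysis",
    "field_classification",
    "trajectory_synthesis",
)


@dataclass
class Comment:
    author: str
    body: str
    label: str = ""      # e.g. "root cause", "solution plan"
    resources: str = ""  # summary of inline code blocks and external links
    analysis: str = ""   # per-comment analysis, aware of the resources
    fields: dict[str, str] = field(default_factory=dict)  # label -> extracted field


def run_pipeline(comments: list[Comment], llm: LLM) -> str:
    """Run the four per-comment stages, then synthesize one trajectory."""
    for c in comments:
        c.label = llm(STAGES[0], c.body)
        c.resources = llm(STAGES[1], c.body)
        c.analysis = llm(STAGES[2], f"{c.body}\n\nResources: {c.resources}")
        c.fields = {c.label: llm(STAGES[3], c.analysis)}
    # The synthesis stage sees every comment's labeled field, in thread order.
    notes = "\n".join(
        f"[{c.label}] {c.author}: {c.fields[c.label]}" for c in comments
    )
    return llm(STAGES[4], notes)


def echo_llm(stage: str, prompt: str) -> str:
    """Stand-in for a real model so the sketch runs offline."""
    if stage == "label_classification":
        return "root cause"
    return f"<{stage}>"


if __name__ == "__main__":
    thread = [
        Comment("alice", "Crash on startup since the last release."),
        Comment("bob", "Looks like the config parser rejects empty files."),
    ]
    print(run_pipeline(thread, echo_llm))
```

Passing the model in as a callable keeps each stage swappable, which mirrors the idea of using a separate configuration per task without committing to any particular provider.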
