Cognitive Twins: Investigating Personalized Thinking Model Building and Its Performance Enhancement with Human-in-the-Loop
Wu-Yuin Hwang, Nur Alif Ilyasa, Muhammad Irfan Luthfi, Yuniar Indrihapsari
TLDR
This paper introduces a Personalized Thinking Model (PTM) for AI education, building cognitive twins from learner journals with LLMs and HITL refinement.
Key contributions
- Presents a Personalized Thinking Model (PTM), a hierarchical and interpretable learner representation for AI education.
- PTM constructs "cognitive twins" from learner journals using LLMs, embeddings, dimensionality reduction, and clustering.
- Evaluated PTM fidelity through automatic matching (F1 ~75%), user perception (Likert ~4.3), and semantic alignment.
- Results show PTM produces acceptable fidelity, reflects user thinking, and demonstrates semantic abstraction across layers.
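The contributions name the construction pipeline only at a high level (sentence embeddings → dimensionality reduction → consensus clustering). A minimal sketch of that chain is given below, using random vectors in place of real journal-sentence embeddings, PCA via SVD for the reduction, and bootstrap-resampled k-means with a co-association matrix for the consensus step. Every component choice and parameter here is an illustrative assumption, not the paper's actual configuration.

```python
import numpy as np

def farthest_point_init(X, k):
    # Deterministic seeding: start at point 0, then repeatedly pick the
    # point farthest from all chosen centers.
    centers = [0]
    for _ in range(k - 1):
        d = np.min(((X[:, None] - X[centers]) ** 2).sum(-1), axis=1)
        centers.append(int(np.argmax(d)))
    return X[centers].copy()

def kmeans(X, k, iters=20):
    # Minimal Lloyd's k-means; a stand-in for any clustering backend.
    centers = farthest_point_init(X, k)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return labels

def consensus_cluster(X, k=3, runs=10, frac=0.8, threshold=0.5, seed=0):
    # Co-association consensus: cluster bootstrap subsamples, count how
    # often each pair lands together, then group pairs that co-occur in
    # at least `threshold` of the runs in which both were sampled.
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    m = int(frac * n)
    for _ in range(runs):
        idx = rng.choice(n, m, replace=False)
        labels = kmeans(X[idx], k)
        same = (labels[:, None] == labels[None, :]).astype(float)
        sampled[np.ix_(idx, idx)] += 1
        together[np.ix_(idx, idx)] += same
    co = np.divide(together, sampled, out=np.zeros_like(together),
                   where=sampled > 0)
    # Merge high-consensus pairs with union-find.
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if co[i, j] >= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]

# Toy stand-in for sentence embeddings of 15 journal entries drawn from
# three underlying themes (real inputs would come from an encoder).
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(c, 0.1, size=(5, 8)) for c in (0.0, 1.0, 2.0)])
# Dimensionality reduction: project onto the top-2 principal components.
Xc = emb - emb.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
reduced = Xc @ Vt[:2].T
groups = consensus_cluster(reduced, k=3)
```

Bootstrap resampling supplies the run-to-run diversity that makes the co-association matrix informative; with a single deterministic run, "consensus" would be vacuous.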
Why it matters
This research presents a novel approach to understanding individual learning by creating "cognitive twins." The PTM offers an interpretable, multi-layered view of a learner's thinking, enhancing AI-supported education. Its three-way evaluation (automatic matching, user ratings, and semantic-alignment checks) supports its potential for personalized learning systems.
Original Abstract
This paper presents the Personalized Thinking Model (PTM), a hierarchical and interpretable learner representation designed for AI-supported education. PTM organizes evidence from learner journals into a five-layer structure covering behavioral instances, behavioral patterns, cognitive routines, metacognitive tendencies, and self-system values. PTM is grounded in Marzano's New Taxonomy of Educational Objectives and aims to clone the learner's thinking model and build a cognitive twin. It was constructed using a pipeline that combines large language model inference (Gemini 2.5 Pro), sentence embeddings, dimensionality reduction, and consensus clustering. This paper evaluates PTM fidelity through three methods applied to 40 participants in a seven-week study. First, automatic evaluation using atomic information point matching yielded an overall F1 score of 74.57% before human-in-the-loop (HITL) refinement and 75.48% after refinement. Second, user evaluation using a Likert scale produced mean ratings of 4.26 and 4.30 on a five-point scale for the pre- and post-HITL conditions, respectively. Third, semantic alignment verification showed that topic coherence increased from 0.436 at the behavioral layer to 0.626 at the core value layer, while lexical overlap with journal vocabulary decreased from 0.114 to 0.007 across those same layers. These results suggest that the PTM produces outputs with acceptable fidelity, was generally perceived by users as reflecting their thinking, and showed a pattern consistent with semantic abstraction across layers.
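The abstract's automatic evaluation matches "atomic information points" between generated and reference content and reports precision-style F1 scores. The exact matching criterion is not given in this summary, so the sketch below assumes a simple normalized-string match on hypothetical example points; a real implementation would likely use a softer semantic matcher.

```python
def normalize(point: str) -> str:
    # Crude normalization (lowercase, collapse whitespace); the paper's
    # actual matching criterion is not specified in this summary.
    return " ".join(point.lower().split())

def atomic_point_f1(predicted: list[str], reference: list[str]) -> dict:
    # Treat each side as a set of normalized atomic points and score
    # exact set overlap with precision, recall, and F1.
    pred = {normalize(p) for p in predicted}
    ref = {normalize(r) for r in reference}
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical atomic points from a generated PTM layer vs. a reference.
pred = ["Reviews notes before coding",
        "prefers visual examples",
        "asks peers for help"]
gold = ["reviews notes   before coding",
        "Prefers visual examples",
        "sets weekly goals"]
scores = atomic_point_f1(pred, gold)  # 2 of 3 points match on each side
```

Here two of three points match after normalization, giving precision, recall, and F1 of 2/3 each; aggregating such scores over participants would yield an overall figure like the ~75% the abstract reports.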