From graphemic dependence to lexical structure: a Markovian perspective on Dante's Commedia
TLDR
This paper encodes Dante's Commedia as a vowel-consonant sequence and models it as a four-state Markov chain, relating local graphemic dependencies to lexical patterns and to the poem's large-scale organization.
Key contributions
- Models Dante's Commedia using vowel-consonant encoding and a four-state Markov chain.
- Reveals a slight but consistent increase in graphemic memory from Inferno to Paradiso, indicating a directional shift in local dependency structure.
- Identifies trigram-level graphemic probes linking local dependencies to specific lexical environments.
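- Shows that orthographic conventions, particularly apostrophised forms, shape part of the signal, and that cantica-specific terms provide lexical anchors relating the probes to the poem's large-scale organization.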
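As a rough illustration of the first contribution, the sketch below encodes a line of text as a V/C sequence and estimates a four-state transition matrix. Several details are assumptions not stated in the summary: the four states are taken to be the pairs of consecutive symbols (VV, VC, CV, CC), the vowel set is a simple list of Italian vowels with common accents, and non-letters (spaces, apostrophes, punctuation) are simply dropped. The names `vc_encode` and `transition_matrix` are hypothetical, and the paper's own graphemic memory index is not reproduced here because its definition is not given.

```python
from collections import Counter

# Assumed vowel inventory; the paper's exact treatment of accents is not known.
VOWELS = set("aeiouàèéìíòóùú")

# Assumed state space: ordered pairs of consecutive V/C symbols.
STATES = ["VV", "VC", "CV", "CC"]


def vc_encode(text):
    """Map each letter to 'V' or 'C' and drop every non-letter character."""
    return "".join(
        "V" if ch in VOWELS else "C"
        for ch in text.lower()
        if ch.isalpha()
    )


def transition_matrix(seq):
    """Estimate row-normalised transition probabilities between pair states."""
    counts = Counter()
    for i in range(len(seq) - 2):
        # The state at position i is seq[i:i+2]; the next state overlaps by one symbol.
        counts[(seq[i:i + 2], seq[i + 1:i + 3])] += 1
    matrix = {}
    for s in STATES:
        total = sum(counts[(s, t)] for t in STATES)
        matrix[s] = {
            t: (counts[(s, t)] / total if total else 0.0) for t in STATES
        }
    return matrix


line = "Nel mezzo del cammin di nostra vita"
seq = vc_encode(line)
P = transition_matrix(seq)
for state in STATES:
    print(state, P[state])
```

Because consecutive pair states overlap by one symbol, half of the sixteen transitions are structurally impossible (for example CV can only be followed by VV or VC), so each row has at most two non-zero entries.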
Why it matters
This research demonstrates how simple probabilistic models on symbolic text can uncover complex interactions between local dependencies, lexical distribution, and large-scale textual organization. It provides an interpretable framework for understanding higher-level literary structures.
Original Abstract
This study investigates the structural organisation of Dante's Divina Commedia through a symbolic representation based on vowel-consonant (V/C) encoding. Modelling the resulting sequence as a four-state Markov chain yields a parsimonious index of graphemic memory, capturing the balance between persistence and alternation patterns. Across the poem, this index exhibits a slight but consistent increase from the Inferno to the Paradiso, indicating a directional shift in local dependency structure. Trigram-level analysis shows that this trend is driven by a restricted set of recurrent configurations, interpreted as graphemic probes linking the Markov representation to identifiable lexical environments in the text. These probes display distinct behaviours: configurations involving two transitions more frequently emerge across word boundaries, reflecting interactions between adjacent tokens, whereas configurations with fewer transitions are largely confined to intra-lexical structures. Part of the signal is further shaped by orthographic phenomena, particularly apostrophised forms, highlighting the role of writing conventions alongside phonological and lexical organisation. A complementary classification analysis identifies cantica-specific terms, providing lexical anchors through which graphemic probes can be related to the structure of the poem. This organisation is reflected not only in the separation of the three cantiche, but also in a continuous trajectory across the text. Overall, the results show that simple probabilistic models applied to symbolic text representations can uncover structured interactions between local dependencies, lexical distribution, orthographic encoding, and large-scale organisation, providing an interpretable framework for linking local symbolic dynamics to higher-level textual organisation.
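The trigram-level observation in the abstract, that configurations with two transitions more often span word boundaries, can be explored with a small sketch like the one below. It is an illustration under stated assumptions, not the authors' procedure: tokens are split on whitespace only (so an apostrophised form such as "l'altro" counts as one token), a trigram is said to cross a boundary when its first and last letters belong to different tokens, and the helper names `letters_with_word_index`, `n_transitions`, and `trigram_profile` are hypothetical.

```python
from collections import Counter

# Assumed vowel inventory, as in any simple V/C encoding of Italian text.
VOWELS = set("aeiouàèéìíòóùú")


def letters_with_word_index(text):
    """Return (symbol, token_index) pairs for every letter in the text."""
    out = []
    for w, token in enumerate(text.lower().split()):
        for ch in token:
            if ch.isalpha():
                out.append(("V" if ch in VOWELS else "C", w))
    return out


def n_transitions(tri):
    """Count V/C changes between adjacent symbols (0, 1 or 2 for a trigram)."""
    return sum(a != b for a, b in zip(tri, tri[1:]))


def trigram_profile(text):
    """Count trigrams keyed by (pattern, number of transitions, crosses boundary)."""
    items = letters_with_word_index(text)
    profile = Counter()
    for i in range(len(items) - 2):
        window = items[i:i + 3]
        tri = "".join(sym for sym, _ in window)
        crosses = window[0][1] != window[2][1]
        profile[(tri, n_transitions(tri), crosses)] += 1
    return profile


profile = trigram_profile("di nostra")
for key, count in sorted(profile.items()):
    print(key, count)
```

Aggregating such a profile per cantica would give, for each trigram pattern, the share of occurrences that fall across token boundaries, which is the kind of quantity the abstract's contrast between two-transition and fewer-transition configurations refers to.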