Dependency Parsing Across the Resource Spectrum: Evaluating Architectures on High and Low-Resource Languages
Kevin Guan, Happy Buzaaba, Christiane Fellbaum
TLDR
The Biaffine LSTM outperforms transformers in low-resource dependency parsing, with transformers recovering the advantage as training data increases.
Key contributions
- Evaluated four dependency parsers (the Biaffine LSTM, the Stack-Pointer Network, and two pre-trained transformers, AfroXLMR-large and RemBERT) across ten typologically diverse languages, with a focus on low-resource African languages.
- The Biaffine LSTM consistently outperforms the transformer models in low-resource settings (see the scoring sketch after this list).
- Transformers recover their advantage as training data increases, with a crossover point in the resource range typical of treebanks for under-resourced languages.
- Morphological complexity, measured via MATTR, is a significant secondary predictor of transformers' relative disadvantage after controlling for corpus size.
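For readers unfamiliar with the term, below is a minimal sketch of the biaffine arc scorer popularized by Dozat and Manning (2017), the scoring mechanism at the core of Biaffine LSTM parsers. It assumes PyTorch; the dimensions, module names, and the surrounding BiLSTM encoder (not shown) are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class BiaffineArcScorer(nn.Module):
    """Sketch of a biaffine arc scorer (Dozat & Manning, 2017).

    A BiLSTM encoder (not shown) produces one state per token; two
    small MLPs project each state into "head" and "dependent" spaces,
    and a biaffine product scores every (head, dependent) pair.
    """

    def __init__(self, lstm_dim=400, arc_dim=500):  # illustrative sizes
        super().__init__()
        self.head_mlp = nn.Sequential(nn.Linear(lstm_dim, arc_dim), nn.ReLU())
        self.dep_mlp = nn.Sequential(nn.Linear(lstm_dim, arc_dim), nn.ReLU())
        # The extra column gives each head a learned bias: a prior on
        # how likely that token is to be a head at all.
        self.U = nn.Parameter(torch.empty(arc_dim, arc_dim + 1))
        nn.init.xavier_uniform_(self.U)

    def forward(self, h):                        # h: (batch, seq, lstm_dim)
        head = self.head_mlp(h)                  # (batch, seq, arc_dim)
        dep = self.dep_mlp(h)                    # (batch, seq, arc_dim)
        ones = torch.ones(*dep.shape[:2], 1, device=h.device)
        dep = torch.cat([dep, ones], dim=-1)     # append 1 for the bias term
        # scores[b, i, j] = score of token i as the head of token j
        return head @ self.U @ dep.transpose(1, 2)

scorer = BiaffineArcScorer()
h = torch.randn(2, 7, 400)  # stand-in for BiLSTM states: 2 sentences, 7 tokens
scores = scorer(h)          # (2, 7, 7) head-dependent score matrix
```

At decode time such scores typically feed a maximum-spanning-tree algorithm to produce a well-formed tree, and a second biaffine classifier assigns dependency labels; the sketch covers only arc scoring.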
Why it matters
This paper clarifies which parsing architectures are best suited to low-resource languages. It shows that simpler models such as the Biaffine LSTM remain more effective until enough annotated data is available for pre-trained transformers to pay off. This gives concrete guidance for developing syntactic tools for under-resourced languages.
Original Abstract
Transformer-based models achieve state-of-the-art dependency parsing for high-resource languages, yet their advantage over simpler architectures in low-resource settings remains poorly understood. We evaluate four parsers -- the Biaffine LSTM, Stack-Pointer Network, AfroXLMR-large, and RemBERT -- across ten typologically diverse languages, with a focus on low-resource African languages. We find that the Biaffine LSTM consistently outperforms transformer models in low-resource regimes, with transformers recovering their advantage as training data increases. The crossover falls within a resource range typical of treebanks for under-resourced languages. Morphological complexity (measured via MATTR) emerges as a significant secondary predictor of transformers' relative disadvantage after controlling for corpus size. These results indicate that the Biaffine LSTM may be better suited for syntactic tool development in low-resource regimes until sufficient annotated data is available to leverage the representational capacity of pre-trained transformers.
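Because MATTR carries the paper's morphological-complexity finding, a small reference implementation of the metric (moving-average type-token ratio; Covington & McFall, 2010) may help. The 500-token window is a commonly used default; the window the authors actually used is not stated in this digest and is an assumption here.

```python
def mattr(tokens, window=500):
    """Moving-Average Type-Token Ratio (Covington & McFall, 2010).

    Slide a fixed-size window over the token sequence, take the
    type-token ratio (distinct types / window length) in each window,
    and average. Unlike plain TTR, the result does not shrink merely
    because the text is long, so corpora of different sizes compare
    more fairly.
    """
    n = len(tokens)
    if n == 0:
        raise ValueError("empty token sequence")
    if n < window:
        return len(set(tokens)) / n  # shorter than one window: plain TTR
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(n - window + 1)
    ]
    return sum(ratios) / len(ratios)

# Toy usage; real inputs would be tokenized treebank text.
tokens = "the cat saw the dog and the dog saw the cat".split()
print(mattr(tokens, window=5))  # 0.8
```

Morphologically rich languages tend to show more distinct surface forms per window and hence higher MATTR, which is what lets the metric stand in for morphological complexity in the paper's analysis.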