This paper presents a tokenization-free approach to Central Kurdish machine translation using the ByT5 model. It addresses the challenges of Kurdish language processing with byte-level encoding, which operates directly on UTF-8 bytes and so removes the need for a separate subword tokenizer. The study reports improved translation quality and better handling of Kurdish morphological complexity with the ByT5 architecture.
Bnar, Polla Fattah. ‘Tokenisation-Free Machine Translation for Central Kurdish using the ByT5 Model.’ International Conference on Natural Language Processing, 2024.
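
To make the byte-level encoding idea concrete, the following is a minimal sketch of how text can be turned into model input IDs without a learned vocabulary. It is not taken from the paper. The function names `byte_encode` and `byte_decode` are hypothetical, and the ID offset of 3 and end-of-sequence ID of 1 are assumptions that mirror the convention commonly used for ByT5 (IDs 0 to 2 reserved for padding, end-of-sequence, and unknown). The paper's actual preprocessing may differ.

```python
# Sketch of tokenization-free (byte-level) encoding, assuming a ByT5-style
# layout: IDs 0..2 are reserved for special tokens, so byte b maps to b + 3.

def byte_encode(text: str, offset: int = 3, eos_id: int = 1) -> list[int]:
    """Map text to IDs: one ID per UTF-8 byte, followed by an end marker."""
    return [b + offset for b in text.encode("utf-8")] + [eos_id]


def byte_decode(ids: list[int], offset: int = 3) -> str:
    """Invert byte_encode, skipping any ID outside the byte range."""
    data = bytes(i - offset for i in ids if offset <= i < offset + 256)
    return data.decode("utf-8", errors="ignore")


# Example word in Arabic-script Central Kurdish; each letter here
# occupies two bytes in UTF-8, so 4 letters become 8 byte IDs plus EOS.
sample = "سڵاو"
ids = byte_encode(sample)
print(len(sample), "characters ->", len(ids), "IDs")
print(byte_decode(ids) == sample)
```

Because every possible string maps onto the same 256 byte values, no word is ever out of vocabulary, which is one reason a byte-level model may suit a morphologically rich language. The trade-off is longer input sequences, since each Arabic-script letter expands to more than one ID.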