Tokenisation-Free Machine Translation for Central Kurdish using the ByT5 Model

Bnar Ismail, Polla Fattah

January 15, 2024
Conference Paper
International Conference on Natural Language Processing

Abstract

This paper presents a tokenisation-free approach to Central Kurdish machine translation based on the ByT5 model. The research addresses the particular challenges of Kurdish language processing with byte-level encoding: the model operates directly on UTF-8 bytes, so no separate subword tokeniser is needed. The study reports improved translation quality and better handling of Kurdish morphological complexity with the ByT5 architecture.

Citation

Bnar Ismail, Polla Fattah. ‘Tokenisation-Free Machine Translation for Central Kurdish using the ByT5 Model.’ International Conference on Natural Language Processing, 2024.
