Authors: Abdulkader Saoud, Mahmut Alomeyr, Himmet Toprak Kesgin, Mehmet Fatih Amasyali
Abstract: This paper investigates the effectiveness of BERT-based models for automated
punctuation and capitalization correction in Turkish texts across five model
sizes: Tiny, Mini, Small, Medium, and Base. The design and capabilities of each
model are tailored to address the
specific challenges of the Turkish language, with a focus on optimizing
performance while minimizing computational overhead. The study presents a
systematic comparison of each model's precision, recall, and F1 score,
offering insights into their applicability in diverse
operational contexts. The results demonstrate a significant improvement in text
readability and accuracy as model size increases, with the Base model achieving
the highest correction precision. This research provides a comprehensive guide
for selecting the appropriate model size based on specific user needs and
computational resources, establishing a framework for deploying these models in
real-world applications to enhance the quality of written Turkish.
Source: http://arxiv.org/abs/2412.02698v1
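
The abstract compares models by precision, recall, and F1 score but does not describe the evaluation protocol. As a rough illustration only, the sketch below computes these three metrics per label for a token-level punctuation tagging task. The label names (`O`, `COMMA`, `PERIOD`), the function name `per_label_scores`, and the choice to skip the "no punctuation" label are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter


def per_label_scores(gold, predicted, ignore_label="O"):
    """Return {label: (precision, recall, f1)} for token-level labels.

    `gold` and `predicted` are equal-length sequences of label strings.
    `ignore_label` (hypothetical "no punctuation" tag) is left out of the report.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for label g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was expected but missed

    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        if label == ignore_label:
            continue
        predicted_count = tp[label] + fp[label]
        gold_count = tp[label] + fn[label]
        precision = tp[label] / predicted_count if predicted_count else 0.0
        recall = tp[label] / gold_count if gold_count else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy example with made-up labels, one label per token.
gold = ["O", "COMMA", "O", "PERIOD", "O", "PERIOD"]
pred = ["O", "COMMA", "COMMA", "PERIOD", "O", "O"]
for label, (p, r, f1) in per_label_scores(gold, pred).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this toy input the comma label is over-predicted (lower precision, full recall) while the period label is under-predicted (full precision, lower recall), which is the kind of trade-off a per-model comparison of these metrics would surface.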