Authors: Shagun Sinha
Abstract: This thesis presents Abstractive Text Summarization models for contemporary
Sanskrit prose. The first chapter, titled Introduction, presents the motivation
behind this work, the research questions, and the conceptual framework.
Sanskrit is a low-resource inflectional language. The key research question
that this thesis investigates is what the challenges are in developing an
abstractive text summarization (TS) system for Sanskrit. To answer this key
research question, sub-questions based on four different themes have been
posed in this work. The
second chapter, Literature Review, surveys previous work. The third
chapter, Data Preparation, answers the remaining three questions from the third
theme. It reports the data collection and preprocessing challenges encountered
in training both the language model and the summarization model. The fourth
chapter reports
the training and inference of models and the results obtained therein. This
research has initiated a pipeline for Sanskrit abstractive text summarization
and has reported the challenges faced at every stage of development. Answering
the sub-questions under each theme, in turn, addresses the key research
question.
Source: http://arxiv.org/abs/2501.01933v1