Department of Industrial Management - UCSC – IIT Workshop on Shared Task towards building a Sinhala/Tamil Large Language Model

Team Govi-Nena won the UCSC-IIT Workshop on the Shared Task for Building a Sinhala/Tamil Large Language Model, held in parallel with the ICTer 2024 conference. The event aimed to advance the development of Large Language Models (LLMs) for Sinhala and Tamil, both low-resource languages. It comprised two primary tasks: developing a tokenizer and word embedding model, and creating a chatbot system with Q&A support for Sinhala/Tamil using Retrieval-Augmented Generation (RAG). The goal was to promote collaboration and innovation within the NLP community, establish robust benchmarks for Sinhala and Tamil NLP, and drive forward the development of LLMs for these languages.

Over two months, teams from academia and industry competed to create the most effective solutions. Team Govi-Nena, consisting of members from the Department of Industrial Management at the University of Kelaniya, the University of Ruhuna, and Western Sydney University, presented two solutions for the Q&A chatbot task. Their code-based and low-code solutions achieved impressive accuracy scores of 80% and 85%, respectively, based on human evaluation.

The members of the team are as follows:

Prof. Anusha Indika - University of Ruhuna

Mr. Shamika Tissera (https://www.linkedin.com/in/shamika-tissera) - Department of Industrial Management, University of Kelaniya (UoK)

Mr. Kishan Fernando (https://www.linkedin.com/in/kishan-fernando-675029269) - University of Ruhuna

Ms. Shyama Wilson (https://www.linkedin.com/in/r-shyama-wilson-0346aa8a) - Western Sydney University

Mr. Aruna Lorensuhewa - University of Ruhuna