Latxa: The Basque ChatGPT That Seeks to Reduce the Digital Divide
Advances in language models such as ChatGPT have transformed the way we interact with technology. However, languages with fewer speakers, such as Basque, have not received the same level of attention. To address this digital divide, the Basque Center for Language Technology (HiTZ) has developed "Latxa", a Basque-language chatbot that promises to surpass the capabilities of GPT-3.5 and compete with the most advanced models.
The Origin of Latxa
Eneko Agirre, who has devoted his career to natural language processing, leads this ambitious project. With a team of computer scientists, linguists and engineers, HiTZ created Latxa to give Basque the technological tools that more widely spoken languages already enjoy.
Development Process
Developing Latxa required three fundamental components:
- Research Team: Natural language processing experts who can work with advanced algorithms.
- Data in Basque: A large amount of text in Basque to feed the model, thus improving its accuracy and fluency.
- Supercomputing: Access to powerful computational resources, such as the LEONARDO supercomputer in Italy, to process and train the model.
How Latxa Works
Latxa works much like other large language models: it learns to predict words and word sequences from enormous amounts of text. Through this process, the system acquires knowledge of grammar, morphology and context, allowing it to generate coherent text in Basque.
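To make the idea of "learning to predict the next word from text" concrete, here is a deliberately tiny sketch. It is not how Latxa works internally: real models like Latxa use large neural networks trained on billions of words, whereas this toy simply counts which word follows which (a bigram model). The function names `train_bigram` and `predict_next` and the three-sentence Basque corpus are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Toy 'training': count how often each word follows each other word.

    Hypothetical illustration only; real language models learn these
    statistics with neural networks instead of explicit counts.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        # Pair every word with the word that comes right after it.
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts: dict[str, Counter], word: str) -> str | None:
    """Return the word most often seen after `word`, or None if never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical miniature corpus of Basque greetings.
    corpus = ["kaixo zer moduz", "kaixo zer berri", "zer moduz zaude"]
    model = train_bigram(corpus)
    print(predict_next(model, "zer"))  # moduz (seen twice, vs. berri once)
```

With more text the counts become more reliable, which is also why the scarcity of Basque data discussed below matters: a model can only learn patterns it has actually seen.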
Challenges and Solutions
One of the main challenges is the limited amount of data available in Basque compared to languages such as English or Spanish. This imbalance can lead to grammatical errors and weaker overall performance. To mitigate this issue, HiTZ has focused on collecting and using all available Basque resources, and on continuing to train the model over time.
The Latxa project has been supported by the Basque Government and European funds, backing that has been crucial to advancing the model's development. This funding has allowed HiTZ to overcome technical barriers and work towards a globally competitive model.
Latxa seeks to be not only a functional tool but also a symbol of the importance of preserving and promoting minority languages in the digital age. Eneko Agirre stresses that technology plays a crucial role in cultural and linguistic preservation, comparing it to the press, radio and television in its ability to influence a language and keep it alive.
Latxa is a significant step towards the democratization of language technology, offering Basque a competitive and effective platform. The project not only improves the accessibility and use of Basque in technology but also sets a precedent for how less widely spoken languages can thrive in the digital realm.