Preserving endangered languages using AI and data

Published on June 23, 2023

Imagine you’re trying to preserve a beautiful butterfly. You take pictures of it and store them as data, hoping to capture its vibrant colors and delicate wings. But here’s the catch: what if the pictures only show the butterfly sitting still on a leaf, and miss its graceful flight and intricate patterns in motion? Preserving endangered languages using only data poses a similar challenge. Data collection combined with AI language models can help document a language, but a static record might not fully capture the dynamic way the language is actually used. Without that dynamic understanding, how can we be sure we’re preserving all the essential knowledge about how a language functions? Addressing this question is vital to preventing the loss of linguistic diversity. The research in this area can shed light on methods to safeguard endangered languages for future generations.

Abstract
Many of the world’s spoken languages are endangered and rapidly becoming extinct, which has prompted attempts to preserve as many of them as possible. One preservation approach combines data collection with artificial intelligence-based language models. However, current data collection methods may capture only static data from a dynamic cognitive process. If the data do not genuinely capture that dynamic process, it is unclear whether they capture all the essential knowledge about how a language functions. Here, we discuss the implications of this issue and its importance in preserving endangered languages.

