Research

Leading Open-Source AI Research for African Languages

Ongoing Research At The Lab

How much data is needed to train a good model for a low-resource language (LRL)?

The biggest challenge in building a machine translation model for any low-resource language is data availability. The question most often asked by anyone looking to build or do research in this space is how much data such a model requires. In this paper, we evaluate and quantify how much data is needed to train a good model for a low-resource language.

Optimal model for LRL?

Since the breakthrough of the transformer architecture, there has been a surge of pre-trained models with different encoder-decoder compositions, data sizes, embedding layers, and many other configurations. This surge has made it hard to identify the optimal model for a given use case. In this paper, we analyze different transformer models and their performance across natural language processing tasks for low-resource languages.

Evaluation metric

Most natural language processing (NLP) researchers and practitioners concentrate overwhelmingly on algorithms and models while undervaluing the impact of data quality. Although some effort has been devoted to improving and assessing data quality and to understanding data quality problems, data quality is rarely evaluated carefully and systematically in current NLP systems. In this paper, we design rigorous and systematic data evaluation metrics that cover multiple dimensions of data quality, such as dataset size, translation accuracy, and topic coverage.

Our Research Approach

The All Lab conducts cutting-edge AI research in collaboration with MARS Lab at UCLA to build roadmaps of strategies, methodologies, and centralized resources for developing, democratizing, and innovating AI technologies that facilitate effective communication and localization for African languages. The All Lab aims to be the leader in open-source AI research, architectural design, data collection, and language comprehension for African languages. We conduct research in collaboration with language experts, academics, linguists, native speakers, researchers, librarians, and thinkers.

The All Lab builds roadmaps of strategies, methodologies, and centralized resources for building, democratizing, and innovating AI technologies that facilitate effective communication in, and localization for, African languages.

Contact Us

We'd love to hear from you! Whether you have questions, feedback, or want to explore partnership opportunities, our team is here to assist.