Research

The All Lab conducts cutting-edge AI research to build roadmaps of strategies, methodologies, and centralized resources for building, democratizing, and innovating AI technologies that facilitate effective communication in, and localization for, African languages.

The All Lab seeks to be the leader in open-source AI research, architectural design, data collection, and language comprehension for African languages.

We conduct research in collaboration with language experts, academics, linguists, native speakers, researchers, librarians, and thinkers.

African Languages Landscape

Africa is home to over 2,000 languages, roughly a third of all the world's languages, yet they remain underrepresented and unknown to many. Despite their cultural significance, diversity, history, symbolic meaning, and wealth, little work has been done to preserve them and establish their presence in modern technologies such as AI.

This number should not be too surprising given the sheer size of the African continent. Africa is the second-largest continent, behind only Asia, and accounts for over 20% of Earth's land mass.

Ongoing Research

status:

1. How much data is needed to train a good model for a low-resource language (LRL)?

The biggest challenge in building a machine translation model for any low-resource language is data availability. The question most frequently asked by anyone looking to build or do research in this space is how much data is required to build such a model.

In this paper, we will evaluate and quantify how much data is needed to train a good model for a low-resource language.

status:

2. What is the optimal model for an LRL?

Since the breakthrough of the transformer architecture, there has been a proliferation of pre-trained models with different encoder-decoder compositions, data sizes, embedding layers, and many other configurations.

This proliferation has made it hard to identify the optimal model for a given use case. In this paper, we will analyze different transformer models and their performance across different natural language processing tasks for low-resource languages.

status:

3. Evaluation metric

Most Natural Language Processing (NLP) researchers and practitioners concentrate overwhelmingly on algorithms and models while undervaluing the impact of data quality. Although some effort has been devoted to improving and assessing data quality and to understanding data quality problems, data quality is rarely evaluated carefully and systematically in current NLP systems.

In this paper, we design rigorous and systematic data evaluation metrics that cover multiple dimensions of data quality, such as dataset size, translation accuracy, and topic coverage.
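To make the idea of multi-dimensional data evaluation concrete, here is a minimal sketch of a few surface-level checks on a parallel corpus. It is not the lab's metric: the function name `corpus_quality_report`, the `max_length_ratio` threshold, and the toy sentence pairs are all assumptions made for illustration, and dimensions such as translation accuracy and topic coverage would need human judgment or additional resources that this sketch does not attempt.

```python
def corpus_quality_report(pairs, max_length_ratio=3.0):
    """Compute simple surface-level indicators for a parallel corpus.

    pairs: list of (source_sentence, target_sentence) strings.
    max_length_ratio: hypothetical threshold; a pair is flagged when the
        longer side has more than this many times the tokens of the shorter.
    """
    total = len(pairs)
    if total == 0:
        return {"num_pairs": 0, "empty_rate": 0.0, "duplicate_rate": 0.0,
                "length_mismatch_rate": 0.0, "target_vocab_size": 0}

    empty = 0
    mismatched = 0
    target_vocab = set()
    for src, tgt in pairs:
        # Whitespace tokenization is a simplifying assumption.
        src_tokens = src.split()
        tgt_tokens = tgt.split()
        target_vocab.update(token.lower() for token in tgt_tokens)
        if not src_tokens or not tgt_tokens:
            empty += 1
            continue
        longer = max(len(src_tokens), len(tgt_tokens))
        shorter = min(len(src_tokens), len(tgt_tokens))
        if longer / shorter > max_length_ratio:
            mismatched += 1

    # Exact duplicates: pairs that repeat an earlier (source, target) pair.
    unique_pairs = set((src, tgt) for src, tgt in pairs)
    duplicates = total - len(unique_pairs)

    return {
        "num_pairs": total,                          # dataset size
        "empty_rate": empty / total,                 # pairs missing one side
        "duplicate_rate": duplicates / total,        # redundancy
        "length_mismatch_rate": mismatched / total,  # possible misalignment
        "target_vocab_size": len(target_vocab),      # rough lexical coverage
    }


# Toy data, invented for this example only.
sample_pairs = [
    ("hello world", "habari dunia"),
    ("hello world", "habari dunia"),
    ("good morning", ""),
    ("yes", "ndiyo kabisa sana kweli"),
]
report = corpus_quality_report(sample_pairs)
```

Checks like these only flag likely problems such as empty, repeated, or badly aligned pairs. They say nothing about whether a translation is actually correct.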

Interested in joining the lab?

To ensure the integrity and quality of the work at the All Lab, joining the lab is by invitation only.

On limited occasions, we evaluate highly skilled individuals. Feel free to send us an email if you seem like the type!

All Lab Goals

Research

Collaborating with African language experts, academics, linguists, native speakers, researchers, librarians, and thinkers, the All Lab seeks to be the leader in open-source AI architectural design, data collection, and language comprehension for African languages.

Information Hub

The All Lab works to make accurate information about African languages readily available and accessible, so that anyone seeking to work, localize, or do research in this space can do so easily.