All Voices

The biggest challenge in researching and building robust, state-of-the-art NLP engines for African languages is the lack of quality data.

All Voices is our answer to this challenge. All Voices is a crowdsourcing mobile application that allows for the collection and evaluation of data on low-resourced African languages.

All Voices currently has support for over 25 low-resourced languages.

African Languages Landscape

Underrepresented and unknown to many, African languages (over 2000) represent a third of all world languages. Despite their cultural significance, diversity, history, symbolic meaning, and wealth, little work has been done to preserve and solidify their presence in modern technologies such as AI.

This number should not be too surprising given the sheer size of the African continent. Africa is the second-largest continent, behind only Asia, and accounts for over 20% of Earth's land mass.

Supported Languages

status:

1. Bemba

The Bemba, also called Babemba or Awemba, are a Bantu-speaking people inhabiting the northeastern plateau of Zambia and neighboring areas of Congo (Kinshasa) and Zimbabwe. The Bemba language is widely spoken in Zambia, including by many people who live in urban areas, and is one of Zambia's seven recognized regional languages. The Bemba people in Zambia originated from the Kola region in the Democratic Republic of Congo (DRC, formerly Zaire), and are an offshoot of the ancient Luba empire. In contemporary Zambia, the word “Bemba” has several meanings. It may designate people of Bemba origin, regardless of whether they live in urban areas or in the original rural Bemba area. Alternatively, it may encompass a much larger population that includes some eighteen different ethnic groups who, together with the Bemba, form a closely related ethnolinguistic cluster of matrilineal-matrifocal agriculturalists known as the Bemba-speaking peoples of Zambia. Bemba speakers are estimated to number about 5 to 6 million. An estimated 3.7 million people speak Bemba and related dialects as a first language; the rest speak it as a second language.

status:

2. Berber

The Berber languages are a group of 26 closely related languages that constitute a branch of the Afro-Asiatic language family. They are spoken by 14 to 25 million people in Northern Africa, across the Mediterranean coast, the Sahara desert, and the Sahel, an area that was dominated by Berbers before the arrival of the Arabs. Today, there are large groups of Berber-speaking people in Morocco, Algeria, Mali, Niger, and Libya, and smaller groups in Tunisia, Mauritania, Burkina Faso, and Egypt. Speakers of the various Berber languages make up around 50% of the population in Morocco and about 25% in Algeria. The Tuareg of the desert also belong to the Berber group.

status:

3. Chewa

Chewa is a Bantu language spoken in much of Southern, Southeast, and East Africa, namely in Malawi and Zimbabwe, where it is an official language, and in Mozambique and Zambia, where it is a recognized minority language. In Malawi, the name was officially changed from Chinyanja to Chichewa in 1968 at the insistence of President Hastings Kamuzu Banda (himself of the Chewa people), and this is still the name most commonly used in Malawi today. In Zambia, the language is generally known as Nyanja or Cinyanja/Chinyanja, '(language) of the lake' (referring to Lake Malawi). Chewa belongs to the same language group (Guthrie Zone N) as Tumbuka, Sena, and Nsenga.

status:

4. Comorian

Comorian is the name given to a group of four Bantu languages spoken in the Comoro Islands, an archipelago in the southwestern Indian Ocean between Mozambique and Madagascar. It is named as one of the official languages of the Union of the Comoros in the Comorian constitution. Shimaore, one of the languages, is spoken on the disputed island of Mayotte, a French department claimed by Comoros.

status:

5. Dinka

Dinka is a member of the Western Nilotic branch of the Nilo-Saharan languages. It is spoken mainly in southern Sudan by about 2 to 3 million people, who call themselves Dinka (Jiëŋ). There are five major dialects: Ngok, Rek, Agaar, Twic/Twi East, and Bor, which are more or less mutually intelligible. The Rek dialect is considered the standard or prestige variety.

status:

6. Fang

Fang is a Bantu language spoken in Equatorial Guinea, Cameroon, Gabon, the Republic of the Congo, and São Tomé and Príncipe. In 2013 there were about 1 million speakers of Fang, including 589,000 in Equatorial Guinea, 121,000 in Cameroon, 8,100 in Congo, and 350,000 in Gabon. In Equatorial Guinea, Fang is spoken in Centro Sur, Kié-Ntem, Litoral, and Wele-Nzas provinces. In Cameroon it is spoken in the South region. In Congo it is spoken in the Sangha department, and in Gabon it is spoken in the northwest, mainly in Woleu-Ntem and Estuaire provinces, and also in Ogooué-Ivindo and Ngounié provinces. Fang is also known as Pahouin, Pamue, or Pangwe. There are many Fang dialects, some of which are considered separate languages. Fang is closely related to Bulu and Ewondo, and is one of the Beti languages.

status:

7. Fon

Fon is a member of the Eastern Gbe branch of the Niger-Congo language family. It is spoken mainly in the Atlantique, Littoral and Zou departments in southern Benin, and also in the Plateaux region of southern Togo. In 2016 there were about 1.9 million speakers of Fon in Benin. There were about 35,500 speakers of Fon in Togo in 1991.

status:

8. Kikongo

Kikongo is one of the Bantu languages spoken by the Kongo people living in the Democratic Republic of the Congo, the Republic of the Congo, Gabon, and Angola. It is a tonal language. It was spoken by many of those who were taken from the region and sold as slaves in the Americas. For this reason, while Kongo is still spoken in the above-mentioned countries, creolized forms of the language are found in the ritual speech of Afro-American religions, especially in Brazil, Cuba, Puerto Rico, the Dominican Republic, and Haiti. It is also one of the sources of the Gullah language and the Palenquero creole in Colombia. The vast majority of present-day speakers live in Africa. There are roughly seven million native speakers of Kongo, with perhaps two million more who use it as a second language.

status:

9. Kiriol

Kiriol is a creole language whose lexicon derives mostly from Portuguese. It is spoken in Guinea-Bissau, Senegal, and The Gambia. Its native speakers also call it guinensi, kriyol, or portuguis. Guinea-Bissau Creole is spoken as a native tongue by 250,000 Bissau-Guineans and as a second language by 1,000,000. A variant of Guinea-Bissau Creole, known as Portuguis Creole or Casamance Creole, is also spoken in southern Senegal, mainly in the region of Casamance, a former Portuguese colony. Creole is the majority language of the inhabitants of the Casamance region and is used as a language of commerce.

status:

10. Liberian Kreyol

Liberian Kreyol is an Atlantic English-lexicon creole language spoken in Liberia. Also known as Liberian Kolokwa English, it was spoken by 1,500,000 people as a second language (1984 census), which was about 70% of the population at that time. Today, knowledge of some form of English is even more widespread. It is historically and linguistically related to Merico, a creole spoken in Liberia, but is grammatically distinct from it. Liberian Kreyol developed from Liberian Interior Pidgin English, the Liberian version of West African vernacular English, though it has been significantly influenced by Liberian Settler English, itself based on American English, particularly African-American Vernacular English and Southern American English. Its phonology owes much to the indigenous languages of Liberia. It has been analyzed as having a post-creole continuum. As such, rather than being a pidgin wholly distinct from English, it is a range of varieties that extends from the highly pidginized to one that shows many similarities to English as spoken elsewhere in West Africa.

status:

11. Makhuwa

Makhuwa is a Bantu language in the Niger-Congo language family. It is also sometimes referred to as Emakua or Makoane. Makhuwa has many different dialects, including Emwaja, Enaharra (Maharra, Nahara, Emathipane), Enyara, Central Makua (Makhuwana, Makuana, Emakhuwana), Rati Empamela (Nampamela), Enlai, Saka, Shirima, Marrevone, Makhuwan (Emakhuwana), Meetto, and Moniga. Makhuwa uses the Latin alphabet. Oral literature and proverbs are important ways in which the Makhuwa express their identity and pass on knowledge. Makhuwa is the primary Bantu language of northern Mozambique. It is spoken by 4 million Makua people, who live north of the Zambezi River, particularly in Nampula Province, which is almost entirely ethnically Makua. It is the most widely spoken indigenous language of Mozambique.

status:

12. Mandinka

The Mandinka language is a Mande language spoken by the Mandinka people of Guinea, northern Guinea-Bissau, the Casamance region of Senegal, and the Gambia, where it is one of the principal languages. Mandinka belongs to the Manding branch of Mande and is thus similar to Bambara and Maninka/Malinké, but with only five vowels instead of seven. In most areas, it is a tonal language with two tones, low and high, although the particular variety spoken in the Gambia and Senegal borders on a pitch accent due to its proximity to non-tonal neighboring languages like Wolof.

status:

13. Mauritian Creole

Mauritian Creole, also called Morisyen, is a French-based vernacular language spoken in Mauritius, a small island in the southwestern Indian Ocean, about 500 miles (800 km) east of Madagascar. The language developed in the 18th century from contact between French colonizers and the people they enslaved, whose primary languages included Malagasy, Wolof, and a number of East African Bantu languages. The contributions of the masses of East Indian contract laborers brought into Mauritius during the second half of the 19th century appear to be limited to the lexicon (vocabulary). The structures of Mauritian Creole appear to have been fully in place by the time of Indian immigration.

status:

14. Mossi

Mossi is a member of the Gur branch of the Niger-Congo languages and is spoken in Burkina Faso by about 7.6 million people (in 2007), and by about 60,000 in Mali and Togo. It is one of the official regional languages in Burkina Faso and is closely related to Dagbani, with which it is mutually intelligible. Mossi is also known as Mòoré, Mooré, Moré, Moshi, Moore, or More.

status:

15. Ngambay

Ngambay (also known as Sara, Sara Ngambai, Gamba, Gambaye, Gamblai and Ngambai) is one of the major languages spoken by Sara people in southwestern Chad, northeastern Cameroon, and eastern Nigeria, with about a million native speakers. Ngambay is the most widely spoken of the Sara languages and is used as a trade language between speakers of other dialects. It is spoken by the Sara Gambai people.

status:

16. Ovambo


status:

17. Pulaar


status:

18. Swazi


status:

19. Tswana


status:

20. Twi


status:

21. English


status:

22. Hindi


status:

23. Khmer


status:

24. Spanish


status:

25. Swahili


All Lab Goals

Research

Collaborating with African language experts, academics, linguists, native speakers, researchers, librarians, and thinkers, the All Lab seeks to be the leader in open-source AI architectural design, data collection, and language comprehension for African languages.

Information Hub

The All Lab works to make accurate information about African languages readily available and accessible, so that it is easy for anyone seeking to work, localize, or research in this space.