All Lab Expands Global Language Coverage on Common Crawl: 62 New Languages Added

Ever wondered how the web’s hidden corners, such as pages in minority languages, regional dialect sites, and local community blogs, get discovered by large web crawlers and AI systems? Often, the answer is community effort.

Today, All Lab is proud to announce a major contribution to digital language inclusion. We contributed 1,083 new URLs covering 71 languages to the community repository that feeds Common Crawl, bringing 62 languages into the dataset for the first time.

Why This Win Matters

Common Crawl is a nonprofit organization that maintains a free and open archive of web crawl data. The archive contains petabytes of information gathered over more than 15 years.

The “web languages” repository on GitHub is a community-driven project that collects URLs for underrepresented and low-resource languages so that Common Crawl can index them.

By contributing links to sites in lesser-known languages, such as community blogs, cultural sites, and regional portals, volunteers help ensure that the web as captured in open datasets reflects its true linguistic diversity rather than being dominated by a few major languages.

Expanding Digital Access for Underrepresented Languages

All Lab boosts Common Crawl’s language coverage with new URLs that strengthen global linguistic inclusion.

Before our involvement, the repository contained 4,452 URLs across 131 languages.
After our contribution, the repository now contains 5,535 URLs across 193 languages.

This means:
1,083 new URLs added
62 newly represented languages

Our submissions included cultural websites, local news pages, regional community content, and language-specific portals that were previously missing from the dataset.

African Languages Lab