Google to train AI in 21 African languages, including Yoruba, Hausa and Igbo

Google and a consortium of African research institutions have launched the WAXAL dataset, a major new effort to address one of artificial intelligence’s (AI) biggest shortcomings on the continent: its inability to interpret and understand most African languages.

The project delivers a large, open speech dataset spanning 21 Sub-Saharan African languages and brings voice technology to more than 100 million people excluded from the AI economy.

The WAXAL dataset is the product of a three-year collaboration funded by Google and led by local universities and community groups.

It includes 1,250 hours of transcribed, natural speech and more than 20 hours of studio-grade recordings aimed at building high-fidelity synthetic voices. It targets languages such as Hausa, Yoruba, Luganda, Igbo and Acholi, many of which are spoken by tens of millions but remain largely invisible to commercial speech systems.

For all the talk of global AI, voice technologies still lean heavily towards English and a narrow handful of European and Asian languages. Africa, home to over 2,000 languages, has been left on the margins.

That gap is not academic; it shapes who can use digital services, who can access education and healthcare tools, and who gets to build companies on top of modern AI platforms. Google framed the work as a step toward narrowing a long-standing data gap that has kept many African languages off voice assistants and other tools.

Why the WAXAL dataset matters for Africa’s AI architecture

Beyond addressing this imbalance directly, how the project was run matters as much as the data itself.

Unlike earlier initiatives where African speech data was extracted and owned elsewhere, WAXAL was led on the ground by African institutions. Makerere University in Uganda, the University of Ghana, and Digital Umuganda in Rwanda oversaw data collection, community engagement, and language stewardship, with technical support from Google Research Africa.

Crucially, those institutions retain ownership of the data. That is a notable shift in a field often criticised for reproducing extractive dynamics under the banner of openness.

According to Aisha Walcott-Bryant, Head of Google Research Africa, “The ultimate impact of WAXAL is the empowerment of people in Africa. This dataset provides the critical foundation for students, researchers, and entrepreneurs to build technology on their own terms, in their own languages, finally reaching over 100 million people.”

“We look forward to seeing African innovators use this data to create everything from new educational tools to voice-enabled services that create tangible economic opportunities across the continent”, she added. 

That framing is echoed by the universities involved. Joyce Nakatumba-Nabende, a senior lecturer at Makerere University, said:

“For AI to have a real impact in Africa, it must speak our languages and understand our contexts. The WAXAL dataset gives our researchers the high-quality data they need to build speech technologies that reflect our unique communities. In Uganda, it has already strengthened our local research capacity and supported new student- and faculty-led projects.”

At the University of Ghana, Associate Professor Isaac Wiafe pointed to the scale of public engagement: 

“For us at the University of Ghana, WAXAL’s impact goes beyond the data itself. It has empowered us to build our own language resources and train a new generation of AI researchers. Over 7,000 volunteers joined us because they wanted their voices and languages to belong in the digital future. Today, that collective effort has sparked an ecosystem of innovation in fields like health, education, and agriculture. This proves that when the data exists, possibility expands everywhere.”

There is reason for cautious optimism. Open speech datasets can lower barriers for local startups and researchers who lack the resources to collect data at scale. They can also reduce reliance on foreign APIs that rarely support African languages well, if at all.

Still, datasets do not guarantee outcomes; building reliable voice systems requires sustained investment, local deployment, and commercial pathways that keep value in-country. Google’s role as funder and convenor will invite scrutiny, particularly around how WAXAL data is used by global companies in the future.

For now, the release of the WAXAL dataset marks a concrete step towards a more linguistically inclusive AI ecosystem. It does not solve Africa’s AI challenges, but it addresses a foundational one. Voice is often the most natural interface with technology. Making sure AI can hear Africa speak, in all its diversity, is long overdue.

The post Google to train AI in 21 African languages, including Yoruba, Hausa and Igbo first appeared on Technext.
