Technology company Meta has launched WAXAL, a new open dataset designed to advance speech technology development across Africa. The initiative aims to address a core challenge in creating voice-enabled tools for the region: a lack of accessible, high-quality speech data for local languages.
According to the announcement published on January 2, Sub-Saharan Africa is home to more than 2,000 distinct languages. Meta stated that the absence of data for these languages has been a primary barrier to building inclusive voice technologies. The name WAXAL is derived from the Wolof word for “speak.”
The dataset includes more than 11,000 hours of speech data drawn from nearly 2 million individual recordings across 21 African languages. The collection provides approximately 1,250 hours of transcribed speech for automatic speech recognition (ASR) and more than 20 hours of studio recordings for text-to-speech (TTS) synthesis.
Languages featured include Acholi, Hausa, Luganda, Swahili, and Yoruba, among others. The company described the project as a collaborative effort built with African partners.
Key academic partners included Makerere University in Uganda and the University of Ghana, which led data collection for a combined 13 languages.
The Rwanda-based firm Digital Umuganda headed collection for five major languages. High-quality studio recordings were produced in partnership with Media Trust and Loud n Clear.
Meta emphasized an ethical data collection framework, stating that partner organizations retain ownership of the data they contributed. The methodology involved asking participants to describe pictures in their native languages to capture authentic speech patterns.
The company stated it hopes the dataset will fuel technological innovation and aid in the digital preservation of African languages. The complete WAXAL collection is released under an open license and is available on Hugging Face.






















