Self-supervised and multilingual learning applied to the Wolof, Swahili and Fongbe

Prestilien Djionang Pindoh; Paulin Melatagia Yonta

Preprint, 2024


Prestilien Djionang Pindoh (1)
PersonId : 1375210

Paulin Melatagia Yonta (2, 1)
PersonId : 981725
ORCID : 0000-0003-3479-2627
IdRef : 273745646

1. Département d'Informatique [Yaoundé I]
2. Unité de modélisation mathématique et informatique des systèmes complexes [Bondy]

Abstract

Under-resourced languages face significant challenges in speech recognition because resources and data are scarce, which hampers their development and use. In this paper, we present a speech recognition model that builds on existing self-supervised learning frameworks (Contrastive Predictive Coding (CPC), wav2vec, and a bidirectional version of CPC) and combines them with multilingual learning. We evaluate the model on three African languages: Wolof, Swahili, and Fongbe. Evaluating the learned representations on the automatic speech recognition task, using an architecture similar to DeepSpeech, highlights the model's capability to discriminate language-specific linguistic features; it achieves a Word Error Rate (WER) of 61% for Fongbe, 72% for Wolof, and 88% for Swahili.
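For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a Word Error Rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer's hypothesis, divided by the number of reference words. The abstract does not say which scoring tool or text normalization the authors used, so the function name `word_error_rate`, the whitespace tokenization, and the example sentences are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are obtained by splitting on whitespace, with no
    further normalization (casing, punctuation, diacritics).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, not taken from the paper's data.
wer = word_error_rate("the cat sat down", "the dog sat")
print(f"WER = {wer:.0%}")  # one substitution + one deletion over 4 words
```

Because insertions count as errors, this ratio can exceed 100% when the hypothesis is much longer than the reference. A WER of 61% therefore means roughly 61 word-level edits per 100 reference words under whatever normalization was applied.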


Keywords

Self-supervised learning; Multilingual representation learning; Automatic speech recognition; Low endowed languages


Domains

Computer Science [cs]

Main file

Papier_ARIMA.pdf (552.45 KB)

Origin : Files produced by the author(s)

Contributor : Prestilien Djionang Pindoh

https://inria.hal.science/hal-04547298

Submitted on : Monday, April 15, 2024, 4:35:56 PM

Last modification on : Friday, April 19, 2024, 3:36:18 AM

Dates and versions

hal-04547298 , version 1 (15-04-2024)

Licence

Attribution

Identifiers

HAL Id : hal-04547298 , version 1

Cite

Prestilien Djionang Pindoh, Paulin Melatagia Yonta. Self-supervised and multilingual learning applied to the Wolof, Swahili and Fongbe. 2024. ⟨hal-04547298⟩


Collections

IRD AFRIQ SORBONNE-UNIVERSITE SU-SCIENCES UMI-209
