A Multimodal French Corpus of Aligned Speech, Text, and Pictogram Sequences for Speech-to-Pictogram Machine Translation
Abstract
The automatic translation of spoken language into pictogram units can facilitate communication involving individuals
with language impairments. However, there is neither an established translation formalism nor any publicly available
dataset for training end-to-end speech translation systems. This paper introduces the first dataset, in any language,
that aligns speech, text, and pictogram translations. The dataset is in French and contains 230 hours of speech
resources. We create a rule-based pictogram grammar with a restricted vocabulary and discuss the strategic decisions
involved. The grammar draws on an in-depth linguistic study of resources taken from the ARASAAC
website. Expert annotators validate these rules through multiple post-editing phases. The constructed dataset
is then used to experiment with a Speech-to-Pictogram cascade model, which employs state-of-the-art Automatic
Speech Recognition models. The dataset is freely available under a non-commercial licence. This work marks a starting
point for research into the automatic translation of speech into pictogram units.