Conference paper, Year: 2023

Methods for Phonetic Scraping of Youtube Videos

Abstract

This paper discusses two pipelines for the automatic collection of automatic speech recognition (ASR) transcripts and audio content from YouTube videos and subsequent phonetic analysis: PEASYV (Phonetic Extraction and Alignment of Subtitled YouTube Videos) and YTPP (YouTube Phonetics Pipeline). The pipelines differ somewhat in terms of processing steps as well as the tools used for forced alignment, but produce comparable results. The two pipelines may be useful for large-scale collection of acoustic data for phonetic analysis.
Main file
2023.icnlsp-1.25.pdf (1.82 MB)
Origin : Files produced by the author(s)

Dates and versions

hal-04547365 , version 1 (15-04-2024)

Licence

Attribution - NonCommercial - ShareAlike

Identifiers

  • HAL Id : hal-04547365 , version 1

Cite

Adrien Méli, Steven Coats, Nicolas Ballier. Methods for Phonetic Scraping of Youtube Videos. 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023), Mourad Abbas, Abed Alhakim Freihat, Dec 2023, Trento, Italy. pp.244-249. ⟨hal-04547365⟩