Spanish Speech Emotion Recognition (SER)
Meet the latest SER results in Spanish.
Part 1. PTMs as feature extractors for Spanish SER

This study presents the first comparative evaluation of pre-trained models (PTMs) for Spanish SER. It covers six models (Whisper, Wav2Vec 2.0, WavLM, HuBERT, TRILLsson, and CLAP) and six emotional speech datasets. Using a layer-wise feature extraction framework with Leave-One-Speaker-Out (LOSO) validation, our method outperforms prior benchmarks, reaching F1-scores of 88.32% on EmoMatchSpanishDB, 99.83% on INTER1SP, and 92.53% on MEACorpus.
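The two core ideas in this paragraph are pooling each encoder layer separately and holding out one speaker per fold. Both can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the paper's code. It assumes utterance embeddings come from mean-pooling the frame-level vectors of each layer, since the pooling used in the study is not stated here. The helper names `mean_pool`, `layerwise_embeddings`, and `leave_one_speaker_out` are hypothetical. With Hugging Face `transformers`, per-layer frame vectors for models such as Wav2Vec 2.0 can be requested by calling the model with `output_hidden_states=True`.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple

Frames = Sequence[Sequence[float]]  # T frames x D features from one encoder layer


def mean_pool(frames: Frames) -> List[float]:
    """Average frame-level vectors into one D-dimensional utterance embedding (assumed pooling)."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def layerwise_embeddings(hidden_states: Sequence[Frames]) -> List[List[float]]:
    """One pooled embedding per layer, so each layer can be evaluated as its own feature set."""
    return [mean_pool(layer) for layer in hidden_states]


def leave_one_speaker_out(speakers: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_speaker, train_indices, test_indices), one fold per speaker."""
    by_speaker: Dict[str, List[int]] = defaultdict(list)
    for idx, spk in enumerate(speakers):
        by_speaker[spk].append(idx)
    for held_out in sorted(by_speaker):
        # Every utterance of the held-out speaker goes to test; all others go to train.
        train = [i for i, spk in enumerate(speakers) if spk != held_out]
        yield held_out, train, by_speaker[held_out]
```

For example, `leave_one_speaker_out(["s1", "s2", "s1", "s3"])` yields three folds, the first being `("s1", [1, 3], [0, 2])`. Training a classifier on the train indices and scoring it on the test indices for every fold, separately for each layer's features, gives one score per layer. Because no speaker ever appears on both sides of a split, that score reflects generalization to unseen voices rather than speaker memorization.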
Read the full paper on MDPI

Part 2. Multitask Learning SER System (Thesis)
This work introduces the first multitask learning (MTL) approach for Spanish SER, trained on six diverse corpora. Using a frozen Wav2Vec2 XLSR encoder and an MLP classifier, the proposed system surpasses the single-task baseline by 2.37 weighted-F1 (WF1) points, reaching 90.56% in emotion classification. It also achieves near-perfect scores in speaker profiling (99.39%) and regional accent detection (99.91%).
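The sketch below illustrates hard parameter sharing, one common way to realize MTL on top of a frozen encoder. A shared hidden layer feeds one linear head per task, and training minimizes a weighted sum of per-task cross-entropy losses. The sketch also computes weighted F1 (WF1), the support-weighted average of per-class F1. It is dependency-free Python written for readability and is an illustration under assumptions, not the thesis implementation. The task names and class counts in `TASKS`, the equal loss weights, and all helper names are hypothetical. The input vector stands in for a pooled Wav2Vec2 XLSR embedding.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical tasks and class counts, for illustration only.
TASKS: Dict[str, int] = {"emotion": 6, "speaker": 2, "accent": 3}

Layer = Tuple[List[List[float]], List[float]]  # (weights as out x in rows, biases)


def init_layer(out_dim: int, in_dim: int, rng: random.Random) -> Layer:
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(x: Sequence[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    return [sum(xi * wi for xi, wi in zip(x, row)) + b for row, b in zip(weights, bias)]


def log_softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


class MultitaskMLP:
    """Shared ReLU hidden layer over a frozen embedding, plus one linear head per task."""

    def __init__(self, in_dim: int, hidden: int, tasks: Dict[str, int], seed: int = 0):
        rng = random.Random(seed)
        self.shared = init_layer(hidden, in_dim, rng)
        self.heads = {name: init_layer(n, hidden, rng) for name, n in tasks.items()}

    def forward(self, x: Sequence[float]) -> Dict[str, List[float]]:
        h = [max(0.0, v) for v in linear(x, *self.shared)]
        return {name: linear(h, *head) for name, head in self.heads.items()}


def multitask_loss(outputs: Dict[str, List[float]], targets: Dict[str, int],
                   weights: Dict[str, float] = None) -> float:
    """Weighted sum of per-task cross-entropy losses (equal weights by default)."""
    weights = weights or {task: 1.0 for task in outputs}
    return sum(weights[t] * -log_softmax(outputs[t])[targets[t]] for t in outputs)


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1 = 2TP / (2TP + FP + FN), averaged with weights equal to class support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        total += n * (2 * tp / (2 * tp + fp + fn))
    return total / len(y_true)
```

As a worked check, `weighted_f1(["angry", "angry", "happy", "sad"], ["angry", "happy", "happy", "sad"])` gives per-class F1 of 2/3, 2/3, and 1 with supports 2, 1, and 1, so WF1 = (2·2/3 + 2/3 + 1) / 4 = 0.75. Because the encoder is frozen in this setup, only the shared layer and the heads would be trained, so the auxiliary tasks shape the shared representation without altering the encoder itself.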
