Spanish Emotional Text-to-Speech Synthesis

The xtts-finetune-webui tool was used to fine-tune XTTS on the female Spanish speaker subset of the INTERFACE dataset (INTER1SP corpus). Fine-tuning targeted the acoustic decoder while the multilingual backbone was kept frozen, which allows efficient speaker adaptation from limited data. The full paper is available on MDPI.
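The frozen-backbone setup described above can be illustrated with a minimal PyTorch sketch. It does not use XTTS or xtts-finetune-webui: `ToyTTS`, `backbone`, `decoder`, `freeze_backbone` and `train_step` are hypothetical names, and the layer sizes (including the 80 output bins) are arbitrary assumptions. The sketch only shows the general mechanism: gradients are disabled for the backbone, and only the parameters that remain trainable (here, the decoder's) are handed to the optimizer.

```python
import torch
from torch import nn


class ToyTTS(nn.Module):
    """Hypothetical stand-in for a TTS model: a shared backbone feeding an acoustic decoder.

    Module names and sizes are illustrative only; they do not mirror XTTS internals.
    """

    def __init__(self, in_dim: int = 16, hidden: int = 32, out_dim: int = 80):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, out_dim)  # assumed: one output per mel bin

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.backbone(x))


def freeze_backbone(model: ToyTTS) -> list:
    """Disable gradients for every backbone parameter and return the still-trainable ones."""
    for p in model.backbone.parameters():
        p.requires_grad = False
    return [p for p in model.parameters() if p.requires_grad]


def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               x: torch.Tensor, target: torch.Tensor) -> float:
    """One gradient step; only parameters registered with the optimizer are updated."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), target)
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyTTS()
    trainable = freeze_backbone(model)
    optimizer = torch.optim.AdamW(trainable, lr=1e-3)

    # Snapshot the frozen weights so we can confirm training leaves them untouched.
    frozen_before = [p.detach().clone() for p in model.backbone.parameters()]

    x, target = torch.randn(8, 16), torch.randn(8, 80)  # random placeholder data
    for step in range(3):
        print(f"step {step}: loss = {train_step(model, optimizer, x, target):.4f}")

    unchanged = all(torch.equal(a, b) for a, b in zip(frozen_before, model.backbone.parameters()))
    print("backbone unchanged:", unchanged)
```

Passing only the trainable parameters to the optimizer, rather than all of `model.parameters()`, keeps optimizer state small and makes the intent explicit. If the frozen part contained dropout or batch-norm layers, one would typically also put it in `eval()` mode so those layers behave deterministically during adaptation.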