Spanish Emotional Text-To-Speech Synthesis
xtts-finetune-webui was used to fine-tune XTTS on the female Spanish speaker subset of the INTERFACE dataset (INTER1SP corpus). Fine-tuning targeted the acoustic decoder and kept the multilingual backbone frozen. This allows efficient speaker adaptation with limited data, because only a small share of the weights is updated. The full paper is available on MDPI.
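The frozen-backbone setup described above boils down to choosing which parameters receive gradient updates. The following is a minimal, framework-free sketch of that selection step. It assumes parameters are identified by dotted module paths. The prefixes `backbone` and `decoder`, and all sizes in the example, are hypothetical placeholders. They are not the actual XTTS module names or the configuration used by xtts-finetune-webui.

```python
from typing import Dict, List, Sequence, Tuple


def is_trainable(name: str, trainable_prefixes: Sequence[str]) -> bool:
    """True if the parameter belongs to one of the trainable submodules."""
    # Match whole module-path components, so "decoder" does not match "decoder2.w".
    return any(name == p or name.startswith(p + ".") for p in trainable_prefixes)


def split_parameters(
    param_sizes: Dict[str, int], trainable_prefixes: Sequence[str]
) -> Tuple[List[str], List[str], float]:
    """Return (trainable names, frozen names, fraction of weights trained)."""
    trainable = [n for n in param_sizes if is_trainable(n, trainable_prefixes)]
    frozen = [n for n in param_sizes if not is_trainable(n, trainable_prefixes)]
    total = sum(param_sizes.values())
    trained = sum(param_sizes[n] for n in trainable)
    return trainable, frozen, (trained / total if total else 0.0)


if __name__ == "__main__":
    # Hypothetical parameter names and element counts, for illustration only.
    sizes = {
        "backbone.layer0.weight": 600,
        "backbone.layer1.weight": 300,
        "decoder.conv.weight": 80,
        "decoder.conv.bias": 20,
    }
    train, freeze, frac = split_parameters(sizes, ["decoder"])
    print(train)  # ['decoder.conv.weight', 'decoder.conv.bias']
    print(frac)   # 0.1
```

In a PyTorch model, the same decision would typically be applied by iterating over `model.named_parameters()` and setting `requires_grad = False` on every frozen name. The optimizer would then receive only the trainable parameters.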