Spanish Emotional Text-To-Speech Synthesis


By: Alex Mares

Summary of best results and SOTA comparison
Our emotional TTS enables the HCI system to respond empathetically.

XTTS was fine-tuned with the xtts-finetune-webui tool on the female Spanish speaker subset of the INTERFACE dataset (INTER1SP corpus). Fine-tuning targeted the acoustic decoder while keeping the multilingual backbone frozen, which allows efficient speaker adaptation with limited data.

Read the full paper on MDPI
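
The idea of training only the decoder while freezing the backbone can be illustrated with a minimal sketch. It only shows how parameters might be partitioned by name prefix; the parameter names and the `plan_freeze` helper are hypothetical and are not taken from XTTS or from xtts-finetune-webui.

```python
from typing import Dict, Iterable, Tuple


def plan_freeze(param_names: Iterable[str],
                trainable_prefixes: Tuple[str, ...]) -> Dict[str, bool]:
    """Map each parameter name to True (trainable) or False (frozen)."""
    # str.startswith accepts a tuple and matches any of its prefixes.
    return {name: name.startswith(trainable_prefixes) for name in param_names}


# Hypothetical parameter names, for illustration only.
names = [
    "backbone.text_embedding.weight",
    "backbone.layers.0.attn.weight",
    "decoder.upsample.0.weight",
    "decoder.out_conv.bias",
]

plan = plan_freeze(names, trainable_prefixes=("decoder.",))
trainable = [n for n, flag in plan.items() if flag]
frozen = [n for n, flag in plan.items() if not flag]

print("trainable:", trainable)
print("frozen:", frozen)
```

In a PyTorch training script, a plan like this would typically be applied by iterating over `model.named_parameters()` and setting `requires_grad` on each parameter, so that the optimizer only updates the decoder weights.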

▶ Try the HCI Demo on Gradio
View the Code on GitHub