Arabic Text-to-Speech Feature
Overview
The bot now includes Arabic TTS responses using Coqui TTS, a lightweight and high-quality text-to-speech engine.
Features
- Verbal Responses: Bot speaks in Arabic when executing commands
- Lightweight Model: Uses tts_models/ar/cv/vits, a fast VITS-based Arabic model
- Automatic Fallback: Falls back to pyttsx3 if Arabic TTS fails
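The automatic fallback can be pictured roughly as follows. This is a minimal sketch, not the bot's actual code: the Synthesizer interface, the function name synthesize_with_fallback, and the two stand-in engines are hypothetical and exist only to show the try-then-fall-back pattern.

```python
import logging
from typing import Callable, Tuple

log = logging.getLogger("tts")

# Hypothetical interface: a synthesizer takes text and returns audio bytes.
Synthesizer = Callable[[str], bytes]


def synthesize_with_fallback(
    text: str, primary: Synthesizer, fallback: Synthesizer
) -> Tuple[bytes, str]:
    """Try the primary engine; on any error, use the fallback engine."""
    try:
        return primary(text), "primary"
    except Exception as exc:  # broad on purpose: any TTS failure triggers fallback
        log.warning("Arabic TTS failed (%s); falling back", exc)
        return fallback(text), "fallback"


# Stand-in engines for demonstration only (not real Coqui or pyttsx3 calls).
def broken_arabic_engine(text: str) -> bytes:
    raise RuntimeError("model not available")


def dummy_pyttsx3_engine(text: str) -> bytes:
    return b"fallback-audio"
```

In a real bot the two callables would wrap the Coqui model and pyttsx3 respectively; returning the engine label alongside the audio makes it easy to log which path was taken.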
Verbal Responses
| Command | Arabic Response | Translation |
|---|---|---|
| join | نعم، أنا هنا | "Yes, I am here" |
| leave | مع السلامة | "Goodbye" |
| play | حسناً | "Okay" |
| skip | التالي | "Next" |
| stop | توقف | "Stop" |
| unknown | ماذا تريد؟ | "What do you want?" |
Configuration
Environment Variables
# Enable/disable Arabic TTS
USE_ARABIC_TTS=true
# TTS model to use (default is lightweight Arabic VITS)
ARABIC_TTS_MODEL=tts_models/ar/cv/vits
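One way these two variables might be read at startup is sketched below. It assumes that an unset USE_ARABIC_TTS means disabled (consistent with the "Disabling Arabic TTS" section); the TTSConfig class, the load_tts_config helper, and the acceptance of values such as "1" or "yes" are illustrative assumptions, not the bot's confirmed behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_MODEL = "tts_models/ar/cv/vits"


@dataclass(frozen=True)
class TTSConfig:
    use_arabic_tts: bool
    model_name: str


def load_tts_config(env: Optional[Mapping[str, str]] = None) -> TTSConfig:
    """Build the TTS settings from environment variables."""
    env = os.environ if env is None else env
    raw_flag = env.get("USE_ARABIC_TTS", "false")
    # Assumption: accept a few common truthy spellings, case-insensitively.
    enabled = raw_flag.strip().lower() in {"1", "true", "yes", "on"}
    # An empty or missing model name falls back to the default model.
    model = env.get("ARABIC_TTS_MODEL") or DEFAULT_MODEL
    return TTSConfig(use_arabic_tts=enabled, model_name=model)
```

Passing a plain dict instead of os.environ keeps the function easy to exercise in tests.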
Available Arabic Models
Coqui TTS provides several Arabic models; run tts --list_models to see which ones your installed version offers. The default is optimized for speed:
- tts_models/ar/cv/vits (Default - Recommended)
  - Fast inference
  - Good quality
  - Small model size (~50MB)
  - Based on Common Voice dataset
- tts_models/ar/cv/glow-tts
  - Alternative model
  - Slightly different voice characteristics
Installation
Arabic TTS support comes with the Coqui TTS package. Install it with:
pip install TTS==0.22.0
On first run, the model will be downloaded automatically (~50MB).
Usage
Once enabled, the bot will automatically speak responses when:
- Joining a voice channel
- Leaving a voice channel
- Playing, skipping, or stopping music
- Receiving unknown commands
No additional commands are needed; it works automatically.
Performance
- Model Load Time: ~2-3 seconds on first use
- Inference Time: ~0.5-1 second per response
- Memory Usage: ~200MB additional RAM
- Disk Space: ~50MB for model files
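Because the load cost is paid once, on first use, a lazy loader is one plausible way to structure this. The sketch below is an assumption about how that could look, not the bot's confirmed implementation: LazyModel and load_coqui_model are hypothetical names, while TTS.api.TTS is the Python entry point of the Coqui package.

```python
from typing import Any, Callable, Optional


class LazyModel:
    """Defer an expensive load until the first call to get(), then reuse it."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._model: Optional[Any] = None

    def get(self) -> Any:
        if self._model is None:
            # First use: pay the load cost once.
            self._model = self._loader()
        return self._model


def load_coqui_model(model_name: str = "tts_models/ar/cv/vits") -> Any:
    # Imported lazily so that importing this module does not require TTS.
    from TTS.api import TTS

    return TTS(model_name=model_name, progress_bar=False)


# Nothing is loaded at this point; the model loads on the first get().
arabic_tts = LazyModel(load_coqui_model)
```

A later call such as arabic_tts.get().tts_to_file(text="حسناً", file_path="response.wav") would then trigger the load on the first response and reuse the model afterwards. This sketch is not thread-safe; a lock around the load would be needed if several tasks can hit get() concurrently.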
Disabling Arabic TTS
To disable Arabic TTS and use only English TTS:
USE_ARABIC_TTS=false
Or remove the environment variable entirely.
Troubleshooting
Model Download Fails
If the model fails to download:
- Check internet connection
- Manually download: tts --model_name tts_models/ar/cv/vits --text "test"
- Models are cached in ~/.local/share/tts/
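To check whether a download actually landed in the cache, something like the following could help. It is a sketch under one stated assumption: that Coqui TTS stores each model in a folder named after the model with "/" replaced by "--" (for example tts_models--ar--cv--vits). The helper names are hypothetical; inspect the cache directory directly if the layout differs on your system.

```python
from pathlib import Path
from typing import Optional

DEFAULT_CACHE_ROOT = Path.home() / ".local" / "share" / "tts"


def model_cache_dir(model_name: str, cache_root: Path) -> Path:
    # Assumption: folder name is the model name with "/" replaced by "--".
    return cache_root / model_name.replace("/", "--")


def is_model_cached(model_name: str, cache_root: Optional[Path] = None) -> bool:
    """Return True if the model's cache folder exists and is not empty."""
    root = DEFAULT_CACHE_ROOT if cache_root is None else Path(cache_root)
    folder = model_cache_dir(model_name, root)
    return folder.is_dir() and any(folder.iterdir())
```

Calling is_model_cached("tts_models/ar/cv/vits") before the bot starts lets it warn early instead of stalling on the first voice response.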
Audio Quality Issues
- Ensure FFmpeg is properly installed
- Check Discord voice bitrate settings
- Try a different model from the list above
High CPU Usage
The VITS model is already optimized for CPU. If it is still too heavy:
- Set USE_ARABIC_TTS=false
- Use pyttsx3 fallback instead
- Consider running on a more powerful machine
Customization
To add more responses, edit bot.py:
VERBAL_RESPONSES = {
    "join": "نعم، أنا هنا",
    "your_command": "your arabic text here",
}
Then add the response call:
await speak_response(state.voice_client, "your_command")
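For illustration, the lookup behind a call like this might resemble the sketch below. The response_text helper is hypothetical, and falling back to the "unknown" entry for unrecognized keys is an assumption based on the response table, not confirmed behavior of speak_response in bot.py.

```python
# Mirrors the response table in this document.
VERBAL_RESPONSES = {
    "join": "نعم، أنا هنا",
    "leave": "مع السلامة",
    "play": "حسناً",
    "skip": "التالي",
    "stop": "توقف",
    "unknown": "ماذا تريد؟",
}


def response_text(command: str) -> str:
    """Return the Arabic phrase for a command, or the 'unknown' phrase."""
    key = command.strip().lower()
    return VERBAL_RESPONSES.get(key, VERBAL_RESPONSES["unknown"])
```

With a lookup of this shape, adding a new key to VERBAL_RESPONSES is enough for the new command to get its own phrase; commands without an entry get the "unknown" phrase.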