2025-12-05 23:27:43 -05:00
parent 009584f497
commit 44901a44b7
8 changed files with 4847 additions and 196 deletions

ARABIC_TTS.md Normal file

@@ -0,0 +1,124 @@
# Arabic Text-to-Speech Feature
## Overview
The bot now speaks short Arabic responses using Coqui TTS, an open-source text-to-speech engine.
## Features
- **Verbal Responses**: Bot speaks in Arabic when executing commands
- **Lightweight Model**: Uses `tts_models/ar/cv/vits` - a fast VITS-based Arabic model
- **Automatic Fallback**: Falls back to pyttsx3 if Arabic TTS fails
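
The fallback behaviour described above could look roughly like the following. This is a minimal sketch, not the code in `bot.py`: `speak_with_fallback` and its two callable parameters are hypothetical names, standing in for whatever wraps the Coqui model and pyttsx3.

```python
from typing import Callable


def speak_with_fallback(
    text: str,
    primary: Callable[[str], None],
    fallback: Callable[[str], None],
) -> str:
    """Try the primary engine; if it raises, use the fallback engine.

    Returns the name of the engine that handled the text.
    Both callables are hypothetical wrappers (e.g. Coqui TTS and pyttsx3).
    """
    try:
        primary(text)
        return "primary"
    except Exception as exc:  # broad on purpose: any TTS failure triggers the fallback
        print(f"Arabic TTS failed ({exc!r}); using fallback engine")
        fallback(text)
        return "fallback"
```

Passing the engines in as callables keeps the fallback logic testable without loading a model.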
## Verbal Responses
| Command | Arabic Response | Translation |
|---------|----------------|-------------|
| join | نعم، أنا هنا | "Yes, I am here" |
| leave | مع السلامة | "Goodbye" |
| play | حسناً | "Okay" |
| skip | التالي | "Next" |
| stop | توقف | "Stop" |
| unknown | ماذا تريد؟ | "What do you want?" |
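
The table maps naturally onto a dictionary lookup with `unknown` as the default. The sketch below reuses the `VERBAL_RESPONSES` name from `bot.py`; the `response_for` helper and its case/whitespace normalisation are assumptions for illustration.

```python
VERBAL_RESPONSES = {
    "join": "نعم، أنا هنا",
    "leave": "مع السلامة",
    "play": "حسناً",
    "skip": "التالي",
    "stop": "توقف",
    "unknown": "ماذا تريد؟",
}


def response_for(command: str) -> str:
    """Return the Arabic phrase for a command (hypothetical helper).

    Unrecognised commands fall back to the "unknown" phrase.
    """
    key = command.strip().lower()
    return VERBAL_RESPONSES.get(key, VERBAL_RESPONSES["unknown"])
```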
## Configuration
### Environment Variables
```bash
# Enable/disable Arabic TTS
USE_ARABIC_TTS=true
# TTS model to use (default is lightweight Arabic VITS)
ARABIC_TTS_MODEL=tts_models/ar/cv/vits
```
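
One way the bot might read these variables is sketched below. The `load_tts_config` helper and the set of accepted truthy strings are assumptions; only the variable names and the default model come from this document. An unset `USE_ARABIC_TTS` is treated as disabled, matching the "Disabling Arabic TTS" section.

```python
import os

DEFAULT_MODEL = "tts_models/ar/cv/vits"
_TRUTHY = {"1", "true", "yes", "on"}  # assumed set of accepted "enabled" values


def load_tts_config(env=None):
    """Return (enabled, model_name) from environment-style settings.

    `env` defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    enabled = env.get("USE_ARABIC_TTS", "false").strip().lower() in _TRUTHY
    # An empty or missing model name falls back to the documented default.
    model = env.get("ARABIC_TTS_MODEL", "").strip() or DEFAULT_MODEL
    return enabled, model
```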
### Available Arabic Models
Two Arabic models are listed below; the default is optimized for speed. Run `tts --list_models` to confirm which models are available in your installed version:
1. **tts_models/ar/cv/vits** (Default - Recommended)
   - Fast inference
   - Good quality
   - Small model size (~50MB)
   - Based on Common Voice dataset
2. **tts_models/ar/cv/glow-tts**
   - Alternative model
   - Slightly different voice characteristics
## Installation
Arabic TTS support comes from the Coqui `TTS` package, which is installed with:
```bash
pip install TTS==0.22.0
```
On first run, the model will be downloaded automatically (~50MB).
## Usage
Once enabled, the bot will automatically speak responses when:
- Joining a voice channel
- Leaving a voice channel
- Playing, skipping, or stopping music
- Receiving unknown commands
No additional commands are needed.
## Performance
- **Model Load Time**: ~2-3 seconds on first use
- **Inference Time**: ~0.5-1 second per response
- **Memory Usage**: ~200MB additional RAM
- **Disk Space**: ~50MB for model files
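
These figures will vary by machine. A small timing helper such as the one below can check them locally; `timed` is a hypothetical utility, not part of the bot.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Call fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start
```

For example, wrapping the model load or a single synthesis call in `timed(lambda: ...)` gives numbers to compare against the load and inference times listed above.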
## Disabling Arabic TTS
To disable Arabic TTS and use only English TTS:
```bash
USE_ARABIC_TTS=false
```
Or remove the environment variable entirely.
## Troubleshooting
### Model Download Fails
If the model fails to download:
1. Check your internet connection.
2. Trigger the download manually: `tts --model_name tts_models/ar/cv/vits --text "test"`
3. Check the model cache, which on Linux is `~/.local/share/tts/`.
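
To check whether the model is already cached, something like the following may help. It is a sketch under two assumptions: the cache root is the Linux path given above, and Coqui names each model directory by replacing `/` with `--` (e.g. `tts_models--ar--cv--vits`). Verify both against your installation. The helper names are hypothetical.

```python
from pathlib import Path

# Assumed Linux cache location, taken from the troubleshooting note above.
DEFAULT_CACHE_ROOT = Path.home() / ".local" / "share" / "tts"


def cached_model_dir(model_name: str, cache_root: Path = DEFAULT_CACHE_ROOT) -> Path:
    """Map a model name to its expected cache directory (assumed naming scheme)."""
    return cache_root / model_name.replace("/", "--")


def is_model_cached(model_name: str, cache_root: Path = DEFAULT_CACHE_ROOT) -> bool:
    """True if the expected directory exists and contains at least one entry."""
    model_dir = cached_model_dir(model_name, cache_root)
    return model_dir.is_dir() and any(model_dir.iterdir())
```

If `is_model_cached("tts_models/ar/cv/vits")` returns `False` after a download attempt, the download probably did not complete, or the cache lives somewhere else on your platform.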
### Audio Quality Issues
- Ensure FFmpeg is properly installed
- Check Discord voice bitrate settings
- Try a different model from the list above
### High CPU Usage
The VITS model is already optimized for CPU. If it is still too heavy:
1. Set `USE_ARABIC_TTS=false` so the bot uses the pyttsx3 fallback instead.
2. Consider running on a more powerful machine.
## Customization
To add more responses, edit `bot.py`:
```python
VERBAL_RESPONSES = {
    "join": "نعم، أنا هنا",
    "your_command": "your arabic text here",
}
```
Then call `speak_response` where the command is handled:
```python
await speak_response(state.voice_client, "your_command")
```