# Arabic Text-to-Speech Feature

## Overview

The bot now includes Arabic TTS responses using Coqui TTS, a lightweight, high-quality text-to-speech engine.

## Features

- **Verbal Responses**: The bot speaks a short Arabic phrase when it executes a command
- **Lightweight Model**: Uses `tts_models/ar/cv/vits`, a fast VITS-based Arabic model
- **Automatic Fallback**: Falls back to pyttsx3 if Arabic TTS fails

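The automatic fallback can be pictured roughly as follows. This is only a sketch under stated assumptions, not the bot's actual code: `synthesize_with_fallback` and the `Synthesizer` type are hypothetical names, and each engine is modelled as a plain callable that takes text and returns the path of the audio file it wrote.

```python
from typing import Callable

# Hypothetical type: a synthesizer takes text and returns the path of the audio file it wrote.
Synthesizer = Callable[[str], str]


def synthesize_with_fallback(text: str, primary: Synthesizer, fallback: Synthesizer) -> str:
    """Try the primary engine first; use the fallback engine if the primary raises."""
    try:
        return primary(text)
    except Exception as exc:  # broad on purpose: any TTS failure should trigger the fallback
        print(f"Primary TTS failed ({exc!r}); using fallback engine")
        return fallback(text)
```

In the bot, `primary` would presumably wrap the Coqui model and `fallback` would wrap pyttsx3, so a missing or broken Arabic model degrades to the fallback voice instead of raising an error.
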
## Verbal Responses

| Command | Arabic Response | Translation |
|---------|----------------|-------------|
| join | نعم، أنا هنا | "Yes, I am here" |
| leave | مع السلامة | "Goodbye" |
| play | حسناً | "Okay" |
| skip | التالي | "Next" |
| stop | توقف | "Stop" |
| unknown | ماذا تريد؟ | "What do you want?" |

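The table maps naturally onto a dictionary lookup in which any unrecognized command falls through to the `unknown` phrase. The sketch below assumes that shape. `response_for` is a hypothetical helper, and the real `VERBAL_RESPONSES` in `bot.py` may be organized differently.

```python
# Phrases copied from the table above; the dictionary in bot.py may differ.
VERBAL_RESPONSES = {
    "join": "نعم، أنا هنا",
    "leave": "مع السلامة",
    "play": "حسناً",
    "skip": "التالي",
    "stop": "توقف",
    "unknown": "ماذا تريد؟",
}


def response_for(command: str) -> str:
    """Return the Arabic phrase for a command, or the 'unknown' phrase if it has none."""
    return VERBAL_RESPONSES.get(command, VERBAL_RESPONSES["unknown"])
```
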
## Configuration

### Environment Variables

```bash
# Enable/disable Arabic TTS
USE_ARABIC_TTS=true

# TTS model to use (default is lightweight Arabic VITS)
ARABIC_TTS_MODEL=tts_models/ar/cv/vits
```

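One plausible way for the bot to read these variables is sketched below. The helper name `load_tts_config`, the set of accepted "true" spellings, and the choice to treat an unset `USE_ARABIC_TTS` as disabled are assumptions made for illustration, not confirmed behavior of `bot.py`.

```python
import os
from typing import Mapping, Tuple

DEFAULT_MODEL = "tts_models/ar/cv/vits"


def load_tts_config(env: Mapping[str, str] = os.environ) -> Tuple[bool, str]:
    """Return (arabic_tts_enabled, model_name) from environment-style settings."""
    raw = env.get("USE_ARABIC_TTS", "false")
    # Assumption: a few common spellings count as "enabled"; anything else disables the feature.
    enabled = raw.strip().lower() in {"1", "true", "yes", "on"}
    model = env.get("ARABIC_TTS_MODEL", DEFAULT_MODEL)
    return enabled, model
```

Passing the mapping as a parameter keeps the parsing easy to check without touching the real process environment, for example `load_tts_config({"USE_ARABIC_TTS": "true"})`.
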
### Available Arabic Models

Coqui TTS provides several Arabic models. The default is optimized for speed:

1. **tts_models/ar/cv/vits** (default, recommended)
   - Fast inference
   - Good quality
   - Small model size (~50 MB)
   - Based on the Common Voice dataset

2. **tts_models/ar/cv/glow-tts**
   - Alternative model
   - Slightly different voice characteristics

## Installation

Arabic TTS is provided by the Coqui `TTS` package. Install it with:

```bash
pip install TTS==0.22.0
```

On first run, the model is downloaded automatically (~50 MB).

## Usage

Once enabled, the bot automatically speaks a response when:
- Joining a voice channel
- Leaving a voice channel
- Playing, skipping, or stopping music
- Receiving unknown commands

No additional commands are needed; it works automatically.

## Performance

- **Model Load Time**: ~2-3 seconds on first use
- **Inference Time**: ~0.5-1 second per response
- **Memory Usage**: ~200 MB additional RAM
- **Disk Space**: ~50 MB for model files

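These figures depend on the hardware, so it can be worth measuring them on the target machine. The helper below is a minimal sketch for doing so. `timed` is a hypothetical name, and the call being measured is a stand-in for the bot's real model-loading or synthesis call.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn once and return (its result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Stand-in workload; replace the lambda with the real load or synthesis call.
result, seconds = timed(lambda: sum(range(1000)))
print(f"took {seconds:.4f} s")
```
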
## Disabling Arabic TTS

To disable Arabic TTS and use only English TTS, set:

```bash
USE_ARABIC_TTS=false
```

Or remove the environment variable entirely.

## Troubleshooting

### Model Download Fails

If the model fails to download:
1. Check your internet connection.
2. Trigger the download manually: `tts --model_name tts_models/ar/cv/vits --text "test"`
3. Check the model cache: on Linux, downloaded models are stored in `~/.local/share/tts/`.

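To see whether a download actually landed in the cache, a small check like the following may help. It is a sketch that assumes each model is stored in a folder named after the model with `/` replaced by `--` (for example `tts_models--ar--cv--vits`), and that the cache root is the Linux default mentioned above. Both helper names are hypothetical.

```python
from pathlib import Path

# Assumption: default Linux cache location, as listed in the steps above.
DEFAULT_CACHE = Path.home() / ".local" / "share" / "tts"


def cached_model_dir(model_name: str, cache_root: Path = DEFAULT_CACHE) -> Path:
    """Folder where the model is expected to live (assumed '--'-joined naming)."""
    return cache_root / model_name.replace("/", "--")


def is_model_cached(model_name: str, cache_root: Path = DEFAULT_CACHE) -> bool:
    """True if the expected model folder exists and contains at least one file."""
    model_dir = cached_model_dir(model_name, cache_root)
    return model_dir.is_dir() and any(model_dir.iterdir())
```

If `is_model_cached("tts_models/ar/cv/vits")` returns `False` after a download attempt, the download most likely did not complete, or the cache lives somewhere else on that system.
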
### Audio Quality Issues

- Ensure FFmpeg is properly installed
- Check Discord voice bitrate settings
- Try a different model from the list above

### High CPU Usage

The VITS model is already optimized for CPU. If it is still too heavy:
1. Set `USE_ARABIC_TTS=false`
2. Use the pyttsx3 fallback instead
3. Consider running on a more powerful machine

## Customization

To add more responses, edit the `VERBAL_RESPONSES` dictionary in `bot.py`:

```python
# Map each command name to the Arabic phrase the bot speaks
VERBAL_RESPONSES = {
    "join": "نعم، أنا هنا",
    "your_command": "your Arabic text here",
}
```

Then add the response call in the handler for that command:

```python
await speak_response(state.voice_client, "your_command")
```