Godddam

2025-12-05 23:27:43 -05:00
parent 009584f497
commit 44901a44b7
8 changed files with 4847 additions and 196 deletions
--- a/ARABIC_TTS.md
+++ b/ARABIC_TTS.md
@@ -0,0 +1,124 @@
+# Arabic Text-to-Speech Feature
+
+## Overview
+
+The bot now speaks short Arabic responses generated with Coqui TTS, an open-source text-to-speech engine.
+
+## Features
+
+- **Verbal Responses**: The bot speaks a short Arabic phrase when it executes a command
+- **Lightweight Model**: Uses `tts_models/ar/cv/vits` - a fast VITS-based Arabic model
+- **Automatic Fallback**: Falls back to pyttsx3 if Arabic TTS fails
+
+## Verbal Responses
+
+| Command | Arabic Response | Translation |
+|---------|----------------|-------------|
+| join    | نعم، أنا هنا    | "Yes, I am here" |
+| leave   | مع السلامة     | "Goodbye" |
+| play    | حسناً          | "Okay" |
+| skip    | التالي         | "Next" |
+| stop    | توقف           | "Stop" |
+| unknown | ماذا تريد؟     | "What do you want?" |
+
+## Configuration
+
+### Environment Variables
+
+```bash
+# Enable/disable Arabic TTS
+USE_ARABIC_TTS=true
+
+# TTS model to use (default is lightweight Arabic VITS)
+ARABIC_TTS_MODEL=tts_models/ar/cv/vits
+```
+
+### Available Arabic Models
+
+Coqui TTS lists its models with `tts --list_models`; check that output to confirm the names below are available in your installed version. The default is optimized for speed:
+
+1. **tts_models/ar/cv/vits** (Default - Recommended)
+   - Fast inference
+   - Good quality
+   - Small model size (~50MB)
+   - Based on Common Voice dataset
+
+2. **tts_models/ar/cv/glow-tts**
+   - Alternative model
+   - Slightly different voice characteristics
+
+## Installation
+
+Arabic TTS depends on the Coqui `TTS` package. Install it with:
+
+```bash
+pip install TTS==0.22.0
+```
+
+On first run, the model will be downloaded automatically (~50MB).
+
+## Usage
+
+Once enabled, the bot will automatically speak responses when:
+- Joining a voice channel
+- Leaving a voice channel
+- Playing, skipping, or stopping music
+- Receiving unknown commands
+
+No additional commands are needed.
+
+## Performance
+
+- **Model Load Time**: ~2-3 seconds on first use
+- **Inference Time**: ~0.5-1 second per response
+- **Memory Usage**: ~200MB additional RAM
+- **Disk Space**: ~50MB for model files
+
+## Disabling Arabic TTS
+
+To disable Arabic TTS and use only English TTS:
+
+```bash
+USE_ARABIC_TTS=false
+```
+
+Or remove the environment variable entirely.
+
+## Troubleshooting
+
+### Model Download Fails
+
+If the model fails to download:
+1. Check your internet connection
+2. Trigger the download manually: `tts --model_name tts_models/ar/cv/vits --text "test"`
+3. Check the model cache, which on Linux is `~/.local/share/tts/`
+
+### Audio Quality Issues
+
+- Ensure FFmpeg is properly installed
+- Check Discord voice bitrate settings
+- Try a different model from the list above
+
+### High CPU Usage
+
+The VITS model is already optimized for CPU. If it is still too heavy:
+1. Set `USE_ARABIC_TTS=false`
+2. Rely on the pyttsx3 fallback instead
+3. Consider running the bot on a more powerful machine
+
+## Customization
+
+To add more responses, edit `bot.py`:
+
+```python
+VERBAL_RESPONSES = {
+    "join": "نعم، أنا هنا",
+    "your_command": "your Arabic text here",
+}
+```
+
+Then add the response call:
+
+```python
+await speak_response(state.voice_client, "your_command")
+```
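
The configuration and customization steps described in `ARABIC_TTS.md` can be sketched as follows. This is a minimal illustration, not the code in `bot.py`: the helper names `load_tts_settings` and `response_text` are hypothetical, and treating an unset `USE_ARABIC_TTS` as disabled is an assumption based on the note that removing the variable turns the feature off.

```python
import os

# Phrases copied from the "Verbal Responses" table in ARABIC_TTS.md.
VERBAL_RESPONSES = {
    "join": "نعم، أنا هنا",
    "leave": "مع السلامة",
    "play": "حسناً",
    "skip": "التالي",
    "stop": "توقف",
    "unknown": "ماذا تريد؟",
}

# Default model name as documented; availability is not verified here.
DEFAULT_MODEL = "tts_models/ar/cv/vits"


def load_tts_settings(env=None):
    """Return (enabled, model_name) from environment-style settings.

    Hypothetical helper. Assumes an unset USE_ARABIC_TTS means disabled.
    """
    env = os.environ if env is None else env
    enabled = env.get("USE_ARABIC_TTS", "false").strip().lower() == "true"
    model = env.get("ARABIC_TTS_MODEL", DEFAULT_MODEL)
    return enabled, model


def response_text(command):
    """Look up the phrase for a command, using "unknown" when it is missing."""
    return VERBAL_RESPONSES.get(command, VERBAL_RESPONSES["unknown"])


if __name__ == "__main__":
    enabled, model = load_tts_settings()
    print(enabled, model, response_text("skip"))
```

Passing a plain dictionary instead of `os.environ` makes the settings logic easy to check without touching the real environment, and the lookup fallback means a new command that has no entry yet still produces the "unknown" phrase rather than raising `KeyError`.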