2025-12-05 23:27:43 -05:00
parent 009584f497
commit 44901a44b7
8 changed files with 4847 additions and 196 deletions


@@ -4,3 +4,7 @@ LOG_LEVEL=INFO
TRANSCRIPT_LOG_ENABLED=true
HOTWORD_ENABLED=true
GOODBOY_USER_ID=94578724413902848
# Arabic TTS Settings
USE_ARABIC_TTS=true
ARABIC_TTS_MODEL=tts_models/ar/cv/vits

124
ARABIC_TTS.md Normal file

@@ -0,0 +1,124 @@
# Arabic Text-to-Speech Feature
## Overview
The bot now includes Arabic TTS responses using Coqui TTS, a lightweight and high-quality text-to-speech engine.
## Features
- **Verbal Responses**: Bot speaks in Arabic when executing commands
- **Lightweight Model**: Uses `tts_models/ar/cv/vits` - a fast VITS-based Arabic model
- **Automatic Fallback**: Falls back to pyttsx3 if the Arabic TTS model fails to load
## Verbal Responses
| Command | Arabic Response | Translation |
|---------|----------------|-------------|
| join | نعم، أنا هنا | "Yes, I am here" |
| leave | مع السلامة | "Goodbye" |
| play | حسناً | "Okay" |
| skip | التالي | "Next" |
| stop | توقف | "Stop" |
| unknown | ماذا تريد؟ | "What do you want?" |
## Configuration
### Environment Variables
```bash
# Enable/disable Arabic TTS
USE_ARABIC_TTS=true
# TTS model to use (default is lightweight Arabic VITS)
ARABIC_TTS_MODEL=tts_models/ar/cv/vits
```
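As a minimal sketch of how these two variables might be read in Python: it mirrors the truthy-string set (`1`, `true`, `yes`, `on`) and the defaults that `bot.py` uses. The helper name `env_flag` and the extra whitespace stripping are assumptions for illustration, not part of the bot.

```python
import os

# Strings treated as "enabled", matching the set used in bot.py.
TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: str = "true") -> bool:
    """Return True when the environment variable holds a truthy string.

    Hypothetical helper: an unset variable falls back to `default`.
    """
    return os.getenv(name, default).strip().lower() in TRUTHY


USE_ARABIC_TTS = env_flag("USE_ARABIC_TTS")
ARABIC_TTS_MODEL = os.getenv("ARABIC_TTS_MODEL", "tts_models/ar/cv/vits")
```

Because the default is `"true"`, an unset `USE_ARABIC_TTS` leaves the feature enabled under this scheme.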
### Available Arabic Models
Coqui TTS model names vary by release, so run `tts --list_models` to confirm which Arabic models your installed version provides. The default is chosen for speed:
1. **tts_models/ar/cv/vits** (Default - Recommended)
- Fast inference
- Good quality
- Small model size (~50MB)
- Based on Common Voice dataset
2. **tts_models/ar/cv/glow-tts**
- Alternative model
- Slightly different voice characteristics
## Installation
Arabic TTS depends on the Coqui `TTS` package, which is installed with:
```bash
pip install TTS==0.22.0
```
On first run, the model will be downloaded automatically (~50MB).
## Usage
Once enabled, the bot will automatically speak responses when:
- Joining a voice channel
- Leaving a voice channel
- Playing, skipping, or stopping music
- Receiving unknown commands
No additional commands needed - it works automatically!
## Performance
- **Model Load Time**: ~2-3 seconds on first use
- **Inference Time**: ~0.5-1 second per response
- **Memory Usage**: ~200MB additional RAM
- **Disk Space**: ~50MB for model files
## Disabling Arabic TTS
To disable and use only English TTS:
```bash
USE_ARABIC_TTS=false
```
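Removing the environment variable does not disable it: `bot.py` defaults `USE_ARABIC_TTS` to `true` when it is unset, so set it to `false` explicitly.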
## Troubleshooting
### Model Download Fails
If the model fails to download:
1. Check internet connection
2. Manually download: `tts --model_name tts_models/ar/cv/vits --text "test"`
3. Models are cached in `~/.local/share/tts/` on Linux; the cache location differs on Windows and macOS
### Audio Quality Issues
- Ensure FFmpeg is properly installed
- Check Discord voice bitrate settings
- Try a different model from the list above
### High CPU Usage
The VITS model is already optimized for CPU. If still too heavy:
1. Set `USE_ARABIC_TTS=false`
2. Use pyttsx3 fallback instead
3. Consider running on a more powerful machine
## Customization
To add more responses, edit `bot.py`:
```python
VERBAL_RESPONSES = {
"join": "نعم، أنا هنا",
"your_command": "your arabic text here",
}
```
Then add the response call:
```python
await speak_response(state.voice_client, "your_command")
```
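Note that `speak_response` returns silently when a key is missing from `VERBAL_RESPONSES`, so a typo in the key produces no audio and no error. A small sketch of a sanity check follows; the helper `missing_responses` and the `dance` command are hypothetical and not part of `bot.py`.

```python
VERBAL_RESPONSES = {
    "join": "نعم، أنا هنا",   # "Yes, I am here"
    "leave": "مع السلامة",    # "Goodbye"
}


def missing_responses(commands, responses):
    """Return the command names that have no non-empty spoken response."""
    return sorted(c for c in commands if not responses.get(c))


# "dance" is a made-up command used only to show a missing entry.
COMMANDS = ["join", "leave", "dance"]
print(missing_responses(COMMANDS, VERBAL_RESPONSES))  # ['dance']
```

Running a check like this at startup would surface keys that were wired to `speak_response` but never given text.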

100
VOICE_SETUP.md Normal file

@@ -0,0 +1,100 @@
# Voice Receiving Setup Guide
## discord.py vs py-cord
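**py-cord** supports voice receiving through `discord.sinks`. **discord.py 2.0+** has no built-in receive API and needs the `discord-ext-voice-recv` extension (`discord.ext.voice_recv`). Your bot now uses discord.py with that extension.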
## Key Changes Made
1. **Switched to discord.py** - More actively maintained, better voice support
2. **Added opuslib** - Required for voice receiving on Windows
3. **Simplified connection logic** - Let the library handle reconnection internally
## Installation Steps
### 1. Install Opus (Windows)
```powershell
# Using Chocolatey (recommended)
choco install opus-tools -y
# Or download manually from:
# https://opus-codec.org/downloads/
```
### 2. Reinstall Python dependencies
```bash
pip uninstall py-cord discord.py -y
pip install -r requirements.txt
```
### 3. Set Opus path (if needed)
If Opus still doesn't load, add to your `.env`:
```
OPUS_LIB=C:\path\to\opus.dll
```
Common locations:
- `C:\ProgramData\chocolatey\lib\opus-tools\tools\opus.dll`
- `C:\Windows\System32\opus.dll`
## How Voice Receiving Works
### Recording Audio
```python
# Start listening (already in your bot); voice_client must be a VoiceRecvClient
voice_client.listen(sink)
# Stop listening
voice_client.stop_listening()
```
### The Sink Pattern
Your `HotwordStreamSink` receives PCM audio data:
- **48kHz sample rate**
- **2 channels (stereo)**
- **16-bit PCM**
The sink's `write()` method is called continuously with audio chunks from each user.
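The format above fixes the byte rate, which is what the sink's buffer sizes are derived from. A worked sketch of that arithmetic follows, using the constant names and the default `min_chunk_seconds=1.0` and `window_seconds=4.5` from `bot.py`. The definition of `PCM_BYTES_PER_SECOND` as the product of the three constants is an assumption, and `seconds_to_bytes` and `trim_to_window` are illustrative helpers.

```python
PCM_SAMPLE_RATE = 48000   # samples per second, per channel
PCM_CHANNELS = 2          # stereo
PCM_SAMPLE_WIDTH = 2      # bytes per sample (16-bit)

# Assumed definition: rate * channels * width = 192000 bytes per second.
PCM_BYTES_PER_SECOND = PCM_SAMPLE_RATE * PCM_CHANNELS * PCM_SAMPLE_WIDTH


def seconds_to_bytes(seconds: float) -> int:
    """Convert a duration of decoded audio into a byte count."""
    return int(PCM_BYTES_PER_SECOND * seconds)


min_chunk_bytes = seconds_to_bytes(1.0)   # 192000
window_bytes = seconds_to_bytes(4.5)      # 864000


def trim_to_window(buffer: bytearray, limit: int = window_bytes) -> None:
    """Drop the oldest bytes in place so the buffer holds at most `limit` bytes."""
    if len(buffer) > limit:
        del buffer[: len(buffer) - limit]
```

So each user's rolling buffer stays under roughly 0.86 MB, and transcription is only considered once about one second of audio has accumulated.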
## Troubleshooting
### Error 4006 (Session Invalid)
This happens when Discord thinks you're already connected. Fixed by:
- Proper cleanup before reconnecting
- Using `reconnect=True` in `channel.connect()`
- Waiting 1 second after disconnect
### No Audio Received
1. Check Opus is loaded: Look for "Loaded opus library" in logs
2. Verify bot has "Use Voice Activity" permission
3. Ensure users aren't muted
### High CPU Usage
The continuous transcription can be heavy. Consider:
- Increasing `min_chunk_seconds` in HotwordStreamSink
- Using a lighter STT model
- Only transcribing when volume threshold is met
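The last option can be sketched as an RMS gate applied to a PCM chunk before it is handed to transcription. This is an illustration only: `pcm16_rms`, `is_loud_enough`, and the threshold of `500.0` are assumptions rather than code from `bot.py`, and it assumes the host's native byte order matches the 16-bit PCM data (true on little-endian machines).

```python
import math
from array import array


def pcm16_rms(pcm: bytes) -> float:
    """Root-mean-square amplitude of signed 16-bit PCM in native byte order."""
    usable = len(pcm) - (len(pcm) % 2)  # ignore a trailing odd byte
    if usable == 0:
        return 0.0
    samples = array("h")
    samples.frombytes(pcm[:usable])
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_loud_enough(pcm: bytes, threshold: float = 500.0) -> bool:
    """Return True when the chunk's RMS level reaches the (arbitrary) threshold."""
    return pcm16_rms(pcm) >= threshold
```

A chunk that fails the gate can be dropped without calling the STT model, which trades a small chance of missing quiet speech for less CPU time; the threshold would need tuning against real recordings.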
## Testing
1. Start the bot: `python bot.py`
2. Join a voice channel
3. Type "hey bashar join" in text chat
4. Bot should join and start listening
5. Speak in voice - bot transcribes in real-time
## Voice Receive Extension
The receive path relies on discord-ext-voice-recv:
https://github.com/imayhaveborkedit/discord-ext-voice-recv
This is a discord.py extension specifically for voice receiving.

4283
bot.log

File diff suppressed because it is too large

450
bot.py

@@ -15,20 +15,18 @@ from typing import Callable, Deque, Optional, Tuple
import discord import discord
import numpy as np import numpy as np
import pyttsx3 import pyttsx3
from TTS.api import TTS as CoquiTTS
import soundfile as sf import soundfile as sf
from concurrent.futures import ThreadPoolExecutor from concurrent.futures import ThreadPoolExecutor
from discord import Intents from discord import Intents
from discord.errors import ClientException, ConnectionClosed from discord.errors import ClientException, ConnectionClosed
from discord.ext import voice_recv
from dotenv import load_dotenv from dotenv import load_dotenv
from yt_dlp import YoutubeDL from yt_dlp import YoutubeDL
from stt import transcribe_file from stt import transcribe_file
try: HAS_VOICE_RECV = True
from discord import sinks # Available in discord.py >=2.0 and py-cord
HAS_SINKS = True
except Exception:
HAS_SINKS = False
load_dotenv() load_dotenv()
@@ -65,6 +63,7 @@ if not _have_file:
# Tweak library log levels # Tweak library log levels
logging.getLogger("discord").setLevel(logging.INFO) logging.getLogger("discord").setLevel(logging.INFO)
logging.getLogger("aiohttp").setLevel(logging.INFO) logging.getLogger("aiohttp").setLevel(logging.INFO)
logging.getLogger("discord.ext.voice_recv.opus").setLevel(logging.ERROR) # Suppress packet loss warnings
logger = logging.getLogger("basharbot") logger = logging.getLogger("basharbot")
@@ -115,6 +114,16 @@ COMMAND_ALIASES = {
"next": "skip", "next": "skip",
} }
# Verbal responses (Arabic-friendly)
VERBAL_RESPONSES = {
"join": "نعم، أنا هنا", # "Yes, I am here"
"leave": "مع السلامة", # "Goodbye"
"play": "حسناً", # "Okay"
"skip": "التالي", # "Next"
"stop": "توقف", # "Stop"
"unknown": "ماذا تريد؟", # "What do you want?"
}
PCM_SAMPLE_RATE = 48000 PCM_SAMPLE_RATE = 48000
PCM_CHANNELS = 2 PCM_CHANNELS = 2
PCM_SAMPLE_WIDTH = 2 # bytes per sample PCM_SAMPLE_WIDTH = 2 # bytes per sample
@@ -123,6 +132,8 @@ TRANSCRIPT_LOG_ENABLED = os.getenv("TRANSCRIPT_LOG_ENABLED", "true").lower() in
TRANSCRIPT_LOG_PATH = os.getenv("TRANSCRIPT_LOG_PATH", "transcript.log") TRANSCRIPT_LOG_PATH = os.getenv("TRANSCRIPT_LOG_PATH", "transcript.log")
GOODBOY_USER_ID = int(os.getenv("GOODBOY_USER_ID", "94578724413902848")) GOODBOY_USER_ID = int(os.getenv("GOODBOY_USER_ID", "94578724413902848"))
GOODBOY_AUDIO_PATH = os.path.join(os.getcwd(), "goodboy.ogg") GOODBOY_AUDIO_PATH = os.path.join(os.getcwd(), "goodboy.ogg")
USE_ARABIC_TTS = os.getenv("USE_ARABIC_TTS", "true").lower() in {"1", "true", "yes", "on"}
ARABIC_TTS_MODEL = os.getenv("ARABIC_TTS_MODEL", "tts_models/ar/cv/vits")
def _display_name(user: object) -> str: def _display_name(user: object) -> str:
@@ -157,6 +168,39 @@ def is_probably_english_sentence(text: str) -> bool:
return bool(_ENGLISH_SENTENCE_RE.match(text)) return bool(_ENGLISH_SENTENCE_RE.match(text))
async def speak_response(voice_client: Optional[discord.VoiceClient], response_key: str) -> None:
    """Speak a verbal response in Arabic if enabled and connected to voice."""
    if not voice_client or not voice_client.is_connected():
        return
    if not USE_ARABIC_TTS:
        return
    response_text = VERBAL_RESPONSES.get(response_key)
    if not response_text:
        return
    try:
        with tempfile.TemporaryDirectory() as tmpdir:
            tts_path = os.path.join(tmpdir, "response.wav")
            await synthesize_tts_to_wav(response_text, tts_path, use_arabic=True)
            # Wait up to ~5 seconds for current playback to end; play() raises ClientException while audio is playing
            for _ in range(10):
                if not voice_client.is_playing():
                    break
                await asyncio.sleep(0.5)
            source = discord.FFmpegPCMAudio(tts_path, **FFMPEG_OPTIONS)
            loop = asyncio.get_running_loop()
            fut = loop.create_future()
            def after_playback(_):
                # Called from the player thread, so hand the result back to the event loop thread
                loop.call_soon_threadsafe(lambda: fut.done() or fut.set_result(True))
            voice_client.play(source, after=after_playback)
            await fut
    except Exception as e:
        logger.debug("Failed to speak response: %s", e)
async def announce_listening_roster(channel, voice_channel: Optional[discord.VoiceChannel]): async def announce_listening_roster(channel, voice_channel: Optional[discord.VoiceChannel]):
if channel is None or voice_channel is None: if channel is None or voice_channel is None:
return return
@@ -279,6 +323,7 @@ def make_tts_engine() -> pyttsx3.Engine:
_tts_engine_singleton: Optional[pyttsx3.Engine] = None _tts_engine_singleton: Optional[pyttsx3.Engine] = None
_arabic_tts_singleton: Optional[CoquiTTS] = None
_tts_executor: Optional[ThreadPoolExecutor] = None _tts_executor: Optional[ThreadPoolExecutor] = None
@@ -289,6 +334,21 @@ def get_tts_engine() -> pyttsx3.Engine:
return _tts_engine_singleton return _tts_engine_singleton
def get_arabic_tts() -> Optional[CoquiTTS]:
global _arabic_tts_singleton
if not USE_ARABIC_TTS:
return None
if _arabic_tts_singleton is None:
try:
logger.info("Loading Arabic TTS model: %s", ARABIC_TTS_MODEL)
_arabic_tts_singleton = CoquiTTS(model_name=ARABIC_TTS_MODEL, progress_bar=False, gpu=False)
logger.info("Arabic TTS model loaded successfully")
except Exception as e:
logger.error("Failed to load Arabic TTS model: %s", e)
return None
return _arabic_tts_singleton
def get_tts_executor() -> ThreadPoolExecutor: def get_tts_executor() -> ThreadPoolExecutor:
global _tts_executor global _tts_executor
if _tts_executor is None: if _tts_executor is None:
@@ -296,9 +356,22 @@ def get_tts_executor() -> ThreadPoolExecutor:
return _tts_executor return _tts_executor
async def synthesize_tts_to_wav(text: str, wav_path: str) -> str: async def synthesize_tts_to_wav(text: str, wav_path: str, use_arabic: bool = False) -> str:
"""Generate TTS to a WAV file using pyttsx3 in a background thread.""" """Generate TTS to a WAV file using Coqui TTS (Arabic) or pyttsx3 (English)."""
loop = asyncio.get_running_loop() loop = asyncio.get_running_loop()
if use_arabic and USE_ARABIC_TTS:
arabic_tts = get_arabic_tts()
if arabic_tts:
def _save_arabic():
logger.debug("Synthesizing Arabic TTS to %s: %s", wav_path, (text if len(text) < 120 else text[:117] + "..."))
arabic_tts.tts_to_file(text=text, file_path=wav_path)
await loop.run_in_executor(get_tts_executor(), _save_arabic)
logger.debug("Arabic TTS synthesis complete: %s", wav_path)
return wav_path
# Fallback to pyttsx3
engine = get_tts_engine() engine = get_tts_engine()
def _save(): def _save():
@@ -389,35 +462,31 @@ async def _get_active_voice_client(guild: Optional[discord.Guild]) -> Optional[d
return voice_client return voice_client
async def connect_voice_with_retry(channel: discord.abc.Connectable) -> discord.VoiceClient: async def connect_voice_with_retry(channel: discord.abc.Connectable) -> voice_recv.VoiceRecvClient:
""" """
Standard, simplified voice connection helper. Connect using VoiceRecvClient to enable voice receiving.
Uses standard Discord library methods without custom retry loops to avoid state conflicts.
""" """
guild: Optional[discord.Guild] = getattr(channel, "guild", None) guild: Optional[discord.Guild] = getattr(channel, "guild", None)
if guild is None: if guild is None:
raise RuntimeError("Voice channel without guild cannot establish a connection.") raise RuntimeError("Voice channel without guild cannot establish a connection.")
# 1. Cleanup existing client if present # Cleanup existing client if present
try: old_vc = getattr(guild, "voice_client", None)
old_vc = getattr(guild, "voice_client", None) if old_vc:
if old_vc: if old_vc.channel == channel and old_vc.is_connected():
if old_vc.channel == channel and old_vc.is_connected(): logger.debug("Already connected to target channel")
return old_vc return old_vc
try:
await old_vc.disconnect(force=True) await old_vc.disconnect(force=True)
await asyncio.sleep(0.5) await asyncio.sleep(1.0) # Give Discord time to clean up
except Exception as e: except Exception as e:
logger.debug("Error cleaning up old voice client: %s", e) logger.debug("Error cleaning up old voice client: %s", e)
# 2. Connect using standard library method # Connect with VoiceRecvClient to enable receiving
# Note: reconnect=True is the default and correct behavior for handling logger.info("Connecting to voice channel: %s", getattr(channel, "name", "?"))
# transient session errors (like 4006) internally by the library. voice_client = await channel.connect(cls=voice_recv.VoiceRecvClient, timeout=30.0, reconnect=True)
try: logger.info("Successfully connected to voice with VoiceRecvClient")
voice_client = await channel.connect(timeout=20.0, reconnect=True) return voice_client
return voice_client
except Exception as e:
logger.warning("Standard connect failed: %s", e)
raise
@dataclass @dataclass
class QueueItem: class QueueItem:
@@ -425,123 +494,116 @@ class QueueItem:
source_factory: Callable[[], discord.AudioSource] source_factory: Callable[[], discord.AudioSource]
announce: Optional[str] = None announce: Optional[str] = None
if HAS_SINKS: class HotwordStreamSink(voice_recv.AudioSink):
class HotwordStreamSink(sinks.Sink): def __init__(
def __init__( self,
self, state: "GuildAudioState",
state: "GuildAudioState", text_channel: discord.abc.Messageable,
text_channel: discord.abc.Messageable, loop: asyncio.AbstractEventLoop,
loop: asyncio.AbstractEventLoop, min_chunk_seconds: float = 1.0,
min_chunk_seconds: float = 1.0, window_seconds: float = 4.5,
window_seconds: float = 4.5, inactivity_seconds: float = 1.0,
inactivity_seconds: float = 1.0, ):
): super().__init__()
super().__init__() self.state = state
self.state = state self.text_channel = text_channel
self.text_channel = text_channel self.loop = loop
self.loop = loop self.closed = False
self.closed = False self.buffers: defaultdict[int, bytearray] = defaultdict(bytearray)
self.buffers: defaultdict[int, bytearray] = defaultdict(bytearray) self.last_activity: defaultdict[int, float] = defaultdict(lambda: 0.0)
self.last_activity: defaultdict[int, float] = defaultdict(lambda: 0.0) self.processing_users: set[int] = set()
self.processing_users: set[int] = set() self.pending_tasks: dict[int, concurrent.futures.Future] = {}
self.pending_tasks: dict[int, concurrent.futures.Future] = {} self.min_chunk_bytes = int(max(PCM_BYTES_PER_SECOND * min_chunk_seconds, PCM_BYTES_PER_SECOND * 0.5))
self.min_chunk_bytes = int(max(PCM_BYTES_PER_SECOND * min_chunk_seconds, PCM_BYTES_PER_SECOND * 0.5)) self.window_bytes = int(PCM_BYTES_PER_SECOND * window_seconds)
self.window_bytes = int(PCM_BYTES_PER_SECOND * window_seconds) self.inactivity_seconds = inactivity_seconds
self.inactivity_seconds = inactivity_seconds
def close(self): def wants_opus(self) -> bool:
self.closed = True # We want decoded PCM, not Opus packets
for fut in list(self.pending_tasks.values()): return False
def close(self):
self.closed = True
for fut in list(self.pending_tasks.values()):
try:
fut.cancel()
except Exception:
pass
self.pending_tasks.clear()
self.buffers.clear()
self.processing_users.clear()
def update_text_channel(self, channel: discord.abc.Messageable):
self.text_channel = channel
def cleanup(self):
self.close()
def write(self, user: discord.User, data: voice_recv.VoiceData):
if self.closed or user is None:
return
# Get PCM data from VoiceData
pcm_data = data.pcm
if not pcm_data:
return
user_id = user.id
buffer = self.buffers[user_id]
buffer.extend(pcm_data)
if len(buffer) > self.window_bytes:
del buffer[: len(buffer) - int(self.window_bytes)]
now = time.perf_counter()
self.last_activity[user_id] = now
if len(buffer) < self.min_chunk_bytes:
return
existing = self.pending_tasks.get(user_id)
if existing and not existing.done():
existing.cancel()
self.pending_tasks.pop(user_id, None)
async def delayed_dispatch(uid: int, expected_time: float):
try:
await asyncio.sleep(self.inactivity_seconds)
if self.closed:
return
last = self.last_activity.get(uid, 0.0)
if abs(last - expected_time) > 1e-6:
return
buffer = self.buffers.get(uid)
if not buffer or len(buffer) < self.min_chunk_bytes:
return
if uid in self.processing_users:
return
self.processing_users.add(uid)
chunk = bytes(buffer)
buffer.clear()
try: try:
fut.cancel() await self.state.handle_hotword_buffer(uid, chunk, self.text_channel)
except Exception: finally:
pass self.processing_users.discard(uid)
self.pending_tasks.clear() except asyncio.CancelledError:
self.buffers.clear() return
self.processing_users.clear() finally:
self.pending_tasks.pop(uid, None)
def update_text_channel(self, channel: discord.abc.Messageable): future = asyncio.run_coroutine_threadsafe(delayed_dispatch(user_id, now), self.loop)
self.text_channel = channel
def cleanup(self): def _done_callback(fut, uid=user_id):
self.closed = True if fut.cancelled():
for fut in list(self.pending_tasks.values()):
try:
fut.cancel()
except Exception:
pass
self.pending_tasks.clear()
return super().cleanup()
@sinks.Filters.container
def write(self, data, user):
if self.closed or user is None:
return return
try: try:
user_id = int(user) fut.result()
except Exception: except asyncio.CancelledError:
return return
except Exception as exc:
logger.exception("Hotword delayed dispatch failed for user %s: %s", uid, exc)
finally:
self.pending_tasks.pop(uid, None)
buffer = self.buffers[user_id] future.add_done_callback(_done_callback)
buffer.extend(data) self.pending_tasks[user_id] = future
if len(buffer) > self.window_bytes:
del buffer[: len(buffer) - int(self.window_bytes)]
now = time.perf_counter()
self.last_activity[user_id] = now
if len(buffer) < self.min_chunk_bytes:
return
existing = self.pending_tasks.get(user_id)
if existing and not existing.done():
existing.cancel()
self.pending_tasks.pop(user_id, None)
async def delayed_dispatch(uid: int, expected_time: float):
try:
await asyncio.sleep(self.inactivity_seconds)
if self.closed:
return
last = self.last_activity.get(uid, 0.0)
if abs(last - expected_time) > 1e-6:
return
buffer = self.buffers.get(uid)
if not buffer or len(buffer) < self.min_chunk_bytes:
return
if uid in self.processing_users:
return
self.processing_users.add(uid)
chunk = bytes(buffer)
buffer.clear()
try:
await self.state.handle_hotword_buffer(uid, chunk, self.text_channel)
finally:
self.processing_users.discard(uid)
except asyncio.CancelledError:
return
finally:
self.pending_tasks.pop(uid, None)
future = asyncio.run_coroutine_threadsafe(delayed_dispatch(user_id, now), self.loop)
def _done_callback(fut, uid=user_id):
if fut.cancelled():
return
try:
fut.result()
except asyncio.CancelledError:
return
except Exception as exc:
logger.exception("Hotword delayed dispatch failed for user %s: %s", uid, exc)
finally:
self.pending_tasks.pop(uid, None)
future.add_done_callback(_done_callback)
self.pending_tasks[user_id] = future
else:
class HotwordStreamSink: # type: ignore
def __init__(self, *args, **kwargs):
pass
@dataclass @dataclass
@@ -679,8 +741,8 @@ class GuildAudioState:
if not HOTWORD_ENABLED: if not HOTWORD_ENABLED:
logger.debug("Hotword listening disabled by environment (guild %s)", self.guild_id) logger.debug("Hotword listening disabled by environment (guild %s)", self.guild_id)
return return
if not HAS_SINKS: if not HAS_VOICE_RECV:
logger.warning("Hotword listening requested but sinks are unavailable on this stack.") logger.warning("Hotword listening requested but voice_recv is unavailable.")
try: try:
await text_channel.send("Live hotword listening is unavailable on this install. Send a voice message instead.") await text_channel.send("Live hotword listening is unavailable on this install. Send a voice message instead.")
except Exception: except Exception:
@@ -689,6 +751,10 @@ class GuildAudioState:
if not self.voice_client or not self.voice_client.is_connected(): if not self.voice_client or not self.voice_client.is_connected():
logger.debug("Cannot start listener without an active voice client (guild %s)", self.guild_id) logger.debug("Cannot start listener without an active voice client (guild %s)", self.guild_id)
return return
if not isinstance(self.voice_client, voice_recv.VoiceRecvClient):
logger.warning("Voice client is not VoiceRecvClient, cannot listen (guild %s)", self.guild_id)
return
self.listen_enabled = True self.listen_enabled = True
self.last_transcripts.clear() self.last_transcripts.clear()
if self.hotword_sink and not self.hotword_sink.closed: if self.hotword_sink and not self.hotword_sink.closed:
@@ -696,10 +762,10 @@ class GuildAudioState:
logger.debug("Hotword listener already running (guild %s)", self.guild_id) logger.debug("Hotword listener already running (guild %s)", self.guild_id)
return return
# If another recording is running, stop it first # If already listening, stop first
if getattr(self.voice_client, "recording", False): if self.voice_client.is_listening():
try: try:
self.voice_client.stop_recording() self.voice_client.stop_listening()
except Exception: except Exception:
pass pass
@@ -708,10 +774,7 @@ class GuildAudioState:
self.hotword_sink = sink self.hotword_sink = sink
logger.info("Starting continuous hotword listener (guild %s)", self.guild_id) logger.info("Starting continuous hotword listener (guild %s)", self.guild_id)
async def _finished_callback(sink_obj, *_): self.voice_client.listen(sink)
await self._on_sink_finished(sink_obj)
self.voice_client.start_recording(sink, _finished_callback)
channel = getattr(self.voice_client, "channel", None) channel = getattr(self.voice_client, "channel", None)
if channel: if channel:
@@ -737,11 +800,12 @@ class GuildAudioState:
sink = self.hotword_sink sink = self.hotword_sink
if sink: if sink:
sink.close() sink.close()
if self.voice_client and getattr(self.voice_client, "recording", False): if self.voice_client and isinstance(self.voice_client, voice_recv.VoiceRecvClient):
try: if self.voice_client.is_listening():
self.voice_client.stop_recording() try:
except Exception: self.voice_client.stop_listening()
pass except Exception:
pass
self.hotword_sink = None self.hotword_sink = None
async def handle_hotword_buffer(self, user_id: int, pcm_bytes: bytes, text_channel: discord.abc.Messageable): async def handle_hotword_buffer(self, user_id: int, pcm_bytes: bytes, text_channel: discord.abc.Messageable):
@@ -833,51 +897,39 @@ def get_state_for_guild(guild_id: int) -> GuildAudioState:
async def connect_to_author_channel(message: discord.Message) -> Optional[discord.VoiceClient]: async def connect_to_author_channel(message: discord.Message) -> Optional[discord.VoiceClient]:
if not isinstance(message.author, discord.Member): if not isinstance(message.author, discord.Member):
return None return None
logger.debug("Connect requested by %s in guild %s", message.author, getattr(message.guild, "id", "?"))
voice_state = message.author.voice voice_state = message.author.voice
if not voice_state or not voice_state.channel: if not voice_state or not voice_state.channel:
logger.info("Author not in a voice channel; cannot join (guild %s)", getattr(message.guild, "id", "?"))
await message.channel.send("Join a voice channel first, then say 'hey bashar join'.") await message.channel.send("Join a voice channel first, then say 'hey bashar join'.")
return None return None
channel = voice_state.channel channel = voice_state.channel
vc = await _get_active_voice_client(message.guild) guild = message.guild
# Check if already connected to the right channel
vc = guild.voice_client
if vc and vc.channel == channel and vc.is_connected(): if vc and vc.channel == channel and vc.is_connected():
logger.debug("Already connected to requested channel: %s (guild %s)", channel, getattr(message.guild, "id", "?")) logger.debug("Already connected to target channel")
return vc return vc
if vc:
try: # Move or reconnect
logger.info("Moving voice client to channel: %s (guild %s)", channel, getattr(message.guild, "id", "?")) try:
if vc and vc.is_connected():
logger.info("Moving to channel: %s", channel.name)
await vc.move_to(channel) await vc.move_to(channel)
await announce_listening_roster(message.channel, channel) else:
return vc if vc:
except Exception as e: await vc.disconnect(force=True)
logger.warning("Move failed; reconnecting fresh (guild %s): %s", getattr(message.guild, "id", "?"), e) await asyncio.sleep(1.0)
try:
await vc.disconnect(force=True)
except Exception:
pass
try:
vc = await connect_voice_with_retry(channel) vc = await connect_voice_with_retry(channel)
await announce_listening_roster(message.channel, channel)
except Exception as e: await announce_listening_roster(message.channel, channel)
logger.exception("Voice connect retries exhausted (guild %s): %s", getattr(message.guild, "id", "?"), e) return vc
await message.channel.send("I couldn't join the voice channel (error 4006). Try again in a few seconds.")
return None except Exception as e:
else: logger.exception("Failed to connect to voice: %s", e)
logger.info("Connecting to voice channel: %s (guild %s)", channel, getattr(message.guild, "id", "?")) await message.channel.send("Couldn't join voice channel. Try again in a moment.")
try: return None
vc = await connect_voice_with_retry(channel)
await announce_listening_roster(message.channel, channel)
except Exception as e:
logger.exception("Voice connect retries exhausted (guild %s): %s", getattr(message.guild, "id", "?"), e)
await message.channel.send("I couldn't join the voice channel (error 4006). Try again in a few seconds.")
return None
if vc and vc.is_connected():
logger.info("Connected to voice: %s (guild %s)", vc.channel, getattr(message.guild, "id", "?"))
else:
logger.error("Voice connect returned but not connected (guild %s)", getattr(message.guild, "id", "?"))
return vc
def make_ffmpeg_source(url: str) -> discord.AudioSource: def make_ffmpeg_source(url: str) -> discord.AudioSource:
@@ -967,10 +1019,10 @@ async def on_ready():
ensure_ffmpeg_available() ensure_ffmpeg_available()
ensure_opus_loaded() ensure_opus_loaded()
logger.info("Startup checks OK") logger.info("Startup checks OK")
if HOTWORD_ENABLED and HAS_SINKS: if HOTWORD_ENABLED and HAS_VOICE_RECV:
logger.info("Hotword listening: ENABLED (sinks available and HOTWORD_ENABLED=True)") logger.info("Hotword listening: ENABLED (voice_recv available and HOTWORD_ENABLED=True)")
elif HOTWORD_ENABLED and not HAS_SINKS: elif HOTWORD_ENABLED and not HAS_VOICE_RECV:
logger.info("Hotword listening: DISABLED (HOTWORD_ENABLED=True but sinks unavailable)") logger.info("Hotword listening: DISABLED (HOTWORD_ENABLED=True but voice_recv unavailable)")
else: else:
logger.info("Hotword listening: DISABLED (HOTWORD_ENABLED unset/false)") logger.info("Hotword listening: DISABLED (HOTWORD_ENABLED unset/false)")
@@ -1062,6 +1114,7 @@ async def on_message(message: discord.Message):
if vc: if vc:
state = get_state_for_guild(message.guild.id) state = get_state_for_guild(message.guild.id)
state.voice_client = vc state.voice_client = vc
await speak_response(vc, "join")
await message.channel.send("Joined your voice channel. Say 'hey bashar play <song>' here.") await message.channel.send("Joined your voice channel. Say 'hey bashar play <song>' here.")
logger.info("Joined voice channel for guild %s", message.guild.id) logger.info("Joined voice channel for guild %s", message.guild.id)
# Auto-start hotword listener # Auto-start hotword listener
@@ -1073,6 +1126,8 @@ async def on_message(message: discord.Message):
state = get_state_for_guild(message.guild.id) state = get_state_for_guild(message.guild.id)
await state.stop_listening() await state.stop_listening()
if state.voice_client and state.voice_client.is_connected(): if state.voice_client and state.voice_client.is_connected():
await speak_response(state.voice_client, "leave")
await asyncio.sleep(1.0) # Give time for goodbye to play
await message.channel.send("Leaving voice channel.") await message.channel.send("Leaving voice channel.")
logger.info("Disconnecting from voice (guild %s)", message.guild.id) logger.info("Disconnecting from voice (guild %s)", message.guild.id)
await state.voice_client.disconnect(force=True) await state.voice_client.disconnect(force=True)
@@ -1089,16 +1144,20 @@ async def on_message(message: discord.Message):
if action == "skip": if action == "skip":
state = get_state_for_guild(message.guild.id) state = get_state_for_guild(message.guild.id)
state.skip_current() state.skip_current()
await speak_response(state.voice_client, "skip")
await message.channel.send("Skipped the current track.") await message.channel.send("Skipped the current track.")
return return
if action == "stop": if action == "stop":
state = get_state_for_guild(message.guild.id) state = get_state_for_guild(message.guild.id)
state.stop_all() state.stop_all()
await speak_response(state.voice_client, "stop")
await message.channel.send("Stopped playback and cleared the queue.") await message.channel.send("Stopped playback and cleared the queue.")
return return
# Unknown # Unknown
state = get_state_for_guild(message.guild.id)
await speak_response(state.voice_client, "unknown")
await message.channel.send("Commands: 'hey bashar join', 'hey bashar play <song>', 'hey bashar skip', 'hey bashar stop', 'hey bashar leave'.") await message.channel.send("Commands: 'hey bashar join', 'hey bashar play <song>', 'hey bashar skip', 'hey bashar stop', 'hey bashar leave'.")
logger.debug("Sent help for unknown command") logger.debug("Sent help for unknown command")
@@ -1176,6 +1235,7 @@ async def route_transcribed_command_from_member(guild: discord.Guild, member: di
await text_channel.send("I couldn't join the voice channel (error 4006). Try again in a few seconds.") await text_channel.send("I couldn't join the voice channel (error 4006). Try again in a few seconds.")
return return
state.voice_client = vc state.voice_client = vc
await speak_response(vc, "join")
await text_channel.send("Joined your voice channel. Say 'hey bashar play <song>' here.") await text_channel.send("Joined your voice channel. Say 'hey bashar play <song>' here.")
# Start listening if not already # Start listening if not already
await state.start_listening(text_channel) await state.start_listening(text_channel)
@@ -1184,6 +1244,8 @@ async def route_transcribed_command_from_member(guild: discord.Guild, member: di
  state = get_state_for_guild(guild.id)
  await state.stop_listening()
  if state.voice_client and state.voice_client.is_connected():
+ await speak_response(state.voice_client, "leave")
+ await asyncio.sleep(1.0)
  await text_channel.send("Leaving voice channel.")
  await state.voice_client.disconnect(force=True)
  return
@@ -1191,18 +1253,24 @@ async def route_transcribed_command_from_member(guild: discord.Guild, member: di
  if not args:
  await text_channel.send("Say 'hey bashar play <search terms>'.")
  return
+ state = get_state_for_guild(guild.id)
+ await speak_response(state.voice_client, "play")
  await handle_play_for_member(guild, member, text_channel, args)
  return
  if action == "skip":
  state = get_state_for_guild(guild.id)
  state.skip_current()
+ await speak_response(state.voice_client, "skip")
  await text_channel.send("Skipped the current track.")
  return
  if action == "stop":
  state = get_state_for_guild(guild.id)
  state.stop_all()
+ await speak_response(state.voice_client, "stop")
  await text_channel.send("Stopped playback and cleared the queue.")
  return
+ state = get_state_for_guild(guild.id)
+ await speak_response(state.voice_client, "unknown")
  await text_channel.send("Commands: 'hey bashar join', 'hey bashar play <song>', 'hey bashar skip', 'hey bashar stop', 'hey bashar leave'.")
  @client.event
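
Note on the hunks above: every command branch now calls `speak_response(voice_client, key)`, including the `play` and `unknown` branches, where the bot may not be in a voice channel yet. The body of `speak_response` is not shown in this diff, so the following is only a sketch of the lookup and guard such a helper might start with. The names `ARABIC_RESPONSES` and `pick_response_text` are hypothetical; the phrases are the ones listed in the response table in ARABIC_TTS.md.

```python
from typing import Optional

# Hypothetical mapping; phrases copied from the table in ARABIC_TTS.md.
ARABIC_RESPONSES = {
    "join": "نعم، أنا هنا",
    "leave": "مع السلامة",
    "play": "حسناً",
    "skip": "التالي",
    "stop": "توقف",
    "unknown": "ماذا تريد؟",
}


def pick_response_text(voice_client, key: str) -> Optional[str]:
    """Return the phrase to synthesize, or None when nothing should be spoken.

    Assumption: voice_client is either None or an object exposing
    is_connected(), as discord.VoiceClient does.
    """
    # Nothing can be played without a live voice connection.
    if voice_client is None or not voice_client.is_connected():
        return None
    # Keys that are not in the table fall back to the "unknown" phrase.
    return ARABIC_RESPONSES.get(key, ARABIC_RESPONSES["unknown"])
```

A guard like this would let the `play` and `unknown` branches call the helper safely before the bot has joined a channel; whether the real `speak_response` does this cannot be confirmed from the hunks shown.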


@@ -14,16 +14,16 @@ services:
  HOTWORD_ENABLED: ${HOTWORD_ENABLED:-true}
  GOODBOY_USER_ID: ${GOODBOY_USER_ID:-94578724413902848}
  TRANSCRIPT_LOG_PATH: /app/logs/transcript.log
+ USE_ARABIC_TTS: ${USE_ARABIC_TTS:-true}
+ ARABIC_TTS_MODEL: ${ARABIC_TTS_MODEL:-tts_models/ar/cv/vits}
  volumes:
- - bot-logs:/app/logs
+ - ./logs:/app/logs
  - bot-data:/app/data
  - whisper-models:/root/.cache/huggingface
  labels:
  - "com.centurylinklabs.watchtower.enable=true"
  volumes:
- bot-logs:
- driver: local
  bot-data:
  driver: local
  whisper-models:
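
Note on the hunk above: the two new environment entries default to `true` and `tts_models/ar/cv/vits` when unset. How the bot reads them is not part of this diff; the sketch below is one way the Python side could mirror those defaults. The function name `load_tts_settings` and the set of accepted truthy strings are assumptions.

```python
import os
from typing import Mapping, Optional, Tuple


def load_tts_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[bool, str]:
    """Read the Arabic TTS settings, using the same defaults as the compose file.

    Assumption: any of "1", "true", "yes", "on" (case-insensitive) enables TTS.
    """
    env = os.environ if env is None else env
    raw = env.get("USE_ARABIC_TTS", "true")
    enabled = raw.strip().lower() in {"1", "true", "yes", "on"}
    model = env.get("ARABIC_TTS_MODEL", "tts_models/ar/cv/vits")
    return enabled, model
```

Passing a plain dict instead of `os.environ` keeps the function easy to test without touching the process environment.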


@@ -1,10 +1,12 @@
- py-cord>=2.4.0
+ discord.py[voice]>=2.6.4
+ discord-ext-voice-recv>=0.5.2a179
  PyNaCl==1.5.0
  yt-dlp==2025.8.11
  pyttsx3==2.90
+ TTS==0.22.0
  faster-whisper==1.0.3
  soundfile==0.12.1
- numpy==1.26.4
+ numpy>=1.22.0,<2.0
  python-dotenv==1.0.1


@@ -1868,3 +1868,73 @@ hey bashar join
  11/19/2025 21:30 anabolikn skywalker - hey bashar join
  11/19/2025 21:30 anabolikn skywalker - hey bashar join
  11/19/2025 21:33 anabolikn skywalker - hey bashar join
+ 12/05/2025 22:43 Yahew - hey bashar join
+ 12/05/2025 22:59 Yahew - hey bashar join
+ 12/05/2025 22:59 Yahew - Ibishar, play.
+ 12/05/2025 22:59 Yahew - Oh.
+ 12/05/2025 22:59 Yahew - Hey, Bashar, play Congratulations, Congratulations, Afghan Taliban Song.
+ 12/05/2025 23:21 Yahew - hey bashar join
+ 12/05/2025 23:22 Rotund and Large and Plump Melon - First coordinate.
+ 12/05/2025 23:22 Rotund and Large and Plump Melon - They're sponsored by Lenovo.
+ 12/05/2025 23:22 Rotund and Large and Plump Melon - Thank you.
+ 12/05/2025 23:22 Yahew - of Shara Play congratulations, congratulations Afghan Taliban song.
+ 12/05/2025 23:22 Peanoats - Hunde OP, ja?
+ 12/05/2025 23:22 Yahew - Play congratulations, congratulations Afghan Taliban song.
+ 12/05/2025 23:23 Yahew - a very hard time picking up Bashar.
+ 12/05/2025 23:23 Peanoats - Hey, but shark, kill yourself.
+ 12/05/2025 23:23 Yahew - It's really important.
+ 12/05/2025 23:23 Rotund and Large and Plump Melon - like a shark like Jihadi John bought.
+ 12/05/2025 23:23 Yahew - Yes, like that that'll be that'll be even easier
+ 12/05/2025 23:23 Nikodemos Based - improve to voice recognizer.
+ 12/05/2025 23:23 Yahew - a congratulations, congratulations Afghan Taliban song.
+ 12/05/2025 23:23 Rotund and Large and Plump Melon - You got this bonus you got this
+ 12/05/2025 23:23 Nikodemos Based - I know I'm gapping them five seconds ahead.
+ 12/05/2025 23:23 Nikodemos Based - If I fail now, it's all on me. There are no excuses anymore.
+ 12/05/2025 23:23 Yahew - Yeah, so it got a Bishar
+ 12/05/2025 23:23 Yahew - Congratulations, congratulations, Afghan Taliban song.
+ 12/05/2025 23:23 Peanoats - Sorry, I didn't recognize your command.
+ 12/05/2025 23:23 Yahew - Yeah, right?
+ 12/05/2025 23:24 Peanoats - The temperature is 71 degrees Fahrenheit.
+ 12/05/2025 23:24 Yahew - You know what, you know, it had to be working 100%. Look at my screen.
+ 12/05/2025 23:24 Peanoats - Can be Luke. Can be Luke. Oh yeah, the chat log. Is the chat log running?
+ 12/05/2025 23:24 Rotund and Large and Plump Melon - Ah, you're right, I'm the greatest leader.
+ 12/05/2025 23:24 Nikodemos Based - Thank you.
+ 12/05/2025 23:24 Rotund and Large and Plump Melon - Hey, can you understand?
+ 12/05/2025 23:24 Peanoats - There it is! Ah, my dinner, John.
+ 12/05/2025 23:24 Yahew - Thank you.
+ 12/05/2025 23:24 Rotund and Large and Plump Melon - Did you understand that?
+ 12/05/2025 23:24 Peanoats - Yeah
+ 12/05/2025 23:24 Yahew - No, it did not understand Akman Tinejad.
+ 12/05/2025 23:24 Peanoats - uh it hey butt shark
+ 12/05/2025 23:24 Nikodemos Based - Thank you.
+ 12/05/2025 23:24 Yahew - It's an act of aquintin and jod for me.
+ 12/05/2025 23:24 Peanoats - It's all one word, yeah
+ 12/05/2025 23:25 Peanoats - Hey, Bashar, leave!
+ 12/05/2025 23:25 Rotund and Large and Plump Melon - like Iran or something.
+ 12/05/2025 23:25 Yahew - hey bashar join
+ 12/05/2025 23:25 Rotund and Large and Plump Melon - It might be your accent, Riggs.
+ 12/05/2025 23:25 Peanoats - your accent. Hey Bishar, play? Yeah that.
+ 12/05/2025 23:25 Yahew - our play, congratulations, congratulations Afghan Taliban song.
+ 12/05/2025 23:25 Peanoats - Yeah. Hey, I'm Charlie.
+ 12/05/2025 23:25 Yahew - No!
+ 12/05/2025 23:25 Peanoats - Hey, I'm Charlie.
+ 12/05/2025 23:25 Yahew - Hey Peshawar play congratulations, congratulations Afghan Taliban song.
+ 12/05/2025 23:26 Peanoats - Maybe you have to pronounce it literally. Maybe you have to say basher.
+ 12/05/2025 23:26 Nikodemos Based - basher bashed rigs on the head
+ 12/05/2025 23:26 Rotund and Large and Plump Melon - Ha, ha, ha.
+ 12/05/2025 23:26 Yahew - Hey, Peshwar. What?
+ 12/05/2025 23:26 Nikodemos Based - Oh
+ 12/05/2025 23:26 Rotund and Large and Plump Melon - Oh, the Peshmore province. Peshmore?
+ 12/05/2025 23:26 Peanoats - leave
+ 12/05/2025 23:26 Yahew - The sharp, bashed rig on the head. Where did you say that?
+ 12/05/2025 23:26 Peanoats - I
+ 12/05/2025 23:26 Nikodemos Based - I said bash rigs on the head.
+ 12/05/2025 23:26 Yahew - The Peshmore.
+ 12/05/2025 23:26 Peanoats - Hey, Bashar, leave.
+ 12/05/2025 23:26 Yahew - hey bashar join
+ 12/05/2025 23:26 Nikodemos Based - Thank you.
+ 12/05/2025 23:26 Peanoats - Congratulations, Afghan Taliban song. Okay.
+ 12/05/2025 23:26 Yahew - It's congratulations. It's congratulations. Congratulations.
+ 12/05/2025 23:27 Peanoats - Congratulations, congratulations, Afghan Taliban song.
+ 12/05/2025 23:27 Peanoats - P-note. Bruh.
+ 12/05/2025 23:27 Yahew - Yeah, I gotta do some work.