The Pipeline
VoiceraCX
AI Intern·June 2025 – Present
DAOStreet
Software Development Intern·Feb 2025 – June 2025
Alesa AI Ltd, UK
AI/ML Intern·Nov 2024 – Mar 2025
// projects
The Engine Room
Orpheus TTS
A language model where the 'language' is sound
2025
Why train a language model to output sound?
Orpheus TTS treats audio generation as a language modeling problem. Audio is encoded into discrete SNAC tokens, and the LLM learns to predict token sequences that decode back into speech. The 'language' the model speaks is sound — same transformer architecture, completely different modality.
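The idea can be shown with a toy codec: a one-level scalar quantizer (deliberately far simpler than SNAC) that turns a waveform into discrete token ids and back, which is the representation the LM actually models. Everything here is illustrative, not Orpheus code.

```python
# Toy illustration (not SNAC): a one-level scalar quantizer showing how a
# waveform becomes a sequence of discrete "audio tokens" that a language
# model can predict like text. All names and numbers are illustrative.
import math

def encode(samples, levels=4096):
    # map floats in [-1, 1] to integer token ids in [0, levels)
    return [min(levels - 1, int((s + 1) / 2 * levels)) for s in samples]

def decode(tokens, levels=4096):
    # map token ids back to approximate float samples
    return [t / levels * 2 - 1 for t in tokens]

# 100 samples of a 440 Hz tone at 24 kHz
wave = [math.sin(2 * math.pi * 440 * t / 24000) for t in range(100)]
tokens = encode(wave)    # the "sentence" the LM would learn to emit
approx = decode(tokens)
err = max(abs(a - b) for a, b in zip(wave, approx))
assert err < 1e-3        # near-lossless round trip at 12 bits/sample
```

Real neural codecs like SNAC learn the codebooks instead of slicing the amplitude axis, but the LM-facing interface is the same: a sequence of integer ids.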
Why LoRA instead of full fine-tuning?
Full fine-tuning of a 3B-parameter model requires significant GPU memory and risks catastrophic forgetting. LoRA trains only low-rank adapter matrices — a fraction of the parameters — making it possible to fine-tune on a single GPU in 19 minutes while preserving the base model's learned representations.
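A back-of-envelope sketch of the parameter savings: adapter size for a single d × d projection matrix at rank r. The hidden size and rank below are assumed values for illustration, not the project's actual config.

```python
# Why LoRA is cheap: for a frozen weight W (d_out x d_in), LoRA learns the
# update as dW = B @ A with A (r x d_in) and B (d_out x r), so only
# r*d_in + d_out*r parameters train. Dimensions here are assumptions.

def lora_params(d_in, d_out, r):
    return r * d_in + d_out * r

d = 3072          # hidden size (assumed, typical for a ~3B model)
r = 16            # LoRA rank (assumed)
full = d * d      # trainable params if the whole matrix were tuned
lora = lora_params(d, d, r)
print(full, lora)             # 9437184 98304, i.e. about 1% per matrix
```

Multiply across the targeted projection matrices and the trainable share stays in the low single digits, which is what makes single-GPU fine-tuning fit.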
Why GGUF quantization?
The full model is 6.3GB. GGUF Q4_K_M quantization compresses it to ~2GB with minimal quality loss, enabling inference on consumer hardware. Combined with llama.cpp's optimized C++ runtime, this made CPU inference viable (slow, but functional) and GPU inference fast.
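The size figures can be sanity-checked with rough bits-per-weight arithmetic. The ~4.85 bits/weight average for Q4_K_M is an approximation (K-quants mix 4-bit and 6-bit blocks plus per-block scales), so treat this as an estimate.

```python
# Rough size arithmetic for the figures above: model bytes are roughly
# parameter count times average bits per weight. The 4.85 bpw value for
# Q4_K_M is an approximate average, not an exact spec.

def gguf_size_gb(n_params, bits_per_weight):
    return n_params * bits_per_weight / 8 / 1e9

fp16 = gguf_size_gb(3e9, 16)      # full-precision baseline
q4km = gguf_size_gb(3e9, 4.85)    # Q4_K_M estimate
print(round(fp16, 1), round(q4km, 1))   # 6.0 1.8
```

That lands close to the observed 6.3GB full model and ~2GB quant; the gap is embeddings and tensors kept at higher precision.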
Why the SNAC codec specifically?
SNAC produces hierarchical discrete tokens from audio — multiple codebook levels at different resolutions. Each token position requires subtracting a different codebook offset (0, 4096, 8192, 12288, 16384, 20480, 24576). Without this offset correction, the model generates perfect silence. This was the hardest bug to find.
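A minimal sketch of that offset correction, assuming a flat stream of seven tokens per frame. The offsets are the ones listed above; the function name and frame layout are illustrative.

```python
# Sketch of the offset correction described above. Each of the 7 positions
# in a frame lives in its own 4096-wide vocabulary range, so its offset
# must be subtracted before handing indices to the SNAC decoder. Skipping
# this leaves every index out of its codebook's 0..4095 range.

OFFSETS = [0, 4096, 8192, 12288, 16384, 20480, 24576]

def strip_offsets(lm_tokens):
    """Convert flat LM output back into per-position codebook indices."""
    codes = []
    for i, tok in enumerate(lm_tokens):
        c = tok - OFFSETS[i % 7]
        if not 0 <= c < 4096:
            raise ValueError(f"token {tok} at position {i} out of range")
        codes.append(c)
    return codes

# one 7-token frame as the LM might emit it
frame = [12, 4100, 8200, 12300, 16400, 20500, 24600]
print(strip_offsets(frame))   # [12, 4, 8, 12, 16, 20, 24]
```

The range check is the useful part in practice: a misaligned frame fails loudly here instead of decoding to silence downstream.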
Why move from CPU to GPU?
CPU inference on a 3B model is a memory bandwidth problem, not a compute problem. Even with AVX2+FMA optimizations and GGUF quantization, CPU topped out at ~7.5 tokens/sec (~95 seconds per sentence). The T4 GPU on Lightning AI hit ~65 tokens/sec — nearly 9x the raw token rate, and roughly 30x faster end to end, bringing generation down to ~3 seconds for short prompts.
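The "bandwidth, not compute" claim can be sketched with rough arithmetic: each generated token streams the full weight file through memory once, so the bandwidth a given token rate demands is roughly model size times tokens/sec. All figures below are estimates.

```python
# Rough bandwidth arithmetic: sustained memory traffic per second is
# approximately model_size_gb * tokens_per_sec, since decoding one token
# reads every weight once. All numbers are estimates for illustration.

model_gb = 2.0                      # ~2GB Q4_K_M weights from above
cpu_tps, gpu_tps = 7.5, 65.0        # measured token rates from above

print(model_gb * cpu_tps)           # 15.0 GB/s, near typical DDR4 limits
print(model_gb * gpu_tps)           # 130.0 GB/s, well inside a T4's ~300 GB/s
print(round(gpu_tps / cpu_tps, 1))  # 8.7x raw token-rate gain
```

Read this way, the CPU was not short on FLOPs; it simply could not move 2GB of weights through RAM much more than ~7 times per second.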
Architecture
// audio sample
// skills
The Stack
Not a checklist. A system.
// lab
The Lab
Where I think out loud about systems, infrastructure, and AI.
// writing queue
Why I migrated from ChromaDB to Qdrant — and what broke
Hybrid dense-sparse search, production migration pain points
Designing a semantic cache that actually works
Similarity thresholds, cache invalidation, and a 60% reduction in API calls
Building for offline-first in Tier-2/3 India
Why CRDT sync beats PostgreSQL replication for schools with spotty connectivity
// signal
Signal Strength
// achievements
// connect
The Handshake
// currently open to opportunities