
Yuvraj Sanghai

I build the systems behind the intelligence.

Explore My Work
// scroll to explore
2+ years building AI infrastructure · 4 production systems shipped · systems thinking
// experience

The Pipeline

// projects

The Engine Room

Orpheus TTS

A language model where the 'language' is sound

Llama 3.2 3B · LoRA · Unsloth · SNAC Codec · llama.cpp · GGUF · FastAPI · Lightning AI

2025

Why train a language model to output sound?

Orpheus TTS treats audio generation as a language modeling problem. Audio is encoded into discrete SNAC tokens, and the LLM learns to predict token sequences that decode back into speech. The 'language' the model speaks is sound — same transformer architecture, completely different modality.

Why LoRA instead of full fine-tuning?

Fully fine-tuning a 3B-parameter model requires significant GPU memory and risks catastrophic forgetting. LoRA trains only low-rank adapter matrices (a small fraction of the parameters), making it possible to fine-tune on a single GPU in 19 minutes while preserving the base model's learned representations.
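The parameter savings are easy to sanity-check in pure Python. This sketch assumes Llama 3.2 3B's hidden size of 3072 and a LoRA rank of 16 (the rank actually used here isn't stated above), for a single square projection matrix:

```python
# Rough LoRA parameter count for one square projection matrix.
# d = 3072 is Llama 3.2 3B's hidden size; r = 16 is a typical
# LoRA rank and an assumption, not the project's stated config.
d = 3072
r = 16

full_params = d * d          # weights touched by full fine-tuning
lora_params = r * d + d * r  # low-rank factors A (r x d) and B (d x r)

fraction = lora_params / full_params
print(f"LoRA trains {fraction:.2%} of this matrix's parameters")
```

At rank 16 the adapter is about 1% of that matrix's weights, which is why a single GPU (and 19 minutes) suffices.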

Why GGUF quantization?

The full model is 6.3GB. GGUF Q4_K_M quantization compresses it to ~2GB with minimal quality loss, enabling inference on consumer hardware. Combined with llama.cpp's optimized C++ runtime, this made CPU inference viable (slow, but functional) and GPU inference fast.
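The compression arithmetic can be sketched in a few lines. The ~4.85 effective bits per weight for Q4_K_M is an approximation of llama.cpp's mixed 4/6-bit block format, and 3.21B is the published parameter count for Llama 3.2 3B:

```python
# Back-of-envelope GGUF size estimate.
params = 3.21e9  # Llama 3.2 3B published parameter count

fp16_gb = params * 16 / 8 / 1e9      # 2 bytes per weight
q4_k_m_gb = params * 4.85 / 8 / 1e9  # ~4.85 effective bits per weight

print(f"fp16: {fp16_gb:.1f} GB, Q4_K_M: {q4_k_m_gb:.1f} GB")
```

That lands close to the observed 6.3GB full model and ~2GB quantized file.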

Why the SNAC codec specifically?

SNAC produces hierarchical discrete tokens from audio — multiple codebook levels at different resolutions. Each token position requires subtracting a different codebook offset (0, 4096, 8192, 12288, 16384, 20480, 24576). Without this offset correction, the model generates perfect silence. This was the hardest bug to find.
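The fix amounts to subtracting a position-dependent offset from every generated token before handing the stream to the SNAC decoder. A minimal pure-Python sketch, assuming the 7-slot frame layout implied by the offsets above (the function name is illustrative, not the project's actual code):

```python
CODEBOOK_SIZE = 4096
TOKENS_PER_FRAME = 7  # offsets 0, 4096, ..., 24576 imply 7 slots per frame

def remove_offsets(tokens):
    """Map each generated token back into the 0..4095 codebook range."""
    return [t - (i % TOKENS_PER_FRAME) * CODEBOOK_SIZE
            for i, t in enumerate(tokens)]

# Example: one frame of raw codes, shifted the way the model emits them.
raw = [5, 100, 7, 42, 9, 11, 13]
shifted = [c + i * CODEBOOK_SIZE for i, c in enumerate(raw)]
assert remove_offsets(shifted) == raw  # skip this step and the codes are garbage
```

Feed the shifted values straight to the decoder and every code lands outside its codebook, which is why the symptom was perfect silence rather than a crash.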

Why move from CPU to GPU?

CPU inference on a 3B model is a memory bandwidth problem, not a compute problem. Even with AVX2+FMA optimizations and GGUF quantization, CPU topped out at ~7.5 tokens/sec (~95 seconds per sentence). The T4 GPU on Lightning AI hit ~65 tokens/sec, nearly 9x the raw throughput; end to end, generation for short prompts dropped from ~95 seconds to about 3, roughly 30x faster.

Architecture

Elise Dataset (~1,200 pairs) → Audio Input → SNAC Codec → Llama 3.2 3B (Orpheus) → LoRA Adapters (16-bit) → Adapter Merge → GGUF Q4_K_M Export → llama.cpp Inference → T4 GPU (Lightning AI) → FastAPI Backend → Audio Output

// audio sample

Generated by fine-tuned Orpheus TTS

// skills

The Stack

Not a checklist. A system.

Languages
Frameworks
Databases
Concepts

// lab

The Lab

Where I think out loud about systems, infrastructure, and AI.

// first post · ~12 min read

I trained a language model to speak — here's what broke

A walk through fine-tuning Orpheus TTS on a single GPU: SNAC codec token offsets, why CPU inference is a memory bandwidth problem, and how 30× faster inference on a T4 changes what's worth shipping.

TTS · Fine-tuning · llama.cpp · Voice AI

// writing queue

Why I migrated from ChromaDB to Qdrant — and what broke

Hybrid dense-sparse search, production migration pain points

RAG · Vector DBs · VoiceraCX · Coming Soon

Designing a semantic cache that actually works

Similarity thresholds, cache invalidation, and a 60% reduction in API calls

Caching · Embeddings · Latency · Coming Soon

Building for offline-first in Tier-2/3 India

Why CRDT sync beats PostgreSQL replication for schools with spotty connectivity

Architecture · Sync · Smart Pathshala · Coming Soon
Notify me when published →

// signal

Signal Strength

Competition · Academics · Research · Building · Infra
3 hackathon wins
99.35% MHT-CET percentile
1 published paper
4+ projects shipped

// achievements

CompetitionWinner — Devclash 2025, DY Patil Pimpri (ShetNiyojan)
CompetitionWinner — Synapse 2.0, MKSSS CCOEW (Legify)
CompetitionWinner — L&T NeuroHack, COEP Mindspark 24 (WarCast)
AcademicsMHT-CET 2022 — 99.35 %ile, Rank 971 / 400,000+
AcademicsB.E. Computer Engineering, Honors in Data Science — PICT Pune
ResearchCo-authored paper on evidence-verified answer extraction for automated exam grading
Open SourcePublished fine-tuned TTS model + LoRA adapters on Hugging Face (yuv008)

// connect

The Handshake

GitHub · LinkedIn · Email · HuggingFace

// currently open to opportunities

Send a message →