Bhooyas Kapadia

Part-Time Engineer 🛠️, Full-Time Experimenter 🖥️

About Me

Hey, I’m Bhooyas — a curious human who makes models think and pixels behave. Whether I’m training transformers on a tight GPU budget or squeezing magic out of minimal resources, I blend code and creativity to build things that (usually) work and (sometimes) impress. I’m a passionate technologist who loves building smart, impactful solutions — from AI models and weird little tools to clean, intuitive web interfaces. When I’m not coding, I’m probably overengineering a side project, diving headfirst into new frameworks, or automating something that didn’t really need automating. Basically, if it involves logic, layers, or late-night debugging, I’m in.

Projects

Developed a synthetic algebra dataset generator to create verifiable linear equation solving tasks for controlled reasoning evaluation. Established an SFT baseline and applied Group Relative Policy Optimization (GRPO) with continuous numeric reward shaping and KL-regularized policy updates. Achieved 98% accuracy versus 19% (SFT) and 0% (base model), demonstrating significant reasoning gains through reward-driven post-training under 4GB VRAM constraints. Conducted stability analysis via reward tracking and entropy monitoring to prevent policy collapse.

Hugging Face Link
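To give a flavor of the reward shaping used in this project: a minimal sketch (function names, the decay shape, and the KL coefficient are illustrative assumptions, not the project's actual code) of a continuous numeric reward for verifiable algebra answers, plus the KL-penalized term that keeps the policy near its reference model:

```python
import math

def numeric_reward(predicted: str, target: float, tol: float = 1e-6) -> float:
    """Continuous reward shaping (illustrative): full credit for an exact
    numeric match, smoothly decaying credit with relative error, and zero
    for unparseable output."""
    try:
        value = float(predicted.strip())
    except ValueError:
        return 0.0  # malformed answer earns no reward
    rel_err = abs(value - target) / max(abs(target), tol)
    return math.exp(-rel_err)  # smooth credit in (0, 1]

def kl_penalized_reward(reward: float, kl: float, beta: float = 0.04) -> float:
    """GRPO-style objective term (beta is an assumed value): reward minus a
    KL penalty that discourages the policy from drifting far from the
    reference model, helping prevent policy collapse."""
    return reward - beta * kl
```

Shaping the reward continuously, instead of a binary right/wrong signal, gives the policy gradient useful signal even when answers are only close, which matters when the base model starts at 0% accuracy.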

Fine-tuned TinyLlama (1.1B) for improved instruction-following under strict hardware constraints (single NVIDIA GTX 1650 Ti, 4GB VRAM). Implemented parameter-efficient training using LoRA adapters, mixed precision (FP16), and gradient accumulation to remain within memory limits while maintaining stable convergence. Optimized LoRA rank and learning rate through controlled experiments on the Dolly 15k dataset, reducing trainable parameters by >99% compared to full fine-tuning. Achieved improved instruction coherence and structured response consistency.

Hugging Face Link
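The >99% reduction in trainable parameters comes straight from LoRA's low-rank factorization. A minimal NumPy sketch (the dimensions and scaling are illustrative, not TinyLlama's actual config) of the idea W' = W + (alpha/r)·BA:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, r, alpha = 2048, 8, 16  # illustrative sizes, not TinyLlama's exact config

# Frozen pretrained weight plus a trainable low-rank update
W = rng.standard_normal((d_model, d_model))   # frozen
A = rng.standard_normal((r, d_model)) * 0.01  # trainable down-projection
B = np.zeros((d_model, r))                    # trainable up-projection, zero-init

def lora_forward(x):
    """Adapted layer: base output plus the scaled low-rank correction."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

full_params = W.size
lora_params = A.size + B.size
print(f"trainable fraction: {lora_params / full_params:.4%}")
```

With these sizes the trainable fraction is well under 1%, which is why LoRA plus FP16 and gradient accumulation fits in 4GB of VRAM; zero-initializing B means the adapted model starts exactly equal to the base model.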

Designed and implemented a custom transformer architecture from scratch using PyTorch, resulting in StoryNet—a specialized neural network for creative text generation. Successfully trained the model to generate coherent short stories, demonstrating expertise in transformer architecture, sequence modeling, and natural language generation.
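The heart of any from-scratch transformer like StoryNet is scaled dot-product attention. A minimal NumPy sketch of that one building block (a single unmasked head; the real model's PyTorch implementation is assumed to be multi-headed and causal):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the core transformer operation."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V, weights
```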

Implemented a Conditional Variational Autoencoder (ConditionalVAE) that generates handwritten digits based on specified class inputs, trained on the MNIST dataset. Demonstrated the ability to control the generation process through conditional parameters, highlighting expertise in deep generative modeling, latent space manipulation, and complex neural architecture design for controlled image synthesis.
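The "conditional" part of a CVAE is simple to sketch: the class label is concatenated to the sampled latent before decoding, so the decoder models p(x | z, y) and generation can be steered by choosing y. A minimal sketch (the 16-dim latent is an assumed size, not the project's actual one):

```python
import numpy as np

def one_hot(label, num_classes=10):
    """One-hot encode an MNIST digit class."""
    vec = np.zeros(num_classes)
    vec[label] = 1.0
    return vec

def conditional_decoder_input(z, label, num_classes=10):
    """Append the class condition to the latent sample; the decoder then
    learns p(x | z, y), so picking y controls which digit is generated."""
    return np.concatenate([z, one_hot(label, num_classes)])

z = np.random.default_rng(0).standard_normal(16)  # illustrative latent size
x_in = conditional_decoder_input(z, label=7)      # ask for a "7"
```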

Engineered AvatarGAN, an advanced image generation pipeline combining Deep Convolutional GAN (DCGAN) with Super-Resolution GAN (SRGAN) to create high-quality game character avatars. The system generates characters at multiple resolutions, with DCGAN handling initial character creation and SRGAN enhancing image quality through upscaling.

Experience
  • Architected and delivered a multi-tenant Conversational AI platform supporting chat and real-time voice workflows, handling 100+ concurrent sessions with persistent session memory using LangGraph and LangChain.
  • Designed and implemented prompt versioning and automated evaluation pipelines, reducing prompt regression incidents by 50% and improving response consistency across 10+ prompt iterations.
  • Profiled end-to-end ML and inference pipelines to identify system bottlenecks, driving architecture-level optimizations across model serving, orchestration, and backend services.
  • Collaborated cross-functionally with backend and product teams to align model behavior with business requirements, reducing production defects by 30% and improving release stability.
  • Unblocked multiple client demos and releases by resolving critical ML and AI-system issues, contributing directly to successful stakeholder approvals and faster go-to-market timelines.
  • Developed a multi-tenant RAG pipeline with p95 latency under 1 second.
  • Developed an automated benchmarking script for evaluating model performance across FastAPI, MLServer, and Triton, supporting multiple frameworks such as PyTorch, TensorFlow, and scikit-learn.
  • Developed an Argo Workflows template to automate batch model deployment, streamlining the deployment process.
  • Containerized ML applications and deployed them on Kubernetes using Helm charts across multiple cloud providers as well as on-prem environments.
  • Deployed an LLM on Triton using NVIDIA NeMo Inference Microservice, reducing latency by approximately 20%. Also collaborated closely on deploying a RAG pipeline for suggesting drinks from a menu.
  • Worked closely in multiple workshops to optimize training time using technologies like Distributed Data Parallelism, Model Parallelism, Slurm, and Enroot, reducing training time by 30–50% depending on the use case.
  • Led the setup of a DGX Cluster comprising multiple DGX nodes and a SuperMicro as the headnode using Base Command Manager.
  • Converted models to TensorRT engines and deployed them on Triton Inference Server to serve 10K users per day with a latency of ~5 seconds per query.
  • Collaborated closely to mitigate VM vulnerabilities and keep the cost under track for GCP.
  • Monitored Docker resource utilization with Prometheus and Grafana, and wrote a Bash automation script that reduced manual effort by roughly 30%.
  • Set up and managed multiple Kubernetes clusters with NGINX Ingress Controller to streamline external access to services, enhancing scalability and load balancing. Implemented Horizontal Pod Autoscaling (HPA) to support applications serving up to 100K users per day.
  • Additionally, responsible for scoping new projects, defining deliverables, and establishing timelines to ensure alignment with business objectives and resource availability.
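Several of the roles above center on RAG pipelines. A minimal sketch of the dense-retrieval step at their core (cosine similarity over precomputed embeddings; the embedding model and vector store are assumed to live elsewhere):

```python
import numpy as np

def top_k_retrieve(query_vec, doc_vecs, k=3):
    """Minimal dense-retrieval core of a RAG pipeline: normalize the query
    and document embeddings, score by cosine similarity, and return the
    indices of the k best-matching documents."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                     # cosine similarity per document
    return np.argsort(sims)[::-1][:k]  # best-first indices
```

In production this brute-force scan is typically replaced by an approximate nearest-neighbor index, which is how p95 latencies under a second stay achievable as the corpus grows.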
Skills
PyTorch, TensorFlow, Keras, Transformers, Large Language Models (LLMs), LoRA, RAG, Prompt Engineering, Model Evaluation, ONNX.
Docker, Kubernetes, Helm, Triton Inference Server, TensorRT, NVIDIA NeMo, Slurm, Base Command Manager, MLflow, Terraform, Argo Workflows.
GCP, AWS, Azure, OCI, On-Prem DGX Infrastructure.
Prometheus, Grafana, GitHub Actions, Azure DevOps.
Python, Bash, Flask, gRPC, FastAPI, MySQL, Pandas, HTML, CSS, JavaScript.
Reach Out

You can connect with me at any of the following: