Ishaan Ansari

Research

My research interests lie in multimodal systems at the intersection of vision and language, with a focus on enhancing reasoning and decision-making capabilities. I am particularly interested in addressing the challenges faced by large multimodal models across discriminative, generative, and perceptual understanding tasks, especially in Out-Of-Distribution (OOD) and federated scenarios.

(* = equal contribution, † indicates my role as mentor)

Selected Projects

I worked on problems in interpretability, with a focus on large language models and multimodal systems.

MIRAGE

code /

Multimodal RAG framework that integrates visual embeddings from medical images with retrieved clinical knowledge, leveraging dynamic prompt control to enhance factual precision and interpretability in medical reasoning tasks.

GeoMorph

code /

Implemented a Pix2Pix GAN for mapping satellite/aerial images to their equivalent map-view images.

Text guided image clustering

code /

Compared image clustering using visual, text-guided, and fine-tuned deep learning features on a subset of the Food-101 dataset.

Other Projects

These include coursework, side projects and unpublished research work.

	History of Deep Learning code / An ongoing project in which I implement fundamental deep learning architectures from scratch.
	Machine Learning algorithms code / This repository contains implementations of classical machine learning algorithms from scratch.
	Captionix code / Image caption generator that uses a CNN, an LSTM, and an attention mechanism to recognize the context of an image and describe it in natural language.
	Reinforcement Learning code / This repository contains implementations of deep reinforcement learning algorithms across different environments.
	AB Testing code / In this A/B test, we split the audience in half: the control group gets a Facebook campaign with ‘maximum bidding’ and the test group gets one with ‘average bidding’.
	LLMs from scratch code / A step-by-step implementation of a Large Language Model (LLM) from scratch, covering data preparation, model building, pretraining, and fine-tuning.
	Fine tuning LLMs code / A practical guide to fine-tuning open-source LLMs such as LLaMA 2 and Mistral, using efficient techniques like LoRA and quantization.
	DeepSeek from scratch code / DeepSeek V3 introduces architectural improvements over traditional transformers, including Multi-Head Latent Attention (MLA), Mixture of Experts (MoE), and Multi-Token Prediction (MTP).

TensorTales

I started this project as a way to document and share my learnings. The goal of TensorTales is to make machine learning more intuitive for everyone.

Machine Learning primer

Deep learning algorithms

LLMs Roadmap

I'm continuously adding more content, so stay tuned for updates! If you have any suggestions or topics you'd like me to cover, feel free to reach out.

Community

I am actively mentoring undergraduate and master’s students in LLMs and Vision, and I look forward to supporting more learners. I especially encourage students with diverse backgrounds to connect and explore opportunities for growth and development. If you are interested, send an introductory email that includes:

A brief introduction about yourself.
Your academic background and areas of interest.
Your CV (optional but preferred).

This site adapts design elements from Jon Barron's website
