Ishaan Ansari

I am a Software Engineer at Think Future Technologies Pvt. Ltd., part of the machine learning team, where I work at the intersection of computer vision and language modelling.

Before this, I graduated with a Bachelor's degree in Computer Science and Engineering with Honors from Jamia Hamdard University, Delhi, in 2023. As an undergrad, I worked on machine learning with a focus on computer vision in healthcare under the supervision of Anam Saiyeda.

My eventual goal is to help build AI that can reliably and autonomously perform complex tasks for extremely long periods of time, without human intervention. Read below to learn more about my research interests and past work.

Email  /  GitHub  /  Google Scholar  /  Twitter  /  LinkedIn  /  CV

Research

My research interests lie at the intersection of multimodal systems in vision and language domains, with a focus on enhancing reasoning and decision-making capabilities. I am particularly interested in addressing the challenges faced by large multimodal models across discriminative, generative, and perceptual understanding tasks, especially in out-of-distribution (OOD) and federated scenarios.

(* = equal contribution, † indicates my role as mentor)

Selected Projects

I worked on problems in interpretability, with a focus on large language models and multimodal systems.

MIRAGE


code /

Multimodal RAG framework that integrates visual embeddings from medical images with retrieved clinical knowledge, leveraging dynamic prompt control to enhance factual precision and interpretability in medical reasoning tasks.

GeoMorph


code /

Implemented a Pix2Pix GAN for mapping satellite/aerial images to their equivalent map-view images.

Text guided image clustering


code /

Compared image clustering using visual, text-guided, and fine-tuned deep learning features on a subset of the Food-101 dataset.

Other Projects

These include coursework, side projects and unpublished research work.

History of Deep Learning


code /

It’s an ongoing project where I implement fundamental deep learning architectures from scratch.

Machine Learning algorithms


code /

This repository contains implementations of classical machine learning algorithms from scratch.

Captionix


code /

Image caption generator that uses a CNN, an LSTM, and an attention mechanism to recognize the context of an image and describe it in natural language.

Reinforcement Learning


code /

This repository contains implementations of deep reinforcement learning algorithms across different environments.

AB Testing


code /

In this A/B test, we split the audience in half: the control group gets a Facebook campaign with ‘maximum bidding’ and the test group gets one with ‘average bidding’.

LLMs from scratch


code /

A step-by-step implementation of a Large Language Model (LLM) from scratch, covering data preparation, model building, pretraining, and fine-tuning.

Fine tuning LLMs


code /

A practical guide to fine-tuning open-source LLMs such as LLaMA 2 and Mistral using efficient techniques like LoRA and quantization.

DeepSeek from scratch


code /

DeepSeek V3 introduces architectural improvements over traditional transformers, including Multi-Head Latent Attention (MLA), Mixture of Experts (MoE), and Multi-Token Prediction (MTP).

TensorTales

I started this project as a way to document and share my learnings. The goal of TensorTales is to make machine learning more intuitive for everyone.

  • Machine Learning primer

  • Deep learning algorithms

  • LLMs Roadmap

I'm continuously adding more content, so stay tuned for updates! If you have any suggestions or topics you'd like me to cover, feel free to reach out.

Community

I am actively mentoring undergraduate and master’s students in LLMs and Vision, and I look forward to supporting more learners. I especially encourage students with diverse backgrounds to connect and explore opportunities for growth and development. If you are interested, send an introductory email that includes:

  • A brief introduction about yourself.
  • Your academic background and areas of interest.
  • Your CV (optional but preferred).

This site adapts design elements from Jon Barron's website.