Ishaan Ansari
I am a Software Engineer at Think Future Technologies Pvt. Ltd., part of the machine learning team, where I work at the intersection of
computer vision and language modelling.
Before this, I graduated with a Bachelor's degree in Computer Science and Engineering with Honors from Jamia Hamdard University, Delhi, in 2023. As an undergrad, I worked on machine
learning with a focus on computer vision in healthcare under the supervision of Anam Saiyeda.
My eventual goal is to help build AI that can reliably and autonomously perform complex tasks for extremely long periods of time, without human intervention. Read below to learn more about my research interests and past work.
Email /
GitHub /
Google Scholar /
Twitter /
LinkedIn /
CV
Research
My research interests lie in multimodal systems at the intersection of vision and language, with a focus on enhancing reasoning and decision-making capabilities. I am particularly interested in addressing the challenges faced by large multimodal
models across discriminative, generative, and perceptual understanding tasks, especially in out-of-distribution (OOD) and federated scenarios.
(* = equal contribution, † indicates my role as mentor)
Selected Projects
I worked on problems in interpretability, with a focus on large language models and multimodal systems.
MIRAGE
code /
Multimodal RAG framework that integrates visual embeddings from medical images with retrieved clinical knowledge, leveraging dynamic prompt control to enhance factual precision and interpretability in medical reasoning tasks.
GeoMorph
code /
Implemented a Pix2Pix GAN for mapping satellite/aerial images to their equivalent map-view images.
Text guided image clustering
code /
Compared image clustering using visual, text-guided, and fine-tuned deep learning features on a subset of the Food-101 dataset.
Other Projects
These include coursework, side projects and unpublished research work.
History of Deep Learning
code /
An ongoing project in which I implement fundamental deep learning architectures from scratch.
Machine Learning algorithms
code /
This repository contains implementations of classical machine learning algorithms from scratch.
Captionix
code /
Image caption generator that uses a CNN, an LSTM, and an attention mechanism to recognize the context of an image and describe it in natural language.
Reinforcement Learning
code /
This repository contains implementations of deep reinforcement learning algorithms across different environments.
AB Testing
code /
In this A/B test, we split the audience in half: the control group gets a Facebook campaign with ‘maximum bidding’ and the test group gets one with ‘average bidding’.
LLMs from scratch
code /
A step-by-step implementation of a Large Language Model (LLM) from scratch, covering data preparation, model building, pretraining, and fine-tuning.
Fine tuning LLMs
code /
A practical guide to fine-tuning various open-source LLMs, such as LLaMA 2 and Mistral, using efficient techniques like LoRA and quantization.
DeepSeek from scratch
code /
DeepSeek V3 introduces architectural improvements over traditional transformers, including Multi-Head Latent Attention (MLA), Mixture of Experts (MoE), and Multi-Token Prediction (MTP).
TensorTales
I started this project as a way to document and share my learnings. The goal of TensorTales is to make machine learning more intuitive for everyone.
I'm continuously adding more content, so stay tuned for updates! If you have any suggestions or topics you'd like me to cover, feel free to reach out.