Kaustav Kundu

Research Scientist, Meta Superintelligence Labs

kkundu10-at-gmail-dot-com

I am a Research Scientist at Meta Superintelligence Labs (MSL), where I work on large-scale video generation models. Previously, I was a Senior Applied Research Scientist at Amazon AWS, focusing on multimodal models trained with limited supervision, designed to generalize across diverse domains and reason effectively about their environment.

I completed my PhD (ABD) in Computer Science at the University of Toronto, advised by Professor Raquel Urtasun and Professor Sanja Fidler, where my thesis explored Efficient Search Strategies in 3D for Visual Scene Understanding. Earlier, I earned my MS at the Toyota Technological Institute at Chicago (2012–2013), also under the supervision of Professor Urtasun.

Research Interests: I am interested in advancing generative AI systems capable of spatio-temporal reasoning and fine-grained instruction following. My research lies at the intersection of video generation and multimodal understanding, exploring how models can synthesize realistic, temporally coherent, and instruction-aligned content grounded in the visual world while minimizing hallucination artifacts.

Academia

University of Toronto

2014 - 2017

Ph.D. (ABD) Computer Science

Advisors - Professor Raquel Urtasun and Professor Sanja Fidler.

Toyota Technological Institute at Chicago

2012 - 2013

MS Computer Science

Advisor - Professor Raquel Urtasun

IIIT Hyderabad, India

2008 - 2012

B.Tech (Hons.) Computer Science and Engineering

Honours project at Centre for Visual Information Technology (CVIT) lab, advised by Professor P.J. Narayanan.

News

[07/25] Serving as an area chair at CVPR 2026.
[12/23] Our work on image generation was featured in multiple news articles - Bloomberg, TechCrunch, The Verge, etc.
[06/21] Outstanding reviewer at CVPR.
[05/21] Our work on backward compatibility was featured in the TWIML podcast, the BetterML blog, and the news.
[11/20] Outstanding reviewer at CVPR.
[06/17] Received the Best Paper Honorable Mention award for our Polygon-RNN paper at CVPR.

Publications

Robustness Preserving Fine-tuning using Neuron Importance, 2024, ECCV

Guangrui Li , Rahul Duggal , Aaditya Singh , Bing Shuai , Kaustav Kundu , Jon Wu

pdf

Hierarchical Self-supervised Representation Learning for Movie Understanding, 2022, CVPR

Fanyi Xiao , Kaustav Kundu , Joseph Tighe , Davide Modolo

pdf

Id-Free Person Similarity Learning, 2022, CVPR

Bing Shuai , Xinyu Li , Kaustav Kundu , Joseph Tighe

pdf

TubeR: Tubelet Transformer for Video Action Detection, 2022, CVPR (oral)

Jiaojiao Zhao , Yanyi Zhang , Xinyu Li , Hao Chen , Bing Shuai , Mingze Xu , Chunhui Liu , Kaustav Kundu , Yuanjun Xiong , Davide Modolo , Ivan Marsic , Cees GM Snoek , Joseph Tighe

pdf

What to Look at and Where: Semantic and Spatial Refined Transformer for Detecting Human-Object Interactions, 2022, CVPR (oral)

ASM Iftekhar , Hao Chen , Kaustav Kundu , Xinyu Li , Joseph Tighe , Davide Modolo

pdf

Positive-congruent training: Towards regression-free model updates, 2021, CVPR (oral)

Sijie Yan , Yuanjun Xiong , Kaustav Kundu , Shuo Yang , Siqi Deng , Meng Wang , Wei Xia , Stefano Soatto

pdf

Exploiting weakly supervised visual patterns to learn from partial annotations, 2020, NeurIPS

Kaustav Kundu , Erhan Bas , Michael Lam , Hao Chen , Davide Modolo , Joseph Tighe

pdf

Pose Estimation for Objects with Rotational Symmetry, 2018, IROS

Enric Corona , Kaustav Kundu , Sanja Fidler

pdf

SurfConv: Bridging 3D and 2D Convolution for RGBD Images, 2018, CVPR

Hang Chu , Wei-Chiu Ma , Kaustav Kundu , Raquel Urtasun , Sanja Fidler

pdf

3D Object Proposals using Stereo Imagery for Accurate Object Class Detection, 2017, TPAMI

Xiaozhi Chen , Kaustav Kundu , Yukun Zhu , Humin Ma , Sanja Fidler , Raquel Urtasun

pdf

Annotating Object Instances with a Polygon-RNN, 2017, CVPR (oral) [Best Paper Honorable Mention award]

Lluís Castrejón , Kaustav Kundu , Raquel Urtasun , Sanja Fidler

pdf

Exploiting Semantic Information and Deep Matching for Optical Flow, 2016, ECCV

Min Bai , Wenjie Luo , Kaustav Kundu , Raquel Urtasun

pdf

Monocular 3D Object Detection for Autonomous Driving, 2016, CVPR

Xiaozhi Chen , Kaustav Kundu , Ziyu Zhang , Humin Ma , Sanja Fidler , Raquel Urtasun

pdf

3D Object Proposals for Accurate Object Class Detection, 2015, NIPS

Xiaozhi Chen , Kaustav Kundu , Yukun Zhu , Andrew Berneshawi , Humin Ma , Sanja Fidler , Raquel Urtasun

pdf

Rent3D: Floor-Plan Priors for Monocular Layout Estimation, 2015, CVPR (oral)

Chenxi Liu , Alexander Schwing , Kaustav Kundu , Raquel Urtasun , Sanja Fidler

pdf

Patents

System and method for vision-based event detection, 2024, US Patent 11869065

Jayan Eledath , Nikhil Chacko , Alessandro Bergamo , Kaustav Kundu , Marian George , Jingjing Liu , Nishit Desai , Pahal Dalal , Keshav Tripathi

link

Detecting interactions with non-discretized items and associating interactions with actors using digital images, 2023, US Patent 11580785

Kaustav Kundu , Pahal Dalal , Nishit Desai , Jayan Eledath , Geoffrey Franz , Gerard Medioni , Hoi Cheung Pang , Rakesh Ramakrishnan

link

Content moderation using object detection and image classification, 2022, US Patent 11423265

Hao Chen , Hao Wu , Hao Li , Michael Lam , Xinyu Li , Kaustav Kundu , Meng Wang , Joseph Tighe , Rahul Bhotika

link