Kaustav Kundu
Research Scientist, Meta Superintelligence Labs
kkundu10-at-gmail-dot-com

I am a Research Scientist at Meta Superintelligence Labs (MSL), where I work on large-scale video generation models. Previously, I was a Senior Applied Research Scientist at Amazon AWS, focusing on multimodal models trained with limited supervision, designed to generalize across diverse domains and reason effectively about their environment.

I completed my PhD (ABD) in Computer Science at the University of Toronto, advised by Professor Raquel Urtasun and Professor Sanja Fidler, where my thesis explored Efficient Search Strategies in 3D for Visual Scene Understanding. Earlier, I earned my MS at the Toyota Technological Institute at Chicago (2012–2013), also under the supervision of Professor Urtasun.

Research Interests: I am interested in advancing generative AI systems capable of spatio-temporal reasoning and fine-grained instruction following. My research lies at the intersection of video generation and multimodal understanding, exploring how models can synthesize realistic, temporally coherent, and instruction-aligned content grounded in the visual world while minimizing hallucination artifacts.

Academia

University of Toronto
2014 - 2017
Ph.D. (ABD) Computer Science
Advisors - Professor Raquel Urtasun and Professor Sanja Fidler.
Toyota Technological Institute at Chicago
2012 - 2013
MS Computer Science
Advisor - Professor Raquel Urtasun
IIIT Hyderabad, India
2008 - 2012
B.Tech (Hons.) Computer Science and Engineering
Honours project at Centre for Visual Information Technology (CVIT) lab, advised by Professor P.J. Narayanan.

News

  • [07/25] Serving as an area chair at CVPR 2026.
  • [12/23] Our work on image generation was featured in multiple news articles - Bloomberg, TechCrunch, The Verge, etc.
  • [06/21] Outstanding reviewer at CVPR.
  • [05/21] Our work on backward compatibility was featured in the TWIML podcast, the BetterML blog, and news.
  • [11/20] Outstanding reviewer at CVPR.
  • [06/17] Received the Best Paper Honorable Mention award for our Polygon-RNN paper at CVPR.

Publications

Robustness Preserving Fine-tuning using Neuron Importance, 2024, ECCV
Guangrui Li, Rahul Duggal, Aaditya Singh, Bing Shuai, Kaustav Kundu, Jon Wu
Hierarchical Self-supervised Representation Learning for Movie Understanding, 2022, CVPR
Fanyi Xiao, Kaustav Kundu, Joseph Tighe, Davide Modolo
Id-Free Person Similarity Learning, 2022, CVPR
Bing Shuai, Xinyu Li, Kaustav Kundu, Joseph Tighe
TubeR: Tubelet Transformer for Video Action Detection, 2022, CVPR (oral)
Jiaojiao Zhao, Yanyi Zhang, Xinyu Li, Hao Chen, Bing Shuai, Mingze Xu, Chunhui Liu, Kaustav Kundu, Yuanjun Xiong, Davide Modolo, Ivan Marsic, Cees GM Snoek, Joseph Tighe
What to Look at and Where: Semantic and Spatial Refined Transformer for Detecting Human-Object Interactions, 2022, CVPR (oral)
ASM Iftekhar, Hao Chen, Kaustav Kundu, Xinyu Li, Joseph Tighe, Davide Modolo
Positive-congruent training: Towards regression-free model updates, 2021, CVPR (oral)
Sijie Yan, Yuanjun Xiong, Kaustav Kundu, Shuo Yang, Siqi Deng, Meng Wang, Wei Xia, Stefano Soatto
Exploiting weakly supervised visual patterns to learn from partial annotations, 2020, NeurIPS
Kaustav Kundu, Erhan Bas, Michael Lam, Hao Chen, Davide Modolo, Joseph Tighe
Pose Estimation for Objects with Rotational Symmetry, 2018, IROS
Enric Corona, Kaustav Kundu, Sanja Fidler
SurfConv: Bridging 3D and 2D Convolution for RGBD Images, 2018, CVPR
Hang Chu, Wei-Chiu Ma, Kaustav Kundu, Raquel Urtasun, Sanja Fidler
3D Object Proposals using Stereo Imagery for Accurate Object Class Detection, 2017, TPAMI
Xiaozhi Chen, Kaustav Kundu, Yukun Zhu, Huimin Ma, Sanja Fidler, Raquel Urtasun
Annotating Object Instances with a Polygon-RNN, 2017, CVPR (oral) [Best Paper Honorable Mention award]
Lluís Castrejón, Kaustav Kundu, Raquel Urtasun, Sanja Fidler
Exploiting Semantic Information and Deep Matching for Optical Flow, 2016, ECCV
Min Bai, Wenjie Luo, Kaustav Kundu, Raquel Urtasun
Monocular 3D Object Detection for Autonomous Driving, 2016, CVPR
Xiaozhi Chen, Kaustav Kundu, Ziyu Zhang, Huimin Ma, Sanja Fidler, Raquel Urtasun
3D Object Proposals for Accurate Object Class Detection, 2015, NIPS
Xiaozhi Chen, Kaustav Kundu, Yukun Zhu, Andrew Berneshawi, Huimin Ma, Sanja Fidler, Raquel Urtasun
Rent3D: Floor-Plan Priors for Monocular Layout Estimation, 2015, CVPR (oral)
Chenxi Liu, Alexander Schwing, Kaustav Kundu, Raquel Urtasun, Sanja Fidler

Patents

System and method for vision-based event detection, 2024, US Patent 11869065
Jayan Eledath, Nikhil Chacko, Alessandro Bergamo, Kaustav Kundu, Marian George, Jingjing Liu, Nishit Desai, Pahal Dalal, Keshav Tripathi
Detecting interactions with non-discretized items and associating interactions with actors using digital images, 2023, US Patent 11580785
Kaustav Kundu, Pahal Dalal, Nishit Desai, Jayan Eledath, Geoffrey Franz, Gerard Medioni, Hoi Cheung Pang, Rakesh Ramakrishnan
Content moderation using object detection and image classification, 2022, US Patent 11423265
Hao Chen, Hao Wu, Hao Li, Michael Lam, Xinyu Li, Kaustav Kundu, Meng Wang, Joseph Tighe, Rahul Bhotika