Georgia Tech’s Irfan Essa presents “Computational Video: Methods for Video Segmentation and Video Stabilization and Their Applications” as part of the IRIM Robotics Seminar Series. The seminar will be held in the Marcus Nanotechnology Building from 12-1 p.m. and is open to the public.
In this talk, I will present two specific methods for computational video and some thoughts about trends in this area in general.
First, I will describe a novel algorithm for video stabilization that generates stabilized videos by employing L1-optimal camera paths to remove undesirable motions. Our method allows for video stabilization beyond conventional filtering, which only suppresses high-frequency jitter. An additional challenge in videos shot on mobile phones is rolling shutter distortion. We propose a solution based on a novel mixture model of homographies, parametrized by scanline blocks, to correct these rolling shutter distortions. Our method does not rely on a priori knowledge of the readout time, nor does it require prior camera calibration. Our novel video stabilization and calibration-free rolling shutter removal have been deployed on YouTube, where they have successfully stabilized millions of videos. We also discuss several extensions to the stabilization algorithm and present technical details behind the widely used YouTube Video Stabilizer, running live on youtube.com.
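To make the L1-optimal camera path idea concrete, the following is a toy sketch, not the deployed YouTube implementation: it smooths a one-dimensional camera path by minimizing the L1 norms of its first, second, and third derivatives via a linear program, while constraining the smoothed path to stay within a crop margin of the original shaky path. The function name, derivative weights, and margin are illustrative choices, not values from the talk.

```python
import numpy as np
from scipy.optimize import linprog

def l1_stabilize_path(c, crop_margin=20.0, w=(10.0, 1.0, 100.0)):
    """Toy L1-optimal smoothing of a 1D camera path (illustrative only).

    Minimizes w1*|D p|_1 + w2*|D^2 p|_1 + w3*|D^3 p|_1 subject to the
    smoothed path p staying within `crop_margin` of the shaky path c,
    so the stabilizing crop window remains inside the frame.
    """
    c = np.asarray(c, dtype=float)
    n = len(c)
    # Finite-difference operators for the 1st, 2nd, and 3rd derivatives.
    D1 = np.diff(np.eye(n), n=1, axis=0)   # (n-1, n)
    D2 = np.diff(np.eye(n), n=2, axis=0)   # (n-2, n)
    D3 = np.diff(np.eye(n), n=3, axis=0)   # (n-3, n)
    m1, m2, m3 = n - 1, n - 2, n - 3
    N = n + m1 + m2 + m3                   # path variables + slack variables

    # Objective: weighted sum of slack variables e_k >= |D^k p| element-wise.
    cost = np.concatenate([np.zeros(n), np.full(m1, w[0]),
                           np.full(m2, w[1]), np.full(m3, w[2])])

    def block(D, off, m):
        # Encode D p - e <= 0 and -D p - e <= 0 for one derivative order.
        A = np.zeros((2 * m, N))
        A[:m, :n], A[m:, :n] = D, -D
        A[:m, off:off + m] = -np.eye(m)
        A[m:, off:off + m] = -np.eye(m)
        return A

    A_ub = np.vstack([
        block(D1, n, m1),
        block(D2, n + m1, m2),
        block(D3, n + m1 + m2, m3),
        np.hstack([np.eye(n), np.zeros((n, N - n))]),    #  p <= c + margin
        np.hstack([-np.eye(n), np.zeros((n, N - n))]),   # -p <= margin - c
    ])
    b_ub = np.concatenate([np.zeros(2 * (m1 + m2 + m3)),
                           c + crop_margin, crop_margin - c])

    bounds = [(None, None)] * n + [(0, None)] * (m1 + m2 + m3)
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n]
```

Penalizing all three derivatives in the L1 sense favors paths built from constant, linear, and parabolic segments, which resemble a deliberate static camera, pan, or ease-in/ease-out shot rather than merely attenuated jitter.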
Second, I will describe an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. We begin by over-segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a region graph over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high-quality segmentations and allows subsequent applications to choose from varying levels of granularity. We demonstrate the use of spatio-temporal segmentation in interactive settings, enabling efficient annotation of objects within the video. This system is now available for use via the videosegmentation.com site. I will also describe how this system is applied to dynamic scene understanding.
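The hierarchical scheme above can be illustrated with a small sketch. This is a toy version under simplifying assumptions, not the videosegmentation.com system: voxels of a grayscale space-time volume form a 6-connected graph, a greedy graph-based over-segmentation (in the style of the Felzenszwalb-Huttenlocher merge criterion) produces the finest level, and the process is repeated on a region graph with region-mean intensities to build coarser levels. All names and the `ks` granularity parameters are illustrative.

```python
import numpy as np

def graph_segment(n, edges, k, sizes=None):
    """Greedy graph segmentation using a Felzenszwalb-Huttenlocher-style
    merge criterion. edges: iterable of (weight, u, v) over n nodes.
    Returns a 0-based component label for each node."""
    parent = list(range(n))
    size = list(sizes) if sizes is not None else [1] * n
    internal = [0.0] * n  # largest edge weight merged inside each component

    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for w, u, v in sorted(edges):
        a, b = find(u), find(v)
        # Merge only if the connecting edge is comparable to the internal
        # variation of both components (k controls granularity).
        if a != b and w <= min(internal[a] + k / size[a],
                               internal[b] + k / size[b]):
            parent[b] = a
            size[a] += size[b]
            internal[a] = max(internal[a], internal[b], w)

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

def hierarchical_video_segment(volume, ks=(0.5, 5.0)):
    """Toy hierarchical space-time segmentation of a (T, H, W) intensity
    volume: over-segment the 6-connected voxel graph, then rebuild a
    region graph (region-mean appearance) and re-segment at each level."""
    flat = volume.ravel().astype(float)
    idx = np.arange(flat.size).reshape(volume.shape)
    voxel_edges = []  # 6-connectivity: neighbors along t, y, and x
    for axis in range(3):
        a = np.take(idx, range(volume.shape[axis] - 1), axis=axis).ravel()
        b = np.take(idx, range(1, volume.shape[axis]), axis=axis).ravel()
        voxel_edges += list(zip(np.abs(flat[a] - flat[b]), a, b))

    labels = np.arange(flat.size)  # finest start: one region per voxel
    levels = []
    for k in ks:
        n = int(labels.max()) + 1
        sizes = np.bincount(labels, minlength=n)
        color = np.bincount(labels, weights=flat, minlength=n) / sizes
        region_edges = {}  # edge between every pair of touching regions
        for _, a, b in voxel_edges:
            ra, rb = labels[a], labels[b]
            if ra != rb:
                region_edges[(min(ra, rb), max(ra, rb))] = \
                    abs(color[ra] - color[rb])
        seg = graph_segment(
            n, [(w, u, v) for (u, v), w in region_edges.items()], k, sizes)
        labels = np.array(seg)[labels]
        levels.append(labels.reshape(volume.shape))
    return levels
```

Each pass coarsens the previous labeling, so `levels` forms a tree of segmentations: an application needing fine object boundaries can read an early level, while one needing whole-object regions can read a later one.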
This talk is based on research by Matthias Grundmann, Daniel Castro, and S. Hussain Raza, conducted while they were students at Georgia Tech. Parts of the work described were also completed with Matthias Grundmann, Vivek Kwatra, and Mei Han at Google Research.
Irfan Essa is the Associate Dean for Off-Campus and Special Initiatives in the College of Computing. He is also a professor in the School of Interactive Computing (IC) and an adjunct professor in the School of Electrical and Computer Engineering.
Essa works in the areas of Computer Vision, Computer Graphics, Computational Perception, Robotics and Computer Animation, Machine Learning, and Social Computing, with potential impact on research in Video Analysis and Production (e.g., Computational Photography & Video, Image-based Modeling and Rendering), Human-Computer Interaction, Artificial Intelligence, Computational Behavioral/Social Sciences, and Computational Journalism.
The author of more than 150 scholarly articles in leading journals and conference venues, Essa has received several best paper awards. He has been awarded the NSF CAREER Award and was elected to the grade of IEEE Fellow. He has also held extended research consulting positions with Disney Research and Google Research and was an adjunct faculty member at Carnegie Mellon's Robotics Institute.
He joined the Georgia Tech faculty in 1996 after eight years as a research faculty member at the Massachusetts Institute of Technology Media Lab.