June 28th, 2018 (Polytech “Amphi Nord”, in building Templiers 2)
Object class detection is a central area of computer vision. It requires recognizing and localizing all objects of a predefined set of classes in an image. Detectors are usually trained under full supervision, which requires manually drawing object bounding boxes in a large number of training images. This is tedious and very time-consuming. In this talk I will present two recent techniques for reducing this effort.
In the first part I will explore a knowledge transfer scenario: training object detectors for target classes with only image-level labels, helped by a set of source classes with bounding-box annotations. I will present a unified knowledge transfer framework based on training a single neural network multi-class object detector over all source classes, organized in a semantic hierarchy. This generates proposals with scores at multiple levels in the hierarchy, which we use to explore knowledge transfer over a broad range of generality, from class-specific (bicycle to motorbike) to class-generic (objectness to any class). Experiments on 200 object classes from the ILSVRC 2013 dataset demonstrate large improvements over weakly supervised baselines. Moreover, we also carry out several across-dataset knowledge transfer experiments, which establish the general applicability of our method.
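To give a flavor of the hierarchical transfer idea, here is a minimal sketch (not the actual method or code from the talk; all names and the fallback-to-objectness logic are illustrative assumptions). It shows how proposal scores for a target class without box annotations could be borrowed from the closest related node the source detector knows, walking from class-specific toward class-generic transfer:

```python
# Illustrative sketch only: scoring proposals for a target class by
# transferring from the nearest known node in a semantic hierarchy.
# All names (hierarchy layout, 'objectness' root) are hypothetical.

def transfer_scores(proposal_scores, parent, target_class):
    """Return per-proposal scores transferred from a related source node.

    proposal_scores: dict mapping hierarchy node -> list of scores
                     (one per proposal) from the multi-class detector.
    parent:          dict mapping each node to its parent; the root
                     'objectness' node covers any class.
    target_class:    class with image-level labels only (no boxes).
    """
    node = target_class
    # Walk up the hierarchy until we hit a node the source detector
    # scored, moving from class-specific to class-generic transfer.
    while node not in proposal_scores:
        node = parent.get(node, "objectness")
    return proposal_scores[node]
```

Under this toy scheme, a target class such as motorbike could reuse scores from a sibling like bicycle via their common ancestor, and in the worst case falls back to generic objectness scores.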
In the second part I will consider a human-machine collaboration scenario, where a human interacts with a computer model to carry out the bounding-box annotation process together. I will introduce Intelligent Annotation Dialogs: we train an agent to automatically choose a sequence of actions for a human annotator to produce a bounding box in a minimal amount of time. We consider two actions: box verification, where the annotator verifies a box generated by an object detector, and manual box drawing. We explore two kinds of agents, one based on predicting the probability that a box will be positively verified, and the other based on reinforcement learning. We experimentally demonstrate that our agents are able to learn efficient annotation strategies in several scenarios, automatically adapting to the image difficulty, the desired quality of the boxes, and the detector strength.
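As a rough illustration of the probability-based variant, the sketch below picks the action with the lowest expected annotation time for a single box. The time costs and the one-step decision rule are assumptions for illustration, not the agent presented in the talk:

```python
# Illustrative sketch (hypothetical costs and names): choose between
# verifying a detector-proposed box and drawing one manually, based on
# expected annotation time.

T_VERIFY = 1.8   # assumed seconds for one box verification
T_DRAW = 7.0     # assumed seconds for one manual box drawing

def choose_action(p_positive):
    """p_positive: predicted probability that the detector's box passes
    human verification at the desired quality level.

    A rejected verification still costs T_VERIFY and must be followed
    by manual drawing, so verifying first has expected time
        T_VERIFY + (1 - p_positive) * T_DRAW.
    """
    expected_verify = T_VERIFY + (1.0 - p_positive) * T_DRAW
    return "verify" if expected_verify < T_DRAW else "draw"
```

With these example costs, verification wins whenever p_positive exceeds T_VERIFY / T_DRAW (about 0.26), which is how such an agent would adapt to image difficulty, box quality requirements, and detector strength.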
Vittorio Ferrari is a Research Scientist at Google and a Full Professor at the University of Edinburgh, leading a research group on visual learning in each institution. He received his PhD from ETH Zurich in 2004 and was a post-doctoral researcher at INRIA Grenoble in 2006-2007 and at the University of Oxford in 2007-2008. Between 2008 and 2012 he was an Assistant Professor at ETH Zurich, funded by a Swiss National Science Foundation Professorship grant. In 2012 he received the prestigious ERC Starting Grant, and the best paper award from the European Conference on Computer Vision. He is the author of over 100 technical publications. He regularly serves as an Area Chair for the major computer vision conferences, and he will be a Program Chair for ECCV 2018 and a General Chair for ECCV 2020. He is an Associate Editor of IEEE Transactions on Pattern Analysis and Machine Intelligence. His current research interests are in learning visual models with minimal human supervision, human-machine collaboration, and semantic segmentation.