Ph.D. Thesis Proposal: Alireza Fathi
Title: Learning to Recognize Daily Activities using Attention
School of Interactive Computing, College of Computing, Georgia Institute of Technology
Date: May 3rd (Thursday), 2012
Time: 10:00am - 1:00pm (EST)
Location: KACB 1116W, Atlanta, GA
Committee:
- Dr. James M. Rehg (Adviser, School of Interactive Computing, Georgia Tech)
- Dr. Gregory D. Abowd (School of Interactive Computing, Georgia Tech)
- Dr. Irfan Essa (School of Interactive Computing, Georgia Tech)
- Dr. Thad Starner (School of Interactive Computing, Georgia Tech)
- Dr. Martial Hebert (Robotics Institute, Carnegie Mellon University)
- Dr. Antonio Torralba (CSAIL, Massachusetts Institute of Technology)
The problem of modeling, analyzing and understanding daily activities from sensor data has important real-life applications such as elder care, developmental screening and augmented reality. In the U.S., persons aged 65 or older represent about 13% of the population. Most elders would prefer to continue living in their own homes. Unfortunately, the majority of elderly people gradually lose functional capabilities and require additional assistance in the home. A framework that can monitor and understand human daily activities is a first step towards an automated caregiving system that can assist elderly people at home. There is a similar need for a daily activity recognition system that screens the development of children in their living space. In the U.S., 50% of children with developmental disabilities lose an important window for early treatment because their conditions are not identified until they start school. Analyzing children's social behavior can help psychologists recognize developmental disorders.
The standard approach to monitoring daily behaviors is based on instrumenting homes and offices with cameras and other sensors. However, such systems have several issues: (1) the living space needs to be densely instrumented, which is not appealing to most people, (2) a person's body may occlude their activities, and (3) important details of objects are hard to capture due to their distance from the cameras. Given the recent availability of high-quality wearable cameras such as the GoPro and wearable eye-tracking glasses such as those from Tobii and SMI, we propose to address the problem of recognizing daily activities by adopting the paradigm of egocentric or first-person video (i.e., video captured from a wearable camera that covers the scene directly in front of the user at all times). In contrast to the established third-person video paradigm, the egocentric paradigm makes it possible to easily collect examples of natural human behavior from a consistent first-person vantage point.
In egocentric vision, the body and mind of the subject serve as the center of reference for all of her interactions with the environment. As a result, egocentric vision bridges the gap between attention and action, since it simultaneously captures where a subject fixates and what action she performs. The psychology literature makes clear that human gaze is closely related to action. However, using gaze to learn to recognize daily activities remains unexplored territory. We therefore propose the first framework that exploits the close relationship between gaze and action to simultaneously predict fixation locations and detect actions in egocentric videos of daily living. We further propose to build an integrated framework for comprehensive understanding of egocentric videos of daily activities, leveraging first-person attention to parse them into their constituent elements.