Modeling sub-event dynamics in first-person action recognition

First-person videos have unique characteristics such as heavy egocentric motion, strong preceding events, salient transitional activities and post-event impacts. Action recognition methods designed for third-person videos may not optimally represent actions captured by first-person videos. We prop...

Bibliographic Details
Main Authors: Mohd Zaki, Hasan Firdaus; Shafait, Faisal; Mian, Ajmal S.
Format: Conference or Workshop Item
Language: English
Published: IEEE 2017
Subjects: TK7885 Computer engineering
Online Access: http://irep.iium.edu.my/64353/
http://irep.iium.edu.my/64353/8/64353%20Modeling%20Sub-Event%20Dynamics%20in%20First-Person%20Action%20Recognition.pdf
http://irep.iium.edu.my/64353/7/64353%20Modeling%20sub-event%20dynamics%20in%20first-person%20action%20recognition%20SCOPUS.pdf
id iium-64353
recordtype eprints
spelling iium-64353 2018-07-05T06:56:36Z http://irep.iium.edu.my/64353/ Modeling sub-event dynamics in first-person action recognition. Mohd Zaki, Hasan Firdaus; Shafait, Faisal; Mian, Ajmal S. TK7885 Computer engineering. IEEE 2017-11-09. Conference or Workshop Item. PeerReviewed.
Mohd Zaki, Hasan Firdaus and Shafait, Faisal and Mian, Ajmal S. (2017) Modeling sub-event dynamics in first-person action recognition. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 21st-26th July 2017, Honolulu, USA.
https://ieeexplore.ieee.org/document/8099659/
DOI 10.1109/CVPR.2017.176
repository_type Digital Repository
institution_category Local University
institution International Islamic University Malaysia
building IIUM Repository
collection Online Access
language English
topic TK7885 Computer engineering
description First-person videos have unique characteristics such as heavy egocentric motion, strong preceding events, salient transitional activities and post-event impacts. Action recognition methods designed for third-person videos may not optimally represent actions captured by first-person videos. We propose a method to represent the high-level dynamics of sub-events in first-person videos by dynamically pooling features of sub-intervals of time series using a temporal feature pooling function. The sub-event dynamics are then temporally aligned to make a new series. To keep track of how the sub-event dynamics evolve over time, we recursively employ the Fast Fourier Transform on a pyramidal temporal structure. The Fourier coefficients of the segment define the overall video representation. We perform experiments on two existing benchmark first-person video datasets, both of which were captured in a controlled environment. To address this limitation, we introduce a new dataset collected from YouTube which has a larger number of classes and a greater diversity of capture conditions, thereby more closely depicting real-world challenges in first-person video analysis. We compare our method to state-of-the-art first-person and generic video recognition algorithms. Our method consistently outperforms the nearest competitors by 10.3%, 3.3% and 11.7% respectively on the three datasets.
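The two stages named in the description (pooling sub-intervals into a new series, then applying the Fast Fourier Transform over a temporal pyramid) can be illustrated with a minimal sketch. This is not the authors' implementation. The record does not state the pooling function, the window and stride, the number of pyramid levels, or how many Fourier coefficients are kept, so max pooling, the function names `pool_sub_events` and `pyramid_fft`, and all parameter values below are assumptions chosen only for illustration.

```python
import numpy as np


def pool_sub_events(frames, window, stride, pool=np.max):
    """Pool each sliding sub-interval of a (T, D) feature series into one D-vector.

    Max pooling is an assumed stand-in for the temporal pooling function.
    """
    starts = range(0, frames.shape[0] - window + 1, stride)
    return np.stack([pool(frames[s:s + window], axis=0) for s in starts])


def pyramid_fft(series, levels=3, n_coeffs=4):
    """Concatenate low-frequency FFT magnitudes over a temporal pyramid.

    Level l splits the (S, D) series into 2**l contiguous segments.
    """
    if series.shape[0] < 2 ** (levels - 1):
        raise ValueError("series too short for the requested pyramid depth")
    parts = []
    for level in range(levels):
        for segment in np.array_split(series, 2 ** level, axis=0):
            # Zero-pad segments shorter than n_coeffs so that every segment
            # contributes exactly n_coeffs rows of coefficients.
            n = max(n_coeffs, segment.shape[0])
            spectrum = np.abs(np.fft.fft(segment, n=n, axis=0))
            parts.append(spectrum[:n_coeffs].ravel())
    return np.concatenate(parts)


# Hypothetical input: 60 frames with 8-dimensional per-frame features.
frames = np.random.default_rng(0).normal(size=(60, 8))
sub_events = pool_sub_events(frames, window=10, stride=5)
descriptor = pyramid_fft(sub_events, levels=3, n_coeffs=4)
print(descriptor.shape)  # (224,)
```

With three levels the pyramid has 1 + 2 + 4 = 7 segments, each contributing 4 coefficients for each of the 8 feature dimensions, so the descriptor has a fixed length of 224 regardless of video length.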
format Conference or Workshop Item
author Mohd Zaki, Hasan Firdaus
Shafait, Faisal
Mian, Ajmal S.
title Modeling sub-event dynamics in first-person action recognition
publisher IEEE
publishDate 2017
url http://irep.iium.edu.my/64353/
http://irep.iium.edu.my/64353/8/64353%20Modeling%20Sub-Event%20Dynamics%20in%20First-Person%20Action%20Recognition.pdf
http://irep.iium.edu.my/64353/7/64353%20Modeling%20sub-event%20dynamics%20in%20first-person%20action%20recognition%20SCOPUS.pdf
first_indexed 2023-09-18T21:31:20Z
last_indexed 2023-09-18T21:31:20Z
_version_ 1777412530715492352