Modeling sub-event dynamics in first-person action recognition

First-person videos have unique characteristics such as heavy egocentric motion, strong preceding events, salient transitional activities and post-event impacts. Action recognition methods designed for third-person videos may not optimally represent actions captured by first-person videos. We prop...


Bibliographic Details
Main Authors:	Mohd Zaki, Hasan Firdaus; Shafait, Faisal; Mian, Ajmal S.
Format:	Conference or Workshop Item
Language:	English
Published:	IEEE 2017
Subjects:	TK7885 Computer engineering
Online Access:	http://irep.iium.edu.my/64353/ http://irep.iium.edu.my/64353/8/64353%20Modeling%20Sub-Event%20Dynamics%20in%20First-Person%20Action%20Recognition.pdf http://irep.iium.edu.my/64353/7/64353%20Modeling%20sub-event%20dynamics%20in%20first-person%20action%20recognition%20SCOPUS.pdf

id	iium-64353
recordtype	eprints
spelling	iium-64353 2018-07-05T06:56:36Z IEEE 2017-11-09 Conference or Workshop Item PeerReviewed Mohd Zaki, Hasan Firdaus and Shafait, Faisal and Mian, Ajmal S. (2017) Modeling sub-event dynamics in first-person action recognition. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 21st-26th July 2017, Honolulu, USA. https://ieeexplore.ieee.org/document/8099659/ doi:10.1109/CVPR.2017.176
repository_type	Digital Repository
institution_category	Local University
institution	International Islamic University Malaysia
building	IIUM Repository
collection	Online Access
language	English
topic	TK7885 Computer engineering
description	First-person videos have unique characteristics such as heavy egocentric motion, strong preceding events, salient transitional activities and post-event impacts. Action recognition methods designed for third-person videos may not optimally represent actions captured by first-person videos. We propose a method to represent the high-level dynamics of sub-events in first-person videos by dynamically pooling features of sub-intervals of time series using a temporal feature pooling function. The sub-event dynamics are then temporally aligned to make a new series. To keep track of how the sub-event dynamics evolve over time, we recursively employ the Fast Fourier Transform on a pyramidal temporal structure. The Fourier coefficients of the segment define the overall video representation. We perform experiments on two existing benchmark first-person video datasets which have been captured in a controlled environment. Addressing this gap, we introduce a new dataset collected from YouTube which has a larger number of classes and a greater diversity of capture conditions, thereby more closely depicting real-world challenges in first-person video analysis. We compare our method to state-of-the-art first-person and generic video recognition algorithms. Our method consistently outperforms the nearest competitors by 10.3%, 3.3% and 11.7% respectively on the three datasets.
format	Conference or Workshop Item
author	Mohd Zaki, Hasan Firdaus; Shafait, Faisal; Mian, Ajmal S.
author_facet	Mohd Zaki, Hasan Firdaus; Shafait, Faisal; Mian, Ajmal S.
author_sort	Mohd Zaki, Hasan Firdaus
title	Modeling sub-event dynamics in first-person action recognition
publisher	IEEE
publishDate	2017
url	http://irep.iium.edu.my/64353/ http://irep.iium.edu.my/64353/8/64353%20Modeling%20Sub-Event%20Dynamics%20in%20First-Person%20Action%20Recognition.pdf http://irep.iium.edu.my/64353/7/64353%20Modeling%20sub-event%20dynamics%20in%20first-person%20action%20recognition%20SCOPUS.pdf
first_indexed	2023-09-18T21:31:20Z
last_indexed	2023-09-18T21:31:20Z
_version_	1777412530715492352
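
Illustrative note: the pipeline outlined in the description field (pool features over sub-intervals, align the pooled sub-events into a new series, then take Fourier coefficients over a temporal pyramid) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names pool_subevents and fourier_pyramid, the use of max pooling, the window, stride, levels and n_coeffs parameters, and the use of FFT magnitudes are all hypothetical choices made for illustration. The pyramid here is built with a simple loop rather than the recursion mentioned in the abstract.

```python
import numpy as np


def pool_subevents(frames, window, stride, pool=np.max):
    """Pool per-frame features over sliding sub-intervals.

    frames: array of shape (T, D), one D-dimensional feature per frame.
    Returns an array of shape (N, D): one pooled descriptor per sub-interval,
    kept in temporal order so the result is itself a time series.
    Max pooling is an assumption; any temporal pooling function
    with an `axis` argument can be passed in.
    """
    num_frames = frames.shape[0]
    starts = range(0, max(num_frames - window, 0) + 1, stride)
    return np.stack([pool(frames[s:s + window], axis=0) for s in starts])


def fourier_pyramid(series, levels=3, n_coeffs=4):
    """Concatenate low-frequency FFT magnitudes over a temporal pyramid.

    Level k splits the series into 2**k consecutive segments. For each
    segment, the first n_coeffs FFT magnitudes along time are kept for
    every feature dimension.
    """
    if series.shape[0] < 2 ** (levels - 1):
        raise ValueError("series is too short for the requested pyramid depth")
    parts = []
    for level in range(levels):
        for segment in np.array_split(series, 2 ** level, axis=0):
            # Zero-pad short segments so at least n_coeffs coefficients exist.
            n = max(segment.shape[0], 2 * n_coeffs)
            spectrum = np.abs(np.fft.rfft(segment, n=n, axis=0))
            parts.append(spectrum[:n_coeffs].ravel())
    return np.concatenate(parts)
```

With per-frame features of shape (T, D), calling pool_subevents followed by fourier_pyramid yields a fixed-length vector of (2**levels - 1) * n_coeffs * D values regardless of T, which could then be passed to any standard classifier.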


Similar Items