Modeling sub-event dynamics in first-person action recognition
First-person videos have unique characteristics such as heavy egocentric motion, strong preceding events, salient transitional activities and post-event impacts. Action recognition methods designed for third person videos may not optimally represent actions captured by first-person videos. We prop...
Main Authors: | Mohd Zaki, Hasan Firdaus; Shafait, Faisal; Mian, Ajmal S. |
---|---|
Format: | Conference or Workshop Item |
Language: | English |
Published: | IEEE 2017 |
Subjects: | TK7885 Computer engineering |
Online Access: | http://irep.iium.edu.my/64353/ http://irep.iium.edu.my/64353/8/64353%20Modeling%20Sub-Event%20Dynamics%20in%20First-Person%20Action%20Recognition.pdf http://irep.iium.edu.my/64353/7/64353%20Modeling%20sub-event%20dynamics%20in%20first-person%20action%20recognition%20SCOPUS.pdf |
id |
iium-64353 |
---|---|
recordtype |
eprints |
spelling |
iium-64353 2018-07-05T06:56:36Z http://irep.iium.edu.my/64353/ Modeling sub-event dynamics in first-person action recognition Mohd Zaki, Hasan Firdaus; Shafait, Faisal; Mian, Ajmal S. TK7885 Computer engineering First-person videos have unique characteristics such as heavy egocentric motion, strong preceding events, salient transitional activities and post-event impacts. Action recognition methods designed for third-person videos may not optimally represent actions captured by first-person videos. We propose a method to represent the high-level dynamics of sub-events in first-person videos by dynamically pooling features of sub-intervals of time series using a temporal feature pooling function. The sub-event dynamics are then temporally aligned to make a new series. To keep track of how the sub-event dynamics evolve over time, we recursively employ the Fast Fourier Transform on a pyramidal temporal structure. The Fourier coefficients of the segment define the overall video representation. We perform experiments on two existing benchmark first-person video datasets which have been captured in a controlled environment. Addressing this gap, we introduce a new dataset collected from YouTube which has a larger number of classes and a greater diversity of capture conditions, thereby more closely depicting real-world challenges in first-person video analysis. We compare our method to state-of-the-art first-person and generic video recognition algorithms. Our method consistently outperforms the nearest competitors by 10.3%, 3.3% and 11.7%, respectively, on the three datasets. IEEE 2017-11-09 Conference or Workshop Item PeerReviewed application/pdf en http://irep.iium.edu.my/64353/8/64353%20Modeling%20Sub-Event%20Dynamics%20in%20First-Person%20Action%20Recognition.pdf application/pdf en http://irep.iium.edu.my/64353/7/64353%20Modeling%20sub-event%20dynamics%20in%20first-person%20action%20recognition%20SCOPUS.pdf Mohd Zaki, Hasan Firdaus and Shafait, Faisal and Mian, Ajmal S. (2017) Modeling sub-event dynamics in first-person action recognition. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 21st-26th July 2017, Honolulu, USA. https://ieeexplore.ieee.org/document/8099659/ 10.1109/CVPR.2017.176 |
repository_type |
Digital Repository |
institution_category |
Local University |
institution |
International Islamic University Malaysia |
building |
IIUM Repository |
collection |
Online Access |
language |
English |
topic |
TK7885 Computer engineering |
spellingShingle |
TK7885 Computer engineering Mohd Zaki, Hasan Firdaus Shafait, Faisal Mian, Ajmal S. Modeling sub-event dynamics in first-person action recognition |
description |
First-person videos have unique characteristics such as heavy egocentric motion, strong preceding events, salient transitional activities and post-event impacts. Action recognition methods designed for third-person videos may not optimally represent actions captured by first-person videos. We propose a method to represent the high-level dynamics of sub-events in first-person videos by dynamically pooling features of sub-intervals of time series using a temporal feature pooling function. The sub-event dynamics are then temporally aligned to make a new series. To keep track of how the sub-event dynamics evolve over time, we recursively employ the Fast Fourier Transform on a pyramidal temporal structure. The Fourier coefficients of the segment define the overall video representation. We perform experiments on two existing benchmark first-person video datasets which have been captured in a controlled environment. Addressing this gap, we introduce a new dataset collected from YouTube which has a larger number of classes and a greater diversity of capture conditions, thereby more closely depicting real-world challenges in first-person video analysis. We compare our method to state-of-the-art first-person and generic video recognition algorithms. Our method consistently outperforms the nearest competitors by 10.3%, 3.3% and 11.7%, respectively, on the three datasets. |
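The pipeline outlined in the description (pool features over sub-intervals, order the pooled vectors into a new series, then apply the FFT over a temporal pyramid) can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the record does not state which pooling functions, window sizes, pyramid depth or number of Fourier coefficients the paper uses, so `pool_subevents`, `fourier_descriptor`, `pyramid_fft` and the parameters `window`, `stride`, `levels` and `n_coeffs` are hypothetical names chosen for illustration, max pooling stands in for the unspecified temporal pooling function, and keeping only coefficient magnitudes is likewise an assumption.

```python
import numpy as np


def pool_subevents(features, window, stride, pool=np.max):
    """Pool each sliding sub-interval of a (T, D) feature series into one vector.

    Max pooling is an assumption; the record only mentions "a temporal
    feature pooling function". The pooled vectors are stacked in temporal
    order, giving a new, shorter series of shape (N, D).
    """
    n_frames = features.shape[0]
    starts = range(0, max(n_frames - window, 0) + 1, stride)
    return np.stack([pool(features[s:s + window], axis=0) for s in starts])


def fourier_descriptor(segment, n_coeffs):
    """Magnitudes of the first n_coeffs FFT coefficients along the time axis."""
    # Zero-pad short segments so that n_coeffs coefficients always exist.
    n_fft = max(n_coeffs, segment.shape[0])
    spectrum = np.abs(np.fft.fft(segment, n=n_fft, axis=0))
    return spectrum[:n_coeffs].ravel()


def pyramid_fft(series, levels, n_coeffs):
    """Concatenate Fourier descriptors over a temporal pyramid.

    Level k splits the series into 2**k contiguous segments (hypothetical
    layout; the paper's exact pyramid is not given in this record).
    """
    if series.shape[0] < 2 ** (levels - 1):
        raise ValueError("series too short for the requested pyramid depth")
    parts = []
    for level in range(levels):
        for segment in np.array_split(series, 2 ** level, axis=0):
            parts.append(fourier_descriptor(segment, n_coeffs))
    return np.concatenate(parts)


# Usage with synthetic per-frame features (40 frames, 8 dimensions).
rng = np.random.default_rng(0)
frame_features = rng.normal(size=(40, 8))
subevent_series = pool_subevents(frame_features, window=8, stride=4)
video_descriptor = pyramid_fft(subevent_series, levels=3, n_coeffs=4)
```

In this sketch the final descriptor length depends only on the feature dimension, the pyramid depth and `n_coeffs`, not on the video length, which is what lets videos of different durations share one fixed-size representation for a downstream classifier.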
format |
Conference or Workshop Item |
author |
Mohd Zaki, Hasan Firdaus; Shafait, Faisal; Mian, Ajmal S. |
author_facet |
Mohd Zaki, Hasan Firdaus; Shafait, Faisal; Mian, Ajmal S. |
author_sort |
Mohd Zaki, Hasan Firdaus |
title |
Modeling sub-event dynamics in first-person action recognition |
title_short |
Modeling sub-event dynamics in first-person action recognition |
title_full |
Modeling sub-event dynamics in first-person action recognition |
title_fullStr |
Modeling sub-event dynamics in first-person action recognition |
title_full_unstemmed |
Modeling sub-event dynamics in first-person action recognition |
title_sort |
modeling sub-event dynamics in first-person action recognition |
publisher |
IEEE |
publishDate |
2017 |
url |
http://irep.iium.edu.my/64353/ http://irep.iium.edu.my/64353/8/64353%20Modeling%20Sub-Event%20Dynamics%20in%20First-Person%20Action%20Recognition.pdf http://irep.iium.edu.my/64353/7/64353%20Modeling%20sub-event%20dynamics%20in%20first-person%20action%20recognition%20SCOPUS.pdf |
first_indexed |
2023-09-18T21:31:20Z |
last_indexed |
2023-09-18T21:31:20Z |
_version_ |
1777412530715492352 |