Viewpoint invariant semantic object and scene categorization with RGB-D sensors

Understanding the semantics of objects and scenes using multi-modal RGB-D sensors serves many robotics applications. Key challenges for accurate RGB-D image recognition are the scarcity of training data, variations due to viewpoint changes, and the heterogeneous nature of the data. We address these problems and propose a generic deep learning framework based on a pre-trained convolutional neural network as a feature extractor for both the colour and depth channels. We propose a rich multi-scale feature representation, referred to as convolutional hypercube pyramid (HP-CNN), that encodes discriminative information from the convolutional tensors at different levels of detail. We also present a technique to fuse the proposed HP-CNN with the activations of fully connected neurons, based on an extreme learning machine classifier in a late fusion scheme, which leads to a highly discriminative and compact representation. To further improve performance, we devise HP-CNN-T, a view-invariant descriptor extracted from a multi-view 3D object pose (M3DOP) model. M3DOP is learned from over 140,000 RGB-D images that are synthetically generated by rendering CAD models from different viewpoints. Extensive evaluations on four RGB-D object and scene recognition datasets demonstrate that HP-CNN and HP-CNN-T consistently outperform state-of-the-art methods on several recognition tasks by a significant margin.
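
To make the two central ideas of the abstract more concrete, the following NumPy-only sketch illustrates (a) pooling convolutional activations over a spatial pyramid into one multi-scale descriptor, in the spirit of the hypercube pyramid, and (b) a minimal extreme learning machine classifier. It is a simplified illustration, not the authors' implementation: the random arrays stand in for conv-layer activations of a pre-trained CNN on the colour and depth channels, the pyramid levels and the helper names (pyramid_pool, hypercube_descriptor, ELM) are invented for this example, and the ridge-regression ELM is the standard textbook formulation rather than the paper's exact fusion scheme.

```python
import numpy as np


def pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool one conv activation tensor of shape (C, H, W) over a
    spatial pyramid and concatenate the pooled cells into one vector."""
    C, H, W = feature_map.shape
    parts = []
    for level in levels:
        row_bins = np.array_split(np.arange(H), level)
        col_bins = np.array_split(np.arange(W), level)
        for rows in row_bins:
            for cols in col_bins:
                cell = feature_map[:, rows[:, None], cols[None, :]]
                parts.append(cell.max(axis=(1, 2)))   # one C-dim vector per cell
    return np.concatenate(parts)


def hypercube_descriptor(conv_tensors, levels=(1, 2, 4)):
    """Concatenate pyramid-pooled vectors from several conv layers into a
    single multi-scale descriptor."""
    return np.concatenate([pyramid_pool(t, levels) for t in conv_tensors])


class ELM:
    """Minimal extreme learning machine: a fixed random hidden layer with
    ridge-regression output weights (generic formulation)."""

    def __init__(self, n_hidden=256, reg=1e-3, seed=0):
        self.n_hidden, self.reg = n_hidden, reg
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        self.W = self.rng.normal(scale=1.0 / np.sqrt(X.shape[1]),
                                 size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        T = np.eye(n_classes)[y]                       # one-hot targets
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)        # ridge solution
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta).argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random arrays stand in for conv activations of a pre-trained CNN
    # applied to the colour and depth channels of one image.
    rgb_maps = [rng.normal(size=(64, 14, 14)), rng.normal(size=(128, 7, 7))]
    depth_maps = [rng.normal(size=(64, 14, 14)), rng.normal(size=(128, 7, 7))]
    descriptor = np.concatenate([hypercube_descriptor(rgb_maps),
                                 hypercube_descriptor(depth_maps)])
    print("descriptor length:", descriptor.shape[0])

    # Toy classification with the ELM on random descriptors.
    X = rng.normal(size=(120, descriptor.shape[0]))
    y = rng.integers(0, 5, size=120)
    clf = ELM(n_hidden=256).fit(X, y)
    print("train accuracy:", (clf.predict(X) == y).mean())
```

Swapping the random tensors for real activations of a pre-trained network on RGB and colour-encoded depth images would turn this toy into a rough approximation of the multi-scale feature extraction and ELM classification the abstract describes.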


Bibliographic Details
Main Authors: Mohd Zaki, Hasan Firdaus; Shafait, Faisal; Mian, Ajmal
Format: Article (peer reviewed)
Language: English
Published: Springer New York LLC, 2019
Published in: Autonomous Robots, 43 (4), pp. 1005-1022. ISSN 0929-5593, E-ISSN 1573-7527
DOI: 10.1007/s10514-018-9776-8
Journal: https://link.springer.com/journal/10514
Repository: IIUM Repository, International Islamic University Malaysia
Subjects: Q300 Cybernetics; QA75 Electronic computers. Computer science
Online Access: http://irep.iium.edu.my/64696/
http://irep.iium.edu.my/64696/20/64696_Viewpoint%20invariant%20semantic%20object%20and%20scene_complete.pdf
http://irep.iium.edu.my/64696/19/64696_Viewpoint%20invariant%20semantic%20object%20and%20scene_scopus.pdf
http://irep.iium.edu.my/64696/31/64696_Viewpoint%20invariant%20semantic%20object%20and%20scene%20categorization%20with%20RGB-D%20sensors_WOS.pdf