Viewpoint invariant semantic object and scene categorization with RGB-D sensors

Understanding the semantics of objects and scenes using multi-modal RGB-D sensors serves many robotics applications. Key challenges for accurate RGB-D image recognition are the scarcity of training data, variations due to viewpoint changes and the heterogeneous nature of the data. We address these p...

Bibliographic Details
Main Authors:	Mohd Zaki, Hasan Firdaus; Shafait, Faisal; Mian, Ajmal
Format:	Article
Language:	English
Published:	Springer New York LLC 2019
Subjects:	Q300 Cybernetics; QA75 Electronic computers. Computer science
Online Access:	http://irep.iium.edu.my/64696/ http://irep.iium.edu.my/64696/20/64696_Viewpoint%20invariant%20semantic%20object%20and%20scene_complete.pdf http://irep.iium.edu.my/64696/19/64696_Viewpoint%20invariant%20semantic%20object%20and%20scene_scopus.pdf http://irep.iium.edu.my/64696/31/64696_Viewpoint%20invariant%20semantic%20object%20and%20scene%20categorization%20with%20RGB-D%20sensors_WOS.pdf

id	iium-64696
recordtype	eprints
spelling	iium-64696 2019-08-01T02:25:54Z Mohd Zaki, Hasan Firdaus and Shafait, Faisal and Mian, Ajmal (2019) Viewpoint invariant semantic object and scene categorization with RGB-D sensors. Autonomous Robots, 43 (4). pp. 1005-1022. ISSN 0929-5593 E-ISSN 1573-7527 (In Press) https://link.springer.com/journal/10514 10.1007/s10514-018-9776-8 Springer New York LLC 2019-04-01 Article PeerReviewed
repository_type	Digital Repository
institution_category	Local University
institution	International Islamic University Malaysia
building	IIUM Repository
collection	Online Access
language	English
topic	Q300 Cybernetics; QA75 Electronic computers. Computer science
description	Understanding the semantics of objects and scenes using multi-modal RGB-D sensors serves many robotics applications. Key challenges for accurate RGB-D image recognition are the scarcity of training data, variations due to viewpoint changes and the heterogeneous nature of the data. We address these problems and propose a generic deep learning framework based on a pre-trained convolutional neural network that serves as a feature extractor for both the colour and depth channels. We propose a rich multi-scale feature representation, referred to as convolutional hypercube pyramid (HP-CNN), that encodes discriminative information from the convolutional tensors at different levels of detail. We also present a technique to fuse the proposed HP-CNN with the activations of fully connected neurons, based on an extreme learning machine classifier in a late fusion scheme, which leads to a highly discriminative and compact representation. To further improve performance, we devise HP-CNN-T, a view-invariant descriptor extracted from a multi-view 3D object pose (M3DOP) model. M3DOP is learned from over 140,000 RGB-D images that are synthetically generated by rendering CAD models from different viewpoints. Extensive evaluations on four RGB-D object and scene recognition datasets demonstrate that our HP-CNN and HP-CNN-T consistently outperform state-of-the-art methods for several recognition tasks by a significant margin.
format	Article
author	Mohd Zaki, Hasan Firdaus; Shafait, Faisal; Mian, Ajmal
author_facet	Mohd Zaki, Hasan Firdaus; Shafait, Faisal; Mian, Ajmal
author_sort	Mohd Zaki, Hasan Firdaus
title	Viewpoint invariant semantic object and scene categorization with RGB-D sensors
title_sort	viewpoint invariant semantic object and scene categorization with rgb-d sensors
publisher	Springer New York LLC
publishDate	2019
url	http://irep.iium.edu.my/64696/ http://irep.iium.edu.my/64696/20/64696_Viewpoint%20invariant%20semantic%20object%20and%20scene_complete.pdf http://irep.iium.edu.my/64696/19/64696_Viewpoint%20invariant%20semantic%20object%20and%20scene_scopus.pdf http://irep.iium.edu.my/64696/31/64696_Viewpoint%20invariant%20semantic%20object%20and%20scene%20categorization%20with%20RGB-D%20sensors_WOS.pdf
first_indexed	2023-09-18T21:31:48Z
last_indexed	2023-09-18T21:31:48Z
_version_	1777412560793894912

Similar Items