Viewpoint invariant semantic object and scene categorization with RGB-D sensors
Understanding the semantics of objects and scenes using multi-modal RGB-D sensors serves many robotics applications. Key challenges for accurate RGB-D image recognition are the scarcity of training data, variations due to viewpoint changes and the heterogeneous nature of the data. We address these p...
Main Authors: | Mohd Zaki, Hasan Firdaus; Shafait, Faisal; Mian, Ajmal |
---|---|
Format: | Article |
Language: | English |
Published: | Springer New York LLC, 2019 |
Subjects: | Q300 Cybernetics; QA75 Electronic computers. Computer science |
Online Access: | http://irep.iium.edu.my/64696/ http://irep.iium.edu.my/64696/20/64696_Viewpoint%20invariant%20semantic%20object%20and%20scene_complete.pdf http://irep.iium.edu.my/64696/19/64696_Viewpoint%20invariant%20semantic%20object%20and%20scene_scopus.pdf http://irep.iium.edu.my/64696/31/64696_Viewpoint%20invariant%20semantic%20object%20and%20scene%20categorization%20with%20RGB-D%20sensors_WOS.pdf |
id |
iium-64696 |
---|---|
recordtype |
eprints |
spelling |
iium-64696 2019-08-01T02:25:54Z http://irep.iium.edu.my/64696/ Viewpoint invariant semantic object and scene categorization with RGB-D sensors Mohd Zaki, Hasan Firdaus Shafait, Faisal Mian, Ajmal Q300 Cybernetics QA75 Electronic computers. Computer science Understanding the semantics of objects and scenes using multi-modal RGB-D sensors serves many robotics applications. Key challenges for accurate RGB-D image recognition are the scarcity of training data, variations due to viewpoint changes and the heterogeneous nature of the data. We address these problems and propose a generic deep learning framework based on a pre-trained convolutional neural network used as a feature extractor for both the colour and depth channels. We propose a rich multi-scale feature representation, referred to as convolutional hypercube pyramid (HP-CNN), that is able to encode discriminative information from the convolutional tensors at different levels of detail. We also present a technique to fuse the proposed HP-CNN with the activations of fully connected neurons based on an extreme learning machine classifier in a late fusion scheme, which leads to a highly discriminative and compact representation. To further improve performance, we devise HP-CNN-T, a view-invariant descriptor extracted from a multi-view 3D object pose (M3DOP) model. M3DOP is learned from over 140,000 RGB-D images that are synthetically generated by rendering CAD models from different viewpoints. Extensive evaluations on four RGB-D object and scene recognition datasets demonstrate that our HP-CNN and HP-CNN-T consistently outperform state-of-the-art methods for several recognition tasks by a significant margin. 
Springer New York LLC 2019-04-01 Article PeerReviewed application/pdf en http://irep.iium.edu.my/64696/20/64696_Viewpoint%20invariant%20semantic%20object%20and%20scene_complete.pdf application/pdf en http://irep.iium.edu.my/64696/19/64696_Viewpoint%20invariant%20semantic%20object%20and%20scene_scopus.pdf application/pdf en http://irep.iium.edu.my/64696/31/64696_Viewpoint%20invariant%20semantic%20object%20and%20scene%20categorization%20with%20RGB-D%20sensors_WOS.pdf Mohd Zaki, Hasan Firdaus and Shafait, Faisal and Mian, Ajmal (2019) Viewpoint invariant semantic object and scene categorization with RGB-D sensors. Autonomous Robots, 43 (4). pp. 1005-1022. ISSN 0929-5593 E-ISSN 1573-7527 (In Press) https://link.springer.com/journal/10514 10.1007/s10514-018-9776-8 |
repository_type |
Digital Repository |
institution_category |
Local University |
institution |
International Islamic University Malaysia |
building |
IIUM Repository |
collection |
Online Access |
language |
English |
topic |
Q300 Cybernetics; QA75 Electronic computers. Computer science |
spellingShingle |
Q300 Cybernetics QA75 Electronic computers. Computer science Mohd Zaki, Hasan Firdaus Shafait, Faisal Mian, Ajmal Viewpoint invariant semantic object and scene categorization with RGB-D sensors |
description |
Understanding the semantics of objects and scenes using multi-modal RGB-D sensors serves many robotics applications. Key challenges for accurate RGB-D image recognition are the scarcity of training data, variations due to viewpoint changes and the heterogeneous nature of the data. We address these problems and propose a generic deep learning framework based on a pre-trained convolutional neural network used as a feature extractor for both the colour and depth channels. We propose a rich multi-scale feature representation, referred to as convolutional hypercube pyramid (HP-CNN), that is able to encode discriminative information from the convolutional tensors at different levels of detail. We also present a technique to fuse the proposed HP-CNN with the activations of fully connected neurons based on an extreme learning machine classifier in a late fusion scheme, which leads to a highly discriminative and compact representation. To further improve performance, we devise HP-CNN-T, a view-invariant descriptor extracted from a multi-view 3D object pose (M3DOP) model. M3DOP is learned from over 140,000 RGB-D images that are synthetically generated by rendering CAD models from different viewpoints. Extensive evaluations on four RGB-D object and scene recognition datasets demonstrate that our HP-CNN and HP-CNN-T consistently outperform state-of-the-art methods for several recognition tasks by a significant margin. |
format |
Article |
author |
Mohd Zaki, Hasan Firdaus; Shafait, Faisal; Mian, Ajmal |
author_facet |
Mohd Zaki, Hasan Firdaus; Shafait, Faisal; Mian, Ajmal |
author_sort |
Mohd Zaki, Hasan Firdaus |
title |
Viewpoint invariant semantic object and scene categorization with RGB-D sensors |
title_short |
Viewpoint invariant semantic object and scene categorization with RGB-D sensors |
title_full |
Viewpoint invariant semantic object and scene categorization with RGB-D sensors |
title_fullStr |
Viewpoint invariant semantic object and scene categorization with RGB-D sensors |
title_full_unstemmed |
Viewpoint invariant semantic object and scene categorization with RGB-D sensors |
title_sort |
viewpoint invariant semantic object and scene categorization with rgb-d sensors |
publisher |
Springer New York LLC |
publishDate |
2019 |
url |
http://irep.iium.edu.my/64696/ http://irep.iium.edu.my/64696/20/64696_Viewpoint%20invariant%20semantic%20object%20and%20scene_complete.pdf http://irep.iium.edu.my/64696/19/64696_Viewpoint%20invariant%20semantic%20object%20and%20scene_scopus.pdf http://irep.iium.edu.my/64696/31/64696_Viewpoint%20invariant%20semantic%20object%20and%20scene%20categorization%20with%20RGB-D%20sensors_WOS.pdf |
first_indexed |
2023-09-18T21:31:48Z |
last_indexed |
2023-09-18T21:31:48Z |
_version_ |
1777412560793894912 |