An empirical comparative study of instance-based schema matching

Bibliographic Details
Main Authors: Alzeber, Mogahed; Aljuboori, Ali A. Alwan; Nordin, Azlin; Abualkishik, Abedallah Zaid
Format: Article
Language: English
Published: Institute of Advanced Engineering and Science (IAES) 2018
Online Access: http://irep.iium.edu.my/63238/
http://irep.iium.edu.my/63238/1/An%20Empirical%20Comparative%20Study%20of%20Instance-based%20Schema%20Matching_Published_Version.pdf
http://irep.iium.edu.my/63238/7/63238_An%20empirical%20comparative%20study%20of%20instance-based_scopus.pdf
Description
Summary: The main concern of schema matching is how to support the merging decision by providing matches between attributes of different schemas. Many works in the literature utilize database instances to detect the correspondence between attributes. Most of these previous works aim at improving match accuracy. We observed that no technique managed to provide accurate matching for different types of data. In other words, some techniques treat numeric values as strings. Similarly, other techniques process textual instances as numeric, which negatively influences the process of discovering the match and compromises the matching result. Thus, a practical comparative study between syntactic and semantic techniques is needed. The study emphasizes analyzing these techniques to determine the strengths and weaknesses of each. This paper aims at comparing two instance-based matching techniques, namely (i) regular expression and (ii) Google similarity, to identify the match between attributes. Several analyses have been conducted on real and synthetic data sets to evaluate the performance of these techniques with respect to Precision (P), Recall (R) and F-Measure.
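
For readers unfamiliar with the evaluation measures named in the summary, the sketch below shows how Precision, Recall and F-Measure are commonly computed for a set of attribute correspondences. It assumes the usual definitions (F-Measure as the harmonic mean of P and R); the record does not state the exact formulas the authors used. The function name `evaluate_matches` and all attribute names are hypothetical, invented for illustration, and are not taken from the paper.

```python
def evaluate_matches(found, reference):
    """Return (precision, recall, f_measure) for attribute correspondences.

    found     -- pairs proposed by a matcher (hypothetical output)
    reference -- pairs a human expert considers correct (assumed gold standard)
    """
    found, reference = set(found), set(reference)
    true_positives = len(found & reference)

    # Precision: share of proposed pairs that are correct.
    precision = true_positives / len(found) if found else 0.0
    # Recall: share of correct pairs that were proposed.
    recall = true_positives / len(reference) if reference else 0.0

    # F-Measure: harmonic mean of precision and recall (assumed definition).
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative data only; these attribute names do not come from the study.
reference = {
    ("cust_name", "client"),
    ("phone", "tel"),
    ("zip", "postcode"),
    ("dob", "birth_date"),
}
found = {
    ("cust_name", "client"),
    ("phone", "tel"),
    ("zip", "tel"),  # an incorrect correspondence
}

precision, recall, f_measure = evaluate_matches(found, reference)
print(f"P={precision:.3f} R={recall:.3f} F={f_measure:.3f}")
```

Treating correspondences as sets of pairs keeps the measures independent of the matcher that produced them, so the same function could score either a syntactic or a semantic technique.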