Enhancement of stemming process for Malay illicit web content

A web filtering system is one of the systems used to prevent users from accessing web pages that contain illicit content. The web filtering process has six phases, one of which is the pre-processing phase. This phase comprises three main activities: HTML parsing, stem...

Bibliographic Details
Main Author: Noor Fatihah, Mazlam
Format: Thesis
Language: English
Published: 2012
Subjects: TK Electrical engineering. Electronics. Nuclear engineering
Online Access: http://umpir.ump.edu.my/id/eprint/9463/
http://umpir.ump.edu.my/id/eprint/9463/1/CD6530.pdf
id ump-9463
recordtype eprints
spelling ump-9463 2015-11-05T03:26:54Z
http://umpir.ump.edu.my/id/eprint/9463/
2012-08 Thesis NonPeerReviewed application/pdf en
Noor Fatihah, Mazlam (2012) Enhancement of stemming process for Malay illicit web content. Masters thesis, Universiti Teknologi Malaysia.
http://iportal.ump.edu.my/lib/item?id=chamo:83662&theme=UMP2
repository_type Digital Repository
institution_category Local University
institution Universiti Malaysia Pahang
building UMP Institutional Repository
collection Online Access
language English
topic TK Electrical engineering. Electronics. Nuclear engineering
description A web filtering system is one of the systems used to prevent users from accessing web pages that contain illicit content. The web filtering process has six phases, one of which is the pre-processing phase. This phase comprises three main activities: HTML parsing, stemming, and stopping. This research focuses on the stemming process, which removes the affixes attached to input words taken from web pages in order to produce the correct root words. To date, the existing stemming algorithms for the Malay language, Othman’s stemming algorithm and Sembok’s stemming algorithm, still produce errors in their results. The errors from both stemming algorithms were therefore analyzed, and several features were created to counter the problems found in them: an initial check against a dictionary, the implementation of Rule 2, and a check against an additional dictionary that contains illicit words not included in the initial dictionary. These new features were added to an enhanced stemming algorithm. To check the effectiveness of the new features, several tests were run on a sample of web pages. The results show that only 11% of words were stemmed correctly when the initial dictionary check was skipped, whereas 72% were stemmed correctly when the process started with the initial dictionary check. In the test of Rule 2, Sembok’s algorithm produced only 17% correct words, compared with 62% for the enhanced stemming algorithm. In conclusion, the new features in the enhanced stemming algorithm can reduce the errors produced by Sembok’s stemming algorithm.
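The dictionary-first idea described above can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the word lists, the affix lists, and the names `ROOT_DICTIONARY`, `ADDITIONAL_DICTIONARY`, `candidates`, and `stem` are all hypothetical, and Rule 2 is left out because this record does not define it. The sketch only shows why checking the dictionary before stripping affixes can avoid over-stemming, and how an additional dictionary can catch roots missing from the initial one.

```python
# Toy data for illustration only; none of it comes from the thesis.
ROOT_DICTIONARY = {"makan", "main", "ajar", "baca"}
# Stands in for the additional list of illicit root words (hypothetical entry).
ADDITIONAL_DICTIONARY = {"judi"}
# A small, incomplete sample of Malay affixes (assumption).
PREFIXES = ("meng", "mem", "men", "ber", "ter", "me", "di", "pe")
SUFFIXES = ("kan", "an", "i")


def is_known_root(word):
    """Return True if the word is in the initial or the additional dictionary."""
    return word in ROOT_DICTIONARY or word in ADDITIONAL_DICTIONARY


def candidates(word):
    """Yield stems obtained by stripping at most one prefix and one suffix."""
    prefix_options = [""] + [p for p in PREFIXES if word.startswith(p)]
    suffix_options = [""] + [s for s in SUFFIXES if word.endswith(s)]
    for prefix in prefix_options:
        for suffix in suffix_options:
            if not prefix and not suffix:
                continue  # nothing stripped, so this is not a new candidate
            candidate = word[len(prefix):len(word) - len(suffix)]
            if len(candidate) >= 2:
                yield candidate


def stem(word):
    """Return a root word, checking the dictionaries before stripping affixes."""
    word = word.lower()
    # Step 1: initial dictionary check, so that known roots are never stripped.
    if is_known_root(word):
        return word
    # Step 2: strip affixes and accept the first candidate that is a known root.
    for candidate in candidates(word):
        if is_known_root(candidate):
            return candidate
    # Step 3: no known root was found, so leave the word unchanged.
    return word
```

With this toy data, `stem("makan")` returns `"makan"` because the dictionary check runs first, whereas a stripper without that check could wrongly remove the ending "an". `stem("makanan")` returns `"makan"`, and `stem("berjudi")` returns `"judi"` through the additional dictionary.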
first_indexed 2023-09-18T22:08:04Z
last_indexed 2023-09-18T22:08:04Z