Graph based text representation for document clustering

Asma Khazaal Abdulsahib; Siti Sakira Kamaruddin

Details

Publication Date

Thu Jan 01 2015

Journal Name

Journal Of Theoretical And Applied Information Technology

Volume

76

Issue Number

1

Text Representation Schemes

Dependency Graph

Document Clustering

Sparsity Problem

Semantic Problem.

Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth calls for effective techniques to support the search and retrieval of text. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories. Clustering is an important task in data mining and machine learning, and its accuracy depends strongly on the choice of text representation method. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TFIDF). This approach ignores the relationships between words and their meanings, so the sparsity and semantic problems prevalent in textual documents remain unresolved. In this study, the sparsity and semantic problems are reduced by proposing a graph-based text representation method, namely the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used. The text documents undergo pre-processing and syntactic parsing to identify sentence structure, and the semantics of words are then modeled using a dependency graph. The resulting dependency graph is used for cluster analysis with the K-means clustering technique. The dependency-graph-based clustering results were compared with those of popular text representation methods, namely TFIDF and ontology-based text representation. The results show that the dependency graph outperforms both. The findings show that the proposed text representation method leads to more accurate document clustering results.
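
The point that a bag-of-words model ignores relationships between words can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `tfidf_vectors`, the toy sentences, and the hand-written dependency edges (standing in for the output of a syntactic parser) are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


# Toy corpus (hypothetical): the first two sentences use the same words
# but describe opposite events.
docs = ["dog bites man", "man bites dog", "stocks fall sharply"]
vecs = tfidf_vectors(docs)

# Bag of words: word order is discarded, so the two vectors coincide.
same_bow = vecs[0] == vecs[1]

# Hand-written (head, relation, dependent) edges, standing in for parser output.
graph_a = {("bites", "nsubj", "dog"), ("bites", "obj", "man")}
graph_b = {("bites", "nsubj", "man"), ("bites", "obj", "dog")}

# Graph representation: who does what to whom is kept, so the graphs differ.
same_graph = graph_a == graph_b

print("identical TFIDF vectors:", same_bow)
print("identical dependency graphs:", same_graph)
```

In this toy case the two sentences are indistinguishable under TFIDF but distinct as edge sets, which is the kind of structural information a dependency graph can pass on to a clustering step.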

Publication Date

Tue Nov 01 2016

Journal Name

Research Journal Of Pharmaceutical, Biological And Chemical Sciences

Treating of oil-based drill cuttings by earthworms

Bio treatment

Drill cutting

Earthworms

Environmental Protection

AA

Khalid M.

This study assessed the advantage of using earthworms in combination with punch waste and nutrients in remediating drill cuttings contaminated with hydrocarbons. Analyses were performed on days 0, 7, 14, 21, and 28 of the experiment. Two hydrocarbon concentrations (20000 mg/kg and 40000 mg/kg) were used with three group sizes of five, ten, and twenty earthworms. After 28 days, the total petroleum hydrocarbon (TPH) concentration of 20000 mg/kg was reduced to 13200 mg/kg, 9800 mg/kg, and 6300 mg/kg in treatments with five, ten, and twenty earthworms respectively. Likewise, the TPH concentration of 40000 mg/kg was reduced to 22000 mg/kg, 10100 mg/kg, and 4200 mg/kg in treatments with the same numbers of earthworms respectively. The p
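
As a quick check on the figures quoted above, the removal percentages can be derived directly from the stated concentrations. This is only arithmetic on the numbers in the abstract; the function name `percent_removed` is an illustrative choice, not something from the study.

```python
def percent_removed(initial, final):
    """Share of the initial TPH concentration removed, in percent."""
    return (initial - final) / initial * 100


# Final TPH concentrations (mg/kg) after 28 days, keyed by earthworm count,
# copied from the abstract above.
results = {
    20000: {5: 13200, 10: 9800, 20: 6300},
    40000: {5: 22000, 10: 10100, 20: 4200},
}

removal = {
    initial: {
        worms: round(percent_removed(initial, final), 2)
        for worms, final in finals.items()
    }
    for initial, finals in results.items()
}

for initial, by_worms in removal.items():
    for worms, pct in by_worms.items():
        print(f"{initial} mg/kg, {worms} earthworms: {pct}% removed")
```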

Publication Date

Mon May 15 2017

Journal Name

International Journal Of Image And Data Fusion

Image edge detection operators based on orthogonal polynomials

Sadiq H.

Abd. Rahman

Basheera M.

S.A.R.

Wissam A.

Publication Date

Tue Jan 01 2013

Journal Name

International Journal Of Computer Applications

Content-based Image Retrieval (CBIR) using Hybrid Technique

CBIR

feature extraction

properties

color histogram

GLCM

hybrid

similarity measure

Zainab

Israa

Nabeel

Image retrieval is used to search for images in an image database. In this paper, content-based image retrieval (CBIR) using four feature extraction techniques has been achieved. The four techniques are the color histogram features technique, the properties features technique, the gray level co-occurrence matrix (GLCM) statistical features technique, and a hybrid technique. The features are extracted from the database images and the query (test) images in order to compute the similarity measure. Similarity-based matching is very important in CBIR, so three types of similarity measure are used: normalized Mahalanobis distance, Euclidean distance, and Manhattan distance. A comparison between them has been implemented. From the results, it is conclud
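
The matching step described above can be sketched as follows, assuming grayscale pixel lists and a coarse normalized histogram as the only feature. The function names, the four-bin histogram, and the toy pixel values are hypothetical; the normalized Mahalanobis distance is left out because it needs a covariance estimate over the whole database.

```python
import math


def histogram(pixels, bins=4, max_value=256):
    """Normalized intensity histogram of a flat list of pixel values."""
    counts = [0] * bins
    width = max_value / bins
    for p in pixels:
        counts[min(int(p / width), bins - 1)] += 1
    return [c / len(pixels) for c in counts]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def rank(query_features, database, distance):
    """Database image names ordered from most to least similar to the query."""
    return sorted(database, key=lambda name: distance(query_features, database[name]))


# Toy data (hypothetical): image "a" has the same dark/bright mix as the
# query, image "b" is mid-gray throughout.
database = {
    "a": histogram([12, 30, 210, 240]),
    "b": histogram([100, 110, 120, 130]),
}
query = histogram([10, 20, 200, 250])

ranking_euclid = rank(query, database, euclidean)
ranking_manhattan = rank(query, database, manhattan)
print(ranking_euclid, ranking_manhattan)
```

Swapping the `distance` argument is enough to compare measures on the same features, which mirrors the comparison the abstract describes.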

Publication Date

Mon Mar 01 2021

Journal Name

Iop Conference Series: Materials Science And Engineering

Speech Enhancement Algorithm Based on a Hybrid Estimator

Basheera M.

Sadiq H.

Marwah A.

Muntadher

Jamila

Speech is the essential way to interact between humans or between human and machine. However, it is always contaminated with different types of environmental noise. Therefore, speech enhancement algorithms (SEA) have emerged as a significant approach in the speech processing field to suppress background noise and recover the original speech signal. In this paper, a new efficient two-stage SEA with low distortion is proposed based on the minimum mean square error sense. The clean signal is estimated by taking advantage of Laplacian speech and noise modeling based on the distribution of orthogonal transform (Discrete Krawtchouk-Tchebichef transform) coefficients. The Discrete Kra

Publication Date

Tue Feb 01 2022

Journal Name

Int. J. Nonlinear Anal. Appl.

Computer-based plagiarism detection techniques: A comparative study

Plagiarism

Academic

Detection

Dataset

Pan

Mohammed

Plagiarism is a growing problem in academia. It is made worse by the ease with which a wide range of resources can be found on the internet and then copied and pasted. It is academic theft, since the perpetrator has "taken" the work of others and presented it as his or her own. Manual detection of plagiarism is difficult, imprecise, and time-consuming because it is hard for anyone to compare a piece of work against existing data. Plagiarism is a major problem in higher education, and it can happen on any topic. Plagiarism detection has been studied in many scientific articles, and methods for recognition have been created utilizing the Plagiarism analysis, Authorship identification, and

Publication Date

Tue Feb 28 2023

Journal Name

International Journal Of Intelligent Engineering And Systems

Design and Implementation of EEG-Based Smart Structure

Oger Zaya

Yarub

Publication Date

Thu Aug 01 2019

Journal Name

2019 2nd International Conference On Engineering Technology And Its Applications (iiceta)

Human Gait Identification System Based on Average Silhouette

Mohanad Hazim Nsaif

Nawaf Hazim

Sinan Sameer Mahmood

Publication Date

Wed Sep 01 2021

Journal Name

Baghdad Science Journal

Optimum Median Filter Based on Crow Optimization Algorithm

Image processing

Impulse noise

Noise removal

Optimum median filter

Crow optimization algorithm.

Basma Jumaa

Ahmed Yousif Falih

Ali Talib Qasim

Lamees abdalhasan

A novel median filter based on the crow optimization algorithm (OMF) is suggested to reduce random salt-and-pepper noise and improve the quality of RGB color and gray images. The fundamental idea of the approach is that the crow optimization algorithm first detects noise pixels and then replaces them with an optimum median value according to a fitness-maximization criterion. Finally, the standard measures peak signal-to-noise ratio (PSNR), structural similarity, absolute square error, and mean square error are used to test the performance of the suggested filters (original and improved median filter) for removing noise from images. The simulation was carried out in MATLAB R2019b and the resul
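
For orientation, the baseline that such a filter is measured against can be sketched in a few lines. The code below is a plain 3x3 median filter with MSE and PSNR, not the crow-optimized filter from the paper, and it is written in Python rather than MATLAB; the function names and the 4x4 test image are hypothetical.

```python
import math
from statistics import median


def median_filter(img):
    """Plain 3x3 median filter on a 2D list; borders use the available neighbors."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = int(median(window))
    return out


def mse(a, b):
    """Mean square error between two images of equal size."""
    n = len(a) * len(a[0])
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n


def psnr(a, b, peak=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(a, b)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)


# Hypothetical 4x4 gray image with one "salt" and one "pepper" pixel added.
clean = [[100] * 4 for _ in range(4)]
noisy = [row[:] for row in clean]
noisy[0][0] = 255
noisy[2][2] = 0

restored = median_filter(noisy)
print("PSNR noisy:", psnr(clean, noisy))
print("PSNR restored:", psnr(clean, restored))
```

A plain median filter rewrites every pixel, noisy or not. Detecting noise pixels first, as the abstract describes, limits replacement to the pixels flagged as noise.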

Publication Date

Fri Mar 01 2019

Journal Name

Al-khwarizmi Engineering Journal

COMPUTER-BASED ECG SIGNAL ANALYSIS AND MONITORING SYSTEM

Hadeel Kassim

Nasser N.

This paper deals with the design and implementation of an ECG system. The proposed system offers a new approach to manipulating, storing, and editing ECG signals. It consists mainly of hardware circuits and the related software. The hardware includes the ECG signal capture circuits and the system interfaces. The software is written in Visual Basic and performs the task of identifying the ECG signal. The main advantage of the system is that it provides an ECG recording on a personal computer, so that it can be stored and processed at any time as required. The system was tested on different ECG signals, some abnormal and others normal, and the results show that the system has a good quality of diagno

Publication Date

Thu Dec 01 2022

Journal Name

Al-khwarizmi Engineering Journal

BCI-Based Smart Room Control using EEG Signals

Oger Zaya

Yarub

In this paper, we implement and examine a Simulink model with electroencephalography (EEG) to control multiple actuators based on brain waves. This will be in great demand, since it will be useful for individuals who are unable to reach control units that require direct physical contact. Initially, ten volunteers across a wide range of ages (20-66) participated in this study, and statistical measurements were first calculated for all eight channels. The number of channels was then halved according to the activation of brain regions within the utilized protocol, which also decreased the processing time. Consequently, four of the participants (three males and one female) were chosen to examine the Simulink model during di
