Graph based text representation for document clustering

Asma Khazaal Abdulsahib; Siti Sakira Kamaruddin

Details

Publication Date

Thu Jan 01 2015

Journal Name

Journal Of Theoretical And Applied Information Technology

Volume

76

Issue Number

1

Text Representation Schemes

Dependency Graph

Document Clustering

Sparsity Problem

Semantic Problem.

Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth calls for effective techniques to support the search and retrieval of text. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories. Clustering is an important task in data mining and machine learning, and its accuracy depends heavily on the choice of text representation method. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TFIDF). This approach ignores the relationships between words and their meanings in the document, so the sparsity and semantic problems that are prevalent in textual documents remain unresolved. In this study, the sparsity and semantic problems are reduced by proposing a graph-based text representation method, namely the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used in this study. The text documents undergo pre-processing and syntactic parsing to identify sentence structure. The semantics of words are then modeled using the dependency graph, and the resulting graph is used for cluster analysis with the K-means clustering technique. The dependency-graph-based clustering results were compared with those of popular text representation methods, i.e. TFIDF and ontology-based text representation. The results show that the dependency graph outperforms both TFIDF and ontology-based text representation, indicating that the proposed method leads to more accurate document clustering.
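To illustrate why a dependency-based representation can separate documents that a bag of words cannot, here is a minimal sketch. It is not the authors' implementation: the abstract does not say how the graph is built or how it feeds K-means. The hand-written parses (`DOC_A`, `DOC_B`, `DOC_C`), the relation labels, and the Jaccard comparison are all assumptions made for illustration, and a real pipeline would obtain the triples from a syntactic parser.

```python
# Hypothetical, hand-written dependency parses: (head, relation, dependent).
# These stand in for parser output; they are not taken from the paper.
DOC_A = [("bites", "nsubj", "dog"), ("bites", "obj", "man")]
DOC_B = [("bites", "nsubj", "man"), ("bites", "obj", "dog")]
DOC_C = [("bites", "nsubj", "dog"), ("bites", "obj", "postman")]


def word_set(triples):
    """Bag-of-words view: the words only, with all relations discarded."""
    return {word for head, _, dep in triples for word in (head, dep)}


def edge_set(triples):
    """Graph view: each labelled head-to-dependent edge is one feature."""
    return set(triples)


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Same words, opposite meaning: identical as word sets, disjoint as edge sets.
words_ab = jaccard(word_set(DOC_A), word_set(DOC_B))
edges_ab = jaccard(edge_set(DOC_A), edge_set(DOC_B))
# Shared subject relation: the edge view still finds a partial match.
edges_ac = jaccard(edge_set(DOC_A), edge_set(DOC_C))
```

In this toy case the word-level view cannot tell `DOC_A` from `DOC_B`, while the edge-level view treats them as unrelated and still gives `DOC_A` and `DOC_C` partial similarity. A clustering algorithm such as K-means would need these edge features converted to numeric vectors first, a step this sketch leaves out.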

Publication Date

Sun Dec 01 2019

Journal Name

Baghdad Science Journal

Comparison of Some Suggested Estimators Based on Differencing Technique in the Partial Linear Model Using Simulation

DAUGRR

DGJR

DGRR

Differences technique

DMJGR

NW estimator

Saja Mohammad

This paper presents new methods based on the differencing technique, namely the difference-based modified jackknifed generalized ridge regression estimator (DMJGR) and the difference-based generalized jackknifed ridge regression estimator (DGJR), for estimating the parameters of the linear part of the partially linear model. The nonlinear part, represented by the nonparametric function, was estimated using the Nadaraya-Watson smoother. In a simulation study, the proposed methods were compared with other estimators based on the differencing technique using the MSE criterion.
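The general two-step idea can be sketched as follows, under simplifying assumptions that are not taken from the paper: a single linear covariate, first-order differencing, and an ordinary ridge constant `k` in place of the jackknifed generalized ridge estimators (DMJGR, DGJR) the abstract names. The synthetic data, the kernel choice, and the function names are all hypothetical.

```python
import math


def difference_ridge_beta(x, y, k=0.0):
    """Ridge-type slope estimate from first-order differences.

    x and y must already be ordered by the nonparametric covariate t.
    k is an ordinary ridge constant (k = 0 gives least squares).
    """
    w = 1.0 / math.sqrt(2.0)  # first-order differencing weights (w, -w)
    dx = [w * (x[i + 1] - x[i]) for i in range(len(x) - 1)]
    dy = [w * (y[i + 1] - y[i]) for i in range(len(y) - 1)]
    sxx = sum(a * a for a in dx)
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / (sxx + k)


def nadaraya_watson(t0, t, r, h):
    """Gaussian-kernel Nadaraya-Watson estimate of E[r | t = t0]."""
    weights = [math.exp(-0.5 * ((t0 - ti) / h) ** 2) for ti in t]
    return sum(wi * ri for wi, ri in zip(weights, r)) / sum(weights)


# Synthetic, noise-free data: y = beta * x + f(t), with f(t) = sin(2*pi*t).
n = 200
t = [i / n for i in range(n)]
x = [((4 * i) % 11) / 11 for i in range(n)]  # covariate not smooth in t
beta_true = 2.0
y = [beta_true * xi + math.sin(2 * math.pi * ti) for xi, ti in zip(x, t)]

# Step 1: differencing nearly removes the smooth f(t), leaving the linear part.
beta_hat = difference_ridge_beta(x, y, k=0.01)
# Step 2: smooth the residuals to recover the nonparametric function.
residuals = [yi - beta_hat * xi for xi, yi in zip(x, y)]
f_hat = nadaraya_watson(0.25, t, residuals, h=0.05)
```

Differencing works here because neighbouring values of a smooth `f(t)` almost cancel, so the differenced response depends mainly on the linear part. The bandwidth `h` and the ridge constant `k` are arbitrary illustrative values, not tuned choices.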

Publication Date

Mon Jan 01 2024

Journal Name

Aip Conference Proceedings

Assessing road networks properties based on GIS techniques: Al-Karrada Region/Baghdad as a case study

Hala Jafar

Maythm

Afrah L.

Transportability refers to the ease with which people, goods, or services may be transferred. When transportability is high, distance becomes less of a limitation for activities. Transportation networks are frequently represented by a set of locations and a set of links indicating the connections between those places, usually called the network topology. Hence, each transmission network has a unique topology that distinguishes its structure. The most essential components of such a framework are the network architecture and the connection level. This research aims to demonstrate the efficiency of the road network in the Al-Karrada area, located in Baghdad city. The analysis based on a quantitative evaluation using graph th

Publication Date

Fri Jun 30 2023

Journal Name

Ingénierie Des Systèmes D Information

Performance Evaluation of a Multi Organizations Secure Internet of Vehicles Based on Hyperledger Fabric Blockchain Platform

Zahra

Ali H.

Publication Date

Wed Sep 16 2020

Journal Name

International Journal Of Dentistry And Oral Science

Attitude and Knowledge of Orthodontics among General Dentists and Non-Orthodontic Specialists: A Questionnaire Based Survey

Orthodontics

Knowledge

Attitude

Dentists.

Muhanad

Nada

Aim: This study aimed to assess orthodontic knowledge and attitude among general dentists and non-orthodontic specialists. Background: Early detection of orthodontic disorders is essential in motivating patients to intervene before the long-term complications that arise when the disorders are not recognised. Methods: A questionnaire was distributed among dentists other than orthodontists. The questionnaire consisted of three sections. The first collected demographic, educational level, and practice type information. The other two sections consisted of closed-ended questions designed to evaluate knowledge of and attitude towards orthodontics. Results: A total of 313 responses to the survey were submitted. No significant correlation was observed, e

Publication Date

Mon Nov 14 2022

Journal Name

Physica Scripta

A wavelet-based collocation technique to find the discontinuous heat source in inverse heat conduction problems

Muhammad

Weidong

Masood

M S

Zaheer

This paper is devoted to an inverse problem of determining a discontinuous space-wise dependent heat source in a linear parabolic equation from measurements at the final moment. In the existing literature, a considerably accurate solution to inverse problems with an unknown space-wise dependent heat source is impossible without introducing some type of regularization method, but here we determine the unknown discontinuous space-wise dependent heat source accurately using the Haar wavelet collocation method (HWCM) without applying a regularization technique. This HWCM is based on finite-difference and Haar wavelet approximations to the inverse problem. In contrast to othe

Publication Date

Wed Feb 01 2023

Journal Name

Journal Of The World Federation Of Orthodontists

Validity and reliability of three-dimensional modeling of orthodontic dental casts using smartphone-based photogrammetric technology

Dhelal

Hadeel

Liu

Publication Date

Thu Jan 31 2019

Journal Name

Journal Of Engineering

Enhancement in Lubricating, Rheological, and Filtration Properties of Unweighted Water-Based Mud Using XC Polymer NPs

unweighted water-based mud

nanomaterials

COF

rheological and filtration properties.

Massara

Nada S.

Asawer A.

In this research, the enhancement of the lubricating, rheological, and filtration properties of unweighted water-based mud is investigated using XC polymer NPs at concentrations of 0.2gm, 0.5gm, 1gm, 2gm, and 4gm. Bentonite, which was used in the preparation of the unweighted water-based mud, was characterized using an XRF-1800 Sequential X-ray Fluorescence Spectrometer, an XRD-6100/7000 X-ray Diffractometer, and a Malvern Mastersizer 2000 particle size analyzer. The lubricating, rheological, and filtration properties of the unweighted water-based mud were measured at room temperature (35°C) using an OFITE EP and Lubricity Tester, an OFITE Model 900 Viscometer, and an OFITE Low-Pressure Filter Press, respectively. XC Polymer N

Publication Date

Mon Jul 01 2019

Journal Name

Journal Of Engineering

Development of Bridges Maintenance Management System based on Geographic Information System Techniques (Case study: AlMuthanna \ Iraq)

geographic information system

bridges maintenance

geodatabase

satellite Image

geostatistical analysis.

Saif Abdul Ameer

Maythm

A Geographic Information System (GIS) is a computerized database management system for collecting, storing, retrieving, analysing, and displaying spatial data. In general, a GIS contains two broad categories of information: geo-referenced spatial data and attribute data. Geo-referenced spatial data define objects that have an orientation and relationship in two- or three-dimensional space, while attribute data is qualitative data that can be counted for recording and analysis. The main aim of this research is to reveal the role of GIS technology in the enhancement of bridge maintenance management system components such as the output results, and make it more interpretable through dynamic colour coding and more sophisticated vi

Publication Date

Sat Feb 01 2020

Journal Name

Journal Of Water Process Engineering

Immobilization of cobalt ions using hierarchically porous 4A zeolite-based carbon composites: Ion-exchange and solidification

Sama M.

Stuart M.

Publication Date

Tue Sep 21 2021

Journal Name

Earth Resources And Environmental Remote Sensing/gis Applications Xii

Investigating the old city of Babylon: tracing buried structural history based on photogrammetry and integrated approaches

Israa

Fanar
