Graph based text representation for document clustering

Asma Khazaal Abdulsahib; Siti Sakira Kamaruddin

Details

Publication Date

Thu Jan 01 2015

Journal Name

Journal of Theoretical and Applied Information Technology

Volume

76

Issue Number

1

Text Representation Schemes

Dependency Graph

Document Clustering

Sparsity Problem

Semantic Problem.

Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth calls for effective techniques that support the search and retrieval of text. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories and is an important task in data mining and machine learning. Clustering accuracy depends heavily on the choice of text representation method. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TFIDF). This approach ignores the relationships between words and their meanings, so the sparsity and semantic problems prevalent in textual documents remain unresolved. This study reduces the sparsity and semantic problems by proposing a graph-based text representation method, the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used. The text documents undergo pre-processing and syntactic parsing to identify sentence structure, and the semantics of the words are then modeled using a dependency graph. The resulting dependency graph is used for cluster analysis with the K-means clustering technique. The dependency-graph-based clustering results were compared with those of two popular text representation methods, TFIDF and ontology-based text representation. The results show that the dependency graph outperforms both, and the findings show that the proposed text representation method leads to more accurate document clustering results.
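The contrast between a bag-of-words representation and a dependency-based one can be illustrated with a minimal sketch. It is not the authors' implementation: the dependency triples are hand-written for illustration (no parser is run), the relation labels `nsubj` and `obj` are borrowed from Universal Dependencies, and the helper names `tfidf_vectors`, `cosine`, and `bag_of_words` as well as the smoothed IDF formula are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn each document (a list of hashable features) into a sparse TFIDF vector."""
    n_docs = len(docs)
    doc_freq = Counter()
    for features in docs:
        doc_freq.update(set(features))  # count each feature once per document
    vectors = []
    for features in docs:
        term_freq = Counter(features)
        vectors.append({
            # Smoothed IDF (an assumption) keeps weights positive for common features.
            f: count * (math.log((1 + n_docs) / (1 + doc_freq[f])) + 1.0)
            for f, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bag_of_words(triples):
    """Discard the graph structure and keep only the distinct words."""
    return sorted({word for head, _rel, dep in triples for word in (head, dep)})


# Hypothetical hand-written dependency edges: (head, relation, dependent).
docs = [
    [("bites", "nsubj", "dog"), ("bites", "obj", "man")],      # "dog bites man"
    [("bites", "nsubj", "man"), ("bites", "obj", "dog")],      # "man bites dog"
    [("reads", "nsubj", "student"), ("reads", "obj", "book")],
]

bow = tfidf_vectors([bag_of_words(d) for d in docs])  # words as features
dep = tfidf_vectors(docs)                             # dependency edges as features

print("bag-of-words similarity:", cosine(bow[0], bow[1]))
print("dependency-edge similarity:", cosine(dep[0], dep[1]))
```

The first two documents contain the same words, so the bag-of-words vectors cannot tell them apart, while the edge-based vectors share no feature because the roles of "dog" and "man" are swapped. Either set of vectors could then be passed to a clustering step such as K-means.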

Publication Date

Fri Feb 28 2025

Journal Name

Journal Européen Des Systèmes Automatisés

Decision-Making Model for Aircraft Landing Based on Fuzzy Logic Approach

aircraft landing

fuzzy logic modeling

runway condition

visibility

wind speed and direction

Afrah A.

Ahmed Hameed

Nizar H.

An aircraft's landing stage involves inherent hazards and problems associated with many factors, such as weather, runway conditions, and pilot experience. The pilot is responsible for selecting the proper landing procedure based on information provided by the landing console operator (LCO). Because human decisions are prone to errors and biases, an intelligent system that predicts accurate decisions becomes important. This paper proposes a fuzzy logic method intended to handle the uncertainty and ambiguity inherent in the landing phase, providing intelligent decision support to the pilot while reducing the workload of the LCO. The fuzzy system, built using the Mamdani approach in MATLAB software, considers critical

Publication Date

Sun Sep 01 2019

Journal Name

2019 11th Computer Science and Electronic Engineering (CEEC)

ANN based Measurement for No-Reference Video Quality of Experience Metric

Amal Sufiuh

Rana Fareed

Laith

Publication Date

Tue Dec 05 2023

Journal Name

Baghdad Science Journal

Indoor/Outdoor Deep Learning Based Image Classification for Object Recognition Applications

Deep learning

GoogleNet

Image classification

Indoor/outdoor

Transfer learning.

Omar Abdullatif

Mohammed Jawad

Zenah Hadi

With the rapid development of smart devices, people's lives have become easier, especially for visually disabled or special-needs people. New achievements in the fields of machine learning and deep learning let people identify and recognise the surrounding environment. In this study, the efficiency and high performance of deep learning architectures are used to build an image classification system for both indoor and outdoor environments. The proposed methodology starts with collecting two datasets (indoor and outdoor) from several separate datasets. In the second step, the collected dataset is split into training, validation, and test sets. The pre-trained GoogleNet and MobileNet-V2 models are trained using the indoor and outdoor se

Publication Date

Tue Jan 01 2019

Journal Name

International Journal of Advanced Computer Science and Applications

Achieving Flatness: Honeywords Generation Method for Passwords based on user behaviours

Omar Z

Ann

G.

H.

Publication Date

Tue Dec 05 2023

Journal Name

Baghdad Science Journal

AlexNet-Based Feature Extraction for Cassava Classification: A Machine Learning Approach

Color

Feature extraction

KNN

Naïve Bayes

Shape

SVM

Texture

Miftahus

Mohd Farhan Md

Mohd Norasri

Cassava, a significant crop in Africa, Asia, and South America, is a staple food for millions. However, classifying cassava species using conventional color, texture, and shape features is inefficient, as cassava leaves exhibit similarities across different types, including toxic and non-toxic varieties. This research aims to overcome the limitations of traditional classification methods by employing deep learning techniques with pre-trained AlexNet as the feature extractor to accurately classify four types of cassava: Gajah, Manggu, Kapok, and Beracun. The dataset was collected from local farms in Lamongan, Indonesia, with the help of agricultural research experts. The dataset consists of 1,400 images, and each type of cassava has

Publication Date

Mon Jul 15 2019

Journal Name

IET Microwaves, Antennas & Propagation

Hilbert metamaterial printed antenna based on organic substrates for energy harvesting

energy harvesting

metamaterial

Taha A.

Zaid Asaad

Omar Almukhtar

In this study, an investigation is conducted into the possible use of organic materials in radio frequency (RF) electronics for RF energy harvesting. A substrate made of Iraqi palm tree remnants mixed with nickel oxide nanoparticles hosted in polyethylene (the INP substrate) is proposed. Moreover, a metamaterial (MTM) antenna is printed on the created INP substrate of 0.8 mm thickness using silver nanoparticle conductive ink. The performance of the fabricated antenna is investigated numerically and then validated experimentally in terms of S11 spectra and radiation patterns. It is found that the proposed antenna shows an ultra-wideband matching bandwidth covering the frequencies from 2.4 to 10 GHz, with bore-sight gain variation from 2.2 to

Publication Date

Sun Aug 06 2023

Journal Name

Journal of Economics and Administrative Sciences

Probit and Improved Probit Transform-Based Kernel Estimator for Copula Density

Copula function

Probit transformation

Kernel copula function

Improved probit transformation

Mirror reflection

Boundary bias

Fatimah Hashim

Munaf Yousif

Copula modeling is widely used in modern statistics. Boundary bias is one of the problems faced in nonparametric estimation, where kernel estimators are the most common choice. In this paper, the copula density function was estimated using the nonparametric probit transformation method in order to eliminate the boundary bias problem from which kernel estimators suffer. A simulation was conducted for three nonparametric methods of estimating the copula density function, and we proposed a new method that performs better than the other methods, using five types of copulas with different sample sizes, different levels of correlation between the copula variables, and different parameters for the function. The

Publication Date

Sun Jul 01 2018

Journal Name

2018 2nd International Conference on Imaging, Signal Processing and Communication (ICISPC)

Analogy-based Common-Sense Knowledge for Opinion-Target Identification and Aggregation

Feature extraction

Task analysis

Semantics

Sentiment analysis

Aggregates

Encyclopedias

Omar Mustafa

Nurul Hashimah Ahamed Hassain

Yu-N

The development of Web 2.0 has improved people's ability to share their opinions. These opinions serve as an important source of knowledge for other reviewers. To figure out what the opinions are about, an automatic analysis system is needed. Aspect-based sentiment analysis is the most important research topic concerned with extracting reviewers' opinions about a certain attribute, namely the opinion target (aspect). In aspect-based tasks, identifying implicit aspects, that is, aspects implied but not stated in a review, is the most challenging task. This paper strives to identify implicit aspects using a hierarchical algorithm that incorporates common-sense knowledge by means of dimensionality reduction.

Publication Date

Mon Aug 01 2022

Journal Name

Baghdad Science Journal

Perceptually Important Points-Based Data Aggregation Method for Wireless Sensor Networks

Data Aggregation

Energy-Saving

Perceptually Important Points (PIP)

Wireless Sensor Network.

Iman Dakhil Idan

Ali Kadhum M.

The transmitting and receiving of data consume the most resources in Wireless Sensor Networks (WSNs). The energy supplied by the battery is the most important resource impacting WSN's lifespan in the sensor node. Therefore, because sensor nodes run from their limited battery, energy-saving is necessary. Data aggregation can be defined as a procedure applied for the elimination of redundant transmissions, and it provides fused information to the base stations, which in turn improves the energy effectiveness and increases the lifespan of energy-constrained WSNs. In this paper, a Perceptually Important Points Based Data Aggregation (PIP-DA) method for Wireless Sensor Networks is suggested to reduce redundant data before sending them to the

Publication Date

Fri Apr 28 2023

Journal Name

Surgical Neurology International

Neurosurgery theater-based learning: Etiquette and preparation tips for medical students

Mustafa

Jaafar

Mahmood F.

Aktham O.

Sama S.

Alkawthar M.

Hayder R.

Samer S.
