Graph based text representation for document clustering

Asma Khazaal Abdulsahib; Siti Sakira Kamaruddin

Details

Publication Date

Thu Jan 01 2015

Journal Name

Journal Of Theoretical And Applied Information Technology

Volume

76

Issue Number

1

Text Representation Schemes

Dependency Graph

Document Clustering

Sparsity Problem

Semantic Problem.

Advances in digital technology and the World Wide Web have led to a growing volume of digital documents used for purposes such as publishing and digital libraries. This growth calls for effective techniques to support the search and retrieval of text. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories. Clustering is an important task in data mining and machine learning, and its accuracy depends heavily on the choice of text representation. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TFIDF). This representation ignores the relationships between words and their meanings, so the sparsity and semantic problems prevalent in textual documents remain unresolved. In this study, both problems are reduced by proposing a graph-based text representation, the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used. The text documents undergo pre-processing and syntactic parsing to identify sentence structure, and the semantics of words are then modeled using the dependency graph. The resulting dependency graph is used for cluster analysis with the K-means clustering technique. The dependency-graph-based clustering results were compared with those of two popular text representations, TFIDF and ontology-based representation. The results show that the dependency graph outperforms both, indicating that the proposed representation leads to more accurate document clustering.
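As a rough illustration of the bag-of-words baseline that the abstract compares against (not the authors' dependency-graph method, whose details are not given here), the following sketch weights toy documents with a smoothed TF-IDF and clusters them with a small K-means. The function names, the smoothing formula, the farthest-first seeding, and the four toy documents are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for tokenised documents."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Plain K-means with deterministic farthest-first seeding (an assumption)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy corpus: two documents about parsing, two about sport (hypothetical data).
docs = [
    ["graph", "parser", "syntax"],
    ["graph", "syntax", "dependency"],
    ["hockey", "team", "score"],
    ["hockey", "score", "league"],
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
```

Because every vocabulary term gets its own dimension, most vector entries are zero, which is the sparsity problem the abstract refers to; the vectors also carry no information about how words relate to each other within a sentence.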

Publication Date

Tue Mar 01 2016

Journal Name

Al-academy

Semantic Analogy of Quran Text Content Found in Ornamental Figures in Islamic Architecture: وسام كامل عبد الامير

Wissam

This study aims to draw an analogy between the contents of Quran text, as meanings, and their representations as visual shapes within ornamental figures in Islamic architecture. The theoretical framework of the study deals with the concept of semantics and its parts, the artistic contents of Quran texts, and ornamental figures in Islamic architecture. The study population comprised 69 figures, 5 of which were deliberately chosen for analysis using a form that had been presented to a number of experts to ensure its validity. The study reached a number of conclusions, the most significant among them are: adopting natural denotation of direct reference in order to link the ornamental figure to the source it was taken

Publication Date

Thu Mar 01 2018

Journal Name

Nauchforum

THE PROBLEM OF TRANSLATING METAPHOR IN AN ARTISTIC TEXT (ON THE MATERIAL OF RUSSIAN AND ARABIC LANGUAGES)

The notion of metaphor

the types of metaphor in Russian

the types of metaphor in Arabic

Yusra Akram

Publication Date

Sat Jul 01 2023

Journal Name

Electric Power Systems Research

Analytical and measurement-based wideband two-port modeling of DC-DC converters for electromagnetic transient studies

Black-box models

DC-DC converters

Frequency domain analysis

Numerical Laplace transform

Two-port models

Wideband representation

H.

P.

Power-electronic converters are essential elements for the effective interconnection of renewable energy sources to the power grid, as well as to include energy storage units, vehicle charging stations, microgrids, etc. Converter models that provide an accurate representation of their wideband operation and interconnection with other active and passive grid components and systems are necessary for reliable steady state and transient analyses during normal or abnormal grid operating conditions. This paper introduces two Laplace domain-based approaches to model buck and boost DC-DC converters for electromagnetic transient studies. The first approach is an analytical one, where the converter is represented by a two-port admittance model via mo

Publication Date

Fri Jan 01 2021

Journal Name

Prepodavatel Xxi Vek

Enrichment of Vocabulary and the Formation of Grammatically Correct Speech of Foreign Students When Studying the Text of a Literary Work

enrichment of vocabulary

formation of grammatically correct speech of a foreign student

Russian literature

literary text

foreign audience.

Azzam Ahmed

Publication Date

Sun Mar 26 2017

Journal Name

Iraqi Journal Of Pharmaceutical Sciences (P-ISSN 1683-3597, E-ISSN 2521-3512)

Potentiometric Transducers for the Selective Recognition of Risperidone Based on Molecularly Imprinted Polymer

Najwa

Hamsa

Graphite coated electrodes (GCE) based on molecularly imprinted polymers were fabricated for the selective potentiometric determination of risperidone (Ris). The molecularly imprinted (MIP) and non-imprinted (NIP) polymers were synthesized by bulk polymerization using Ris as a template, acrylic acid (AA) and acrylamide (AAm) as monomers, ethylene glycol dimethacrylate (EGDMA) as a cross-linker and benzoyl peroxide (BPO) as an initiator. The imprinted and non-imprinted membranes were prepared using dioctyl phthalate (DOP) and dibutyl phthalate (DBP) as plasticizers in a PVC matrix. The membranes were coated on graphite electrodes. The MIP electrodes using

Publication Date

Sun Jun 01 2014

Journal Name

Baghdad Science Journal

Survival estimation for singly type one censored sample based on generalized Rayleigh distribution

Maximum likelihood method

type one censored sample

interval estimation method

Fisher information matrix.

Iden H.

Hind J.

This paper is concerned with estimating the unknown parameters of the generalized Rayleigh distribution model based on singly type-one censored samples. The probability density function of the generalized Rayleigh distribution is defined together with its properties. The maximum likelihood method is used to derive point estimates of all unknown parameters using an iterative method, namely the Newton-Raphson method; confidence interval estimates are then derived from the Fisher information matrix. Finally, the paper tests whether the model (GRD) fits a set of real data and computes the survival function and hazard function for these data.
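For orientation, the quantities named in the abstract can be written out under one common assumption: that the model is the two-parameter generalized Rayleigh (Burr type X) distribution with shape $\alpha$ and scale $\lambda$. The abstract does not state the parameterization, so the symbols $\alpha$, $\lambda$, $T$, $n$ and $r$ below are illustrative, with $T$ the fixed censoring time, $n$ the sample size and $r$ the number of failures observed before $T$.

```latex
% Assumed two-parameter generalized Rayleigh (Burr type X) form, x > 0
F(x;\alpha,\lambda) = \left(1 - e^{-(\lambda x)^2}\right)^{\alpha}
\qquad
f(x;\alpha,\lambda) = 2\alpha\lambda^{2}\, x\, e^{-(\lambda x)^2}
                      \left(1 - e^{-(\lambda x)^2}\right)^{\alpha-1}

% Survival and hazard functions
S(t) = 1 - F(t;\alpha,\lambda)
\qquad
h(t) = \frac{f(t;\alpha,\lambda)}{S(t)}

% Likelihood for a singly type-one censored sample:
% r ordered failures x_{(1)} \le \dots \le x_{(r)} \le T, and n - r units censored at T
L(\alpha,\lambda) = \frac{n!}{(n-r)!}
                    \prod_{i=1}^{r} f\!\left(x_{(i)};\alpha,\lambda\right)
                    \left[S(T)\right]^{\,n-r}
```

Setting the partial derivatives of $\log L$ with respect to $\alpha$ and $\lambda$ to zero gives equations with no closed-form solution in $\lambda$, which is why an iterative scheme such as Newton-Raphson is needed.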

Publication Date

Thu Jun 01 2023

Journal Name

Bulletin Of Electrical Engineering And Informatics

A missing data imputation method based on salp swarm algorithm for diabetes disease

Geehan Sabah Hassan

Noora Jamal Ali

Asma Khazaal Abdulsahib

Farah Jasim Mohammed

Most medical datasets suffer from missing data, due to the expense of some tests or human error while recording them. This issue degrades the performance of machine learning models because the values of some features are missing. Therefore, there is a need for methods specifically designed to impute these missing data. In this research, the salp swarm algorithm (SSA) is used for generating and imputing the missing values in the Pima Indian diabetes disease (PIDD) dataset; the proposed algorithm is called ISSA. The obtained results showed that the classification performance of three different classifiers which are support vector machine (SVM), K-nearest neighbour (KNN), and Naïve B

Publication Date

Mon Oct 03 2022

Journal Name

International Journal Of Interactive Mobile Technologies (ijim)

A New Feature-Based Method for Similarity Measurement under the Linux Operating System

Almarsoomi F.A.

This paper presents a new algorithm in an important research field, semantic word similarity estimation. A new feature-based algorithm is proposed for measuring word semantic similarity for the Arabic language, a highly systematic language whose words exhibit elegant and rigorous logic. The semantic similarity score between two Arabic words is calculated as a function of their common and total taxonomical features. An Arabic knowledge source is employed to extract the taxonomical features as the set of all concepts that subsume the concepts containing the compared words. Previously developed Arabic word benchmark datasets are used for optimizing and evaluating the proposed algorithm. In this paper,
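The idea of scoring two words by their common and total taxonomical features can be sketched as follows. The abstract does not give the actual scoring function or the knowledge source, so this example assumes a simple ratio of shared to total subsumers (a Jaccard-style score) over a tiny hypothetical taxonomy with English placeholder concepts.

```python
def subsumers(concept, parent_of):
    """Collect the concept and every ancestor that subsumes it in the taxonomy."""
    features = set()
    while concept is not None:
        features.add(concept)
        concept = parent_of.get(concept)
    return features


def feature_similarity(a, b, parent_of):
    """Ratio of common to total taxonomical features (assumed Jaccard-style score)."""
    feats_a = subsumers(a, parent_of)
    feats_b = subsumers(b, parent_of)
    return len(feats_a & feats_b) / len(feats_a | feats_b)


# Hypothetical child -> parent taxonomy; the root has no parent.
parent_of = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "cat": "animal",
    "dog": "animal",
    "car": "vehicle",
}

sim_cat_dog = feature_similarity("cat", "dog", parent_of)  # share animal, entity
sim_cat_car = feature_similarity("cat", "car", parent_of)  # share only entity
```

Words under the same parent share more subsumers and so score higher than words that only meet at the root; a published measure would typically tune how common and total features are combined against benchmark data.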

Publication Date

Sat May 01 2021

Journal Name

Journal Of Physics: Conference Series

Discrete wavelet based estimator for the Hurst parameter of multivariate fractional Brownian motion

Munaf Yousif

In this paper, wavelets were used to study multivariate fractional Brownian motion through the deviations of the random process in order to find an efficient estimator of the Hurst exponent. Simulation experiments showed that the proposed estimator performs efficiently. The estimation takes advantage of the stationarity of the detail coefficients of the wavelet transform, since the variance of these coefficients shows power-law behavior. Two wavelet filters (Haar and db5) were used to minimize the mean square error of the model.
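The power-law behavior mentioned above can be made concrete with the standard relation for a univariate fractional Brownian motion. The paper's multivariate estimator is not given in the abstract, so this is only the textbook single-component case; $d_{j,k}$ denotes the detail coefficient at scale (octave) $j$ and position $k$, and $c$ and $\hat{\beta}$ are symbols introduced for this sketch.

```latex
% Variance of the detail coefficients grows as a power of the scale 2^j
\operatorname{Var}\!\left(d_{j,k}\right) \propto 2^{\,j(2H+1)}

% Taking base-2 logarithms gives a straight line in the scale index j
\log_2 \operatorname{Var}\!\left(d_{j,k}\right) = (2H+1)\, j + c

% Estimate the slope \hat{\beta} by regressing the log-variance on j, then
\hat{H} = \frac{\hat{\beta} - 1}{2}
```

Because the detail coefficients at a fixed scale form a stationary sequence, their sample variance is a usable estimate of $\operatorname{Var}(d_{j,k})$ even though the process itself is non-stationary.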

Publication Date

Thu Dec 01 2022

Journal Name

Journal Of Engineering

Deep Learning-Based Segmentation and Classification Techniques for Brain Tumor MRI: A Review

Brain Tumor

Magnetic Resonance Imaging (MRI)

Convolutional Neural Network (CNN)

Classification

Segmentation

Feature Extraction.

Noor Mohammed

Nassir H.

Early detection of brain tumors is critical for enhancing treatment options and extending patient survival. Magnetic resonance imaging (MRI) gives more detailed information, such as greater contrast and clarity, than other scanning methods. Manually segmenting brain tumors from the many MRI images collected in clinical practice for cancer diagnosis is a difficult and time-consuming task. Algorithms and machine learning technologies can detect tumors in brain MRI scans, making the process easier for doctors, because MRI images can appear healthy even when the person has a tumor, possibly a malignant one. Recently, deep learning techniques based on deep convolutional neural networks have been used to analyze med
