Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth calls for effective techniques to support text search and retrieval. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories. Clustering is an important task in data mining and machine learning, and its accuracy depends heavily on the choice of text representation. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TFIDF). This representation ignores the relationships between words and their meanings, so the sparsity and semantic problems prevalent in textual documents remain unresolved. This study reduces both problems by proposing a graph-based text representation, the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used in this study. The text documents undergo pre-processing and syntactic parsing to identify sentence structure, and the semantics of the words are then modeled using the dependency graph. The resulting dependency graph is used for cluster analysis with the K-means clustering technique. The dependency-graph-based clustering results were compared with those of popular text representation methods, namely TFIDF and ontology-based text representation. The results show that the dependency graph outperforms both TFIDF and ontology-based text representation, indicating that the proposed text representation method leads to more accurate document clustering.
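The limitation of the bag-of-words TFIDF representation described above can be illustrated with a minimal sketch. This is not the study's implementation: the whitespace tokenizer, the tf * log(N / df) weighting variant, the helper names `tfidf_vectors` and `cosine`, and the three toy sentences are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy corpus: the first two sentences differ only in word order.
docs = [
    "the dog bites the man",
    "the man bites the dog",
    "stock markets fell sharply",
]
vecs = tfidf_vectors(docs)
same_vector = vecs[0] == vecs[1]          # word order is lost entirely
sim_reordered = cosine(vecs[0], vecs[1])  # close to 1.0
sim_unrelated = cosine(vecs[0], vecs[2])  # 0.0, no shared terms
```

The first two sentences describe opposite events yet receive identical vectors, because the representation keeps only term counts. A representation that records who does what to whom, such as a dependency graph, can distinguish them.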
This paper presents new methods based on the differencing technique, namely the difference-based modified jackknifed generalized ridge regression estimator (DMJGR) and the difference-based generalized jackknifed ridge regression estimator (DGJR), for estimating the parameters of the linear part of the partially linear model. The nonlinear part, represented by the nonparametric function, was estimated using the Nadaraya-Watson smoother. In a simulation study, the proposed methods were compared with other estimators based on the differencing technique using the mean squared error (MSE) criterion.
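As a rough illustration of the Nadaraya-Watson smoother named above, the sketch below computes a kernel-weighted average of the responses at a query point. The Gaussian kernel, the fixed bandwidth, the function name `nadaraya_watson`, and the toy data are assumptions; the abstract does not state which kernel or bandwidth selection rule the authors used.

```python
import math

def nadaraya_watson(x_train, y_train, x_query, bandwidth):
    """Estimate f(x_query) as a kernel-weighted average of y_train.

    Assumption: Gaussian kernel with a fixed, user-supplied bandwidth.
    If x_query is very far from all training points relative to the
    bandwidth, the weights can underflow to zero and the division fails.
    """
    weights = [
        math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in x_train
    ]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, y_train)) / total

# Hypothetical toy data on a straight line y = x.
x_train = [0.0, 1.0, 2.0]
y_train = [0.0, 1.0, 2.0]

# At the centre point the neighbours are weighted symmetrically,
# so the estimate coincides with the observed value.
centre_estimate = nadaraya_watson(x_train, y_train, 1.0, bandwidth=0.5)

# A constant response is reproduced for any query point and bandwidth.
constant_estimate = nadaraya_watson(x_train, [3.0, 3.0, 3.0], 0.4, bandwidth=0.5)
```

In a partially linear model, a smoother of this kind would typically be applied to the residuals left after the linear part has been estimated, which is where the difference-based estimators come in.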
Transportability refers to the ease with which people, goods, or services may be transferred. When transportability is high, distance becomes less of a limitation for activities. Transportation networks are frequently represented by a set of locations and a set of links that indicate the connections between those places, which together are usually called the network topology. Hence, each transportation network has a unique topology that distinguishes its structure. The most essential components of such a framework are the network architecture and the connection level. This research aims to demonstrate the efficiency of the road network in the Al-Karrada area, which is located in Baghdad city. The analysis is based on a quantitative evaluation using graph th
Aim: This study aimed to assess orthodontic knowledge and attitude among general dentists and non-orthodontic specialists. Background: Early detection of orthodontic disorders is essential in motivating patients to intervene before the long-term complications that arise when the disorders are not recognised. Methods: A questionnaire was distributed among dentists other than orthodontists. The questionnaire consisted of three sections. The first collected demographic, educational level, and practice type information. The other two sections consisted of closed-ended questions designed to evaluate knowledge of and attitude towards orthodontics. Results: A total of 313 responses to the survey were submitted. No significant correlation was observed, e
This paper is devoted to the inverse problem of determining a discontinuous space-wise dependent heat source in a linear parabolic equation from measurements at the final moment. In the existing literature, a considerably accurate solution to inverse problems with an unknown space-wise dependent heat source is impossible without introducing some type of regularization method. Here, however, the unknown discontinuous space-wise dependent heat source is determined accurately using the Haar wavelet collocation method (HWCM) without applying a regularization technique. The HWCM is based on finite-difference and Haar wavelet approximations to the inverse problem. In contrast to othe
In this research, the enhancement of the lubricating, rheological, and filtration properties of unweighted water-based mud is investigated using XC polymer NPs at concentrations of 0.2, 0.5, 1, 2, and 4 gm. The bentonite used to prepare the unweighted water-based mud was characterized using an XRF-1800 Sequential X-ray Fluorescence Spectrometer, an XRD-6100/7000 X-ray Diffractometer, and a Malvern Mastersizer 2000 particle size analyzer. The lubricating, rheological, and filtration properties of the unweighted water-based mud were measured at room temperature (35°C) using an OFITE EP and Lubricity Tester, an OFITE Model 900 Viscometer, and an OFITE Low-Pressure Filter Press, respectively. XC Polymer N
A Geographic Information System (GIS) is a computerized database management system for accumulating, storing, retrieving, analysing, and displaying spatial data. In general, a GIS contains two broad categories of information: geo-referenced spatial data and attribute data. Geo-referenced spatial data define objects that have an orientation and relationship in two- or three-dimensional space, while attribute data is qualitative data that can be counted for recording and analysis. The main aim of this research is to reveal the role of GIS technology in enhancing the components of a bridge maintenance management system, such as the output results, and to make it more interpretable through dynamic colour coding and more sophisticated vi