Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth has raised awareness of the need for effective techniques to support the search and retrieval of text. One of the most needed tasks is clustering, which automatically categorizes documents into meaningful groups. Clustering is an important task in data mining and machine learning, and its accuracy depends strongly on the choice of text representation method. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TFIDF). This approach ignores the relationships between words and their meanings in the document; as a result, the sparsity and semantic problems prevalent in textual documents remain unresolved. In this study, these sparsity and semantic problems are reduced by proposing a graph-based text representation method, namely the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation scheme is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used in this study. The text documents undergo pre-processing and syntactic parsing to identify sentence structure. The semantics of words are then modeled using the dependency graph, and the resulting graph is used for cluster analysis with the K-means clustering technique. The dependency-graph-based clustering results were compared with those of two popular text representation methods, TFIDF and ontology-based representation. The results show that the dependency graph outperforms both TFIDF and ontology-based text representation, indicating that the proposed text representation method leads to more accurate document clustering.
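For reference, the TFIDF baseline the study compares against, followed by K-means, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's pipeline. Tokenization is a lowercase whitespace split with no stop-word removal. TF is the raw count divided by document length, IDF is log(N/df), and vectors are L2-normalized. K-means uses a deterministic farthest-first initialization. The dependency-graph representation itself is not shown, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Bag-of-words TFIDF (one common variant): tf(t, d) * log(N / df(t)), L2-normalized."""
    tokenized = [doc.lower().split() for doc in docs]  # assumption: naive whitespace tokenizer
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency per term
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        length = max(len(toks), 1)  # avoid division by zero for empty documents
        vec = [tf[t] / length * math.log(n_docs / df[t]) for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    "cat dog pet cat",
    "dog cat pet animal",
    "stock market trade price",
    "market stock price invest",
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)
```

Because TFIDF only sees shared surface words, two documents about the same topic but with no overlapping vocabulary would be orthogonal here. That is the sparsity and semantic gap the dependency-graph representation is meant to reduce.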
This study assessed the benefit of using earthworms in combination with punch waste and nutrients to remediate drill cuttings contaminated with hydrocarbons. Analyses were performed on days 0, 7, 14, 21, and 28 of the experiment. Two hydrocarbon concentrations (20000 mg/kg and 40000 mg/kg) were tested with three earthworm densities: five, ten, and twenty earthworms. After 28 days, the total petroleum hydrocarbon (TPH) concentration of 20000 mg/kg was reduced to 13200 mg/kg, 9800 mg/kg, and 6300 mg/kg in treatments with five, ten, and twenty earthworms, respectively. Likewise, the TPH concentration of 40000 mg/kg was reduced to 22000 mg/kg, 10100 mg/kg, and 4200 mg/kg in treatments with the same earthworm numbers, respectively. The p...
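The reported concentrations can be expressed as percentage TPH removal, (initial - final) / initial x 100. The short calculation below only re-derives those percentages from the figures quoted above. The function name is hypothetical.

```python
def removal_efficiency(initial, final):
    """Percentage of TPH removed between day 0 and day 28."""
    return 100 * (initial - final) / initial


# Day-28 TPH (mg/kg), keyed by initial TPH and then by number of earthworms.
day28_tph = {
    20000: {5: 13200, 10: 9800, 20: 6300},
    40000: {5: 22000, 10: 10100, 20: 4200},
}

for initial, by_worms in day28_tph.items():
    for worms, final in by_worms.items():
        pct = removal_efficiency(initial, final)
        print(f"{initial} mg/kg, {worms} earthworms: {pct:.2f}% removed")
```

For example, twenty earthworms removed 68.5% of the TPH at 20000 mg/kg and 89.5% at 40000 mg/kg.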
Image retrieval is used to search for images in an image database. In this paper, content-based image retrieval (CBIR) is implemented using four feature extraction techniques: a color histogram features technique, a properties features technique, a gray-level co-occurrence matrix (GLCM) statistical features technique, and a hybrid technique. Features are extracted from both the database images and the query (test) images in order to compute a similarity measure. Because similarity-based matching is central to CBIR, three similarity measures are used and compared: normalized Mahalanobis distance, Euclidean distance, and Manhattan distance. From the results, it is conclud...
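The matching step can be sketched as ranking database feature vectors by distance to the query vector. This is a minimal sketch under assumptions. The feature vectors are placeholders rather than real color-histogram or GLCM features. The "normalized Mahalanobis distance" is approximated with a diagonal covariance, where each feature is scaled by its variance across the database. The abstract does not specify the exact normalization the authors use. All function names are hypothetical.

```python
import math
import statistics


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def feature_variances(database):
    """Per-feature population variance across the database feature vectors."""
    return [statistics.pvariance(column) for column in zip(*database)]


def diagonal_mahalanobis(a, b, variances):
    """Mahalanobis distance assuming a diagonal covariance matrix.
    Features with zero variance carry no spread information and are skipped."""
    return math.sqrt(sum((x - y) ** 2 / v for x, y, v in zip(a, b, variances) if v > 0))


def rank(query, database, distance):
    """Indices of database images, most similar (smallest distance) first."""
    return sorted(range(len(database)), key=lambda i: distance(query, database[i]))


# Placeholder feature vectors (e.g. normalized histogram bins) for three database images.
database = [[0.2, 0.5, 0.3], [0.7, 0.1, 0.2], [0.25, 0.45, 0.3]]
query = [0.22, 0.5, 0.28]
variances = feature_variances(database)

print(rank(query, database, euclidean))
print(rank(query, database, manhattan))
print(rank(query, database, lambda a, b: diagonal_mahalanobis(a, b, variances)))
```

Unlike the Euclidean and Manhattan distances, the variance-scaled distance keeps features with large numeric ranges (such as raw GLCM statistics) from dominating the similarity score. This matters when a hybrid technique concatenates heterogeneous features.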
Speech is the essential means of interaction between humans, or between humans and machines. However, it is often contaminated by various types of environmental noise. Speech enhancement algorithms (SEA) have therefore emerged as a significant approach in the speech processing field for suppressing background noise and recovering the original speech signal. In this paper, a new, efficient two-stage SEA with low distortion is proposed based on the minimum mean square error (MMSE) criterion. The clean signal is estimated by exploiting Laplacian modeling of the distributions of the speech and noise coefficients in an orthogonal transform domain (the discrete Krawtchouk-Tchebichef transform). The Discrete Kra...
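The general idea of estimating clean coefficients in an orthogonal transform domain can be illustrated with a much simpler single-stage estimator. This sketch is a stand-in, not the proposed method, and it rests on several assumptions. It uses the orthonormal DCT-II instead of the discrete Krawtchouk-Tchebichef transform. It applies a Gaussian (Wiener) MMSE gain instead of the Laplacian-model estimator, and it assumes the noise variance is known. Each coefficient's a priori SNR comes from a maximum-likelihood estimate. The function names are hypothetical.

```python
import math
import random


def dct_matrix(n):
    """Orthonormal DCT-II basis: rows are basis vectors, so the inverse is the transpose."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows


def dct(x, c):
    # Forward transform: X = C x
    return [sum(row[i] * x[i] for i in range(len(x))) for row in c]


def idct(X, c):
    # Inverse transform: x = C^T X (valid because C is orthonormal)
    n = len(X)
    return [sum(c[k][i] * X[k] for k in range(n)) for i in range(n)]


def enhance(noisy, noise_var):
    """Shrink each transform coefficient with a Wiener gain (Gaussian MMSE assumption)."""
    c = dct_matrix(len(noisy))
    cleaned = []
    for y in dct(noisy, c):
        posterior_snr = y * y / noise_var
        prior_snr = max(posterior_snr - 1.0, 0.0)  # ML estimate of the a priori SNR
        gain = prior_snr / (1.0 + prior_snr)        # Wiener gain xi / (1 + xi)
        cleaned.append(gain * y)
    return idct(cleaned, c)


def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Synthetic demo: one DCT basis component plus white Gaussian noise of variance 1.
rng = random.Random(0)
n = 64
clean = [10 * math.cos(math.pi * (2 * i + 1) * 3 / (2 * n)) for i in range(n)]
noisy = [s + rng.gauss(0, 1) for s in clean]
enhanced = enhance(noisy, noise_var=1.0)
mse_noisy, mse_enhanced = mse(noisy, clean), mse(enhanced, clean)
print(f"MSE noisy: {mse_noisy:.3f}, MSE enhanced: {mse_enhanced:.3f}")
```

An orthonormal transform keeps white noise white with the same variance in every coefficient, so a single noise variance is enough to set the per-coefficient gain. A Laplacian prior, as used in the paper, would replace the Wiener gain with a different, nonlinear estimator.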
Plagiarism is becoming more of a problem in academia. It is made worse by the ease with which a wide range of resources can be found on the internet, as well as the ease with which they can be copied and pasted. It is academic theft, since the perpetrator has "taken" the work of others and presented it as his or her own. Manual detection of plagiarism by a human being is difficult, imprecise, and time-consuming, because it is difficult for anyone to compare a piece of work against all existing material. Plagiarism is a major problem in higher education, and it can occur in any subject. Plagiarism detection has been studied in many scientific articles, and recognition methods have been developed using plagiarism analysis, authorship identification, and...
A novel median filter based on the crow optimization algorithm (OMF) is proposed to reduce random salt-and-pepper noise and improve the quality of RGB color and grayscale images. The approach works in two steps: first, the crow optimization algorithm detects the noisy pixels; then, these pixels are replaced with an optimal median value chosen by maximizing a fitness function. The performance of the proposed filters (the original and the improved median filter) in removing noise from images was evaluated using standard measures: peak signal-to-noise ratio (PSNR), structural similarity (SSIM), absolute square error, and mean square error (MSE). The simulation was carried out in MATLAB R2019b, and the resul...
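The "detect, then replace with a median" structure and the MSE/PSNR evaluation can be sketched on a small grayscale image. This is a simplified stand-in under clear assumptions. It does not implement crow optimization. Noisy pixels are flagged with a fixed rule instead: a pixel counts as noise if it equals 0 or 255, the usual salt-and-pepper extremes. Each flagged pixel is replaced by the median of its non-noisy 3x3 neighbours. Images are lists of rows, and the function names are hypothetical.

```python
import math
import statistics


def mse(ref, test):
    """Mean squared error between two equally sized grayscale images (lists of rows)."""
    diffs = [(a - b) ** 2 for r_row, t_row in zip(ref, test) for a, b in zip(r_row, t_row)]
    return sum(diffs) / len(diffs)


def psnr(ref, test, peak=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(ref, test)
    return float("inf") if err == 0 else 10 * math.log10(peak ** 2 / err)


def switching_median(img, low=0, high=255):
    """Replace only pixels flagged as impulses (value == low or high) with the median
    of the non-impulse pixels in their 3x3 neighbourhood."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(h):
        for j in range(w):
            if img[i][j] not in (low, high):
                continue  # pixel judged noise-free: keep it unchanged
            window = [img[y][x]
                      for y in range(max(0, i - 1), min(h, i + 2))
                      for x in range(max(0, j - 1), min(w, j + 2))]
            good = [v for v in window if v not in (low, high)]
            # Fall back to the full window if every neighbour is also an impulse.
            out[i][j] = statistics.median_low(good or window)
    return out


clean = [[100] * 5 for _ in range(5)]
noisy = [row[:] for row in clean]
noisy[2][2] = 255  # "salt" pixel
noisy[0][0] = 0    # "pepper" pixel
restored = switching_median(noisy)
print(f"PSNR noisy: {psnr(clean, noisy):.2f} dB, restored: {psnr(clean, restored)} dB")
```

Because this filter modifies only flagged pixels, it leaves noise-free detail untouched, unlike a plain median filter applied everywhere. In the proposed OMF, the role played here by the fixed detection rule is taken over by the crow optimization algorithm.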
This paper deals with the design and implementation of an ECG system. The proposed system introduces a new approach to manipulating, storing, and editing ECG signals. It consists mainly of hardware circuits and related software. The hardware comprises the ECG signal acquisition circuits and the system interfaces. The software, written in Visual Basic, performs the identification of the ECG signal. The main advantage of the system is that it produces an ECG recording report on a personal computer, so that the recording can be stored and processed whenever required. The system was tested on different ECG signals, some abnormal and others normal, and the results show that the system has a good quality of diagno...
In this paper, we implement and examine a Simulink model that uses electroencephalography (EEG) to control several actuators based on brain waves. Such a system is useful for individuals who are unable to operate control units that require direct physical contact. Initially, ten volunteers with a wide age range (20-66 years) participated in this study, and statistical measures were first calculated for all eight channels. The number of channels was then halved according to which brain regions were activated under the protocol used, which also reduced the processing time. Four of the participants (three male and one female) were then selected to test the Simulink model during di...