The influx of data in bioinformatics is primarily in the form of DNA, RNA, and protein sequences. This condition places a significant burden on scientists and computers. Some genomics studies depend on clustering techniques to group similarly expressed genes into one cluster. Clustering is a type of unsupervised learning that can be used to divide unknown cluster data into clusters. The k-means and fuzzy c-means (FCM) algorithms are examples of algorithms that can be used for clustering. Consequently, clustering is a common approach that divides an input space into several homogeneous zones; it can be achieved using a variety of algorithms. This study used three models to cluster a brain tumor dataset. The first model uses FCM, which is used to cluster genes. FCM allows an object to belong to two or more clusters with a membership grade between zero and one and the sum of belonging to all clusters of each gene is equal to one. This paradigm is useful when dealing with microarray data. The total time required to implement the first model is 22.2589 s. The second model combines FCM and particle swarm optimization (PSO) to obtain better results. The hybrid algorithm, i.e., FCM–PSO, uses the DB index as objective function. The experimental results show that the proposed hybrid FCM–PSO method is effective. The total time of implementation of this model is 89.6087 s. The third model combines FCM with a genetic algorithm (GA) to obtain better results. This hybrid algorithm also uses the DB index as objective function. The experimental results show that the proposed hybrid FCM–GA method is effective. Its total time of implementation is 50.8021 s. In addition, this study uses cluster validity indexes to determine the best partitioning for the underlying data. Internal validity indexes include the Jaccard, Davies Bouldin, Dunn, Xie–Beni, and silhouette. Meanwhile, external validity indexes include Minkowski, adjusted Rand, and percentage of correctly categorized pairings. Experiments conducted on brain tumor gene expression data demonstrate that the techniques used in this study outperform traditional models in terms of stability and biological significance.
The research discusses the need to find the innovative structures and methodologies for developing Human Capital (HC) in Iraqi Universities. One of the most important of these structures is Communities of Practice (CoPs) which contributes to develop HC by using learning, teaching and training through the conversion speed of knowledge and creativity into practice. This research has been used the comparative approach through employing the methodology of Data Envelopment Analysis (DEA) by using (Excel 2010 - Solver) as a field evidence to prove the role of CoPs in developing HC. In light of the given information, a researcher adopted on an archived preliminary data about (23) colleges at Mosul University as a deliberate sample for t
... Show MoreThis paper aims at the analytical level to know the security topics that were used with data journalism, and the expression methods used in the statements of the Security Media Cell, as well as to identify the means of clarification used in data journalism. About the Security Media Cell, and the methods preferred by the public in presenting press releases, especially determining the strength of the respondents' attitude towards the data issued by the Security Media Cell. On the Security Media Cell, while the field study included the distribution of a questionnaire to the public of Baghdad Governorate. The study reached several results, the most important of which is the interest of the security media cell in presenting its data in differ
... Show MoreThis study aimed to investigate the role of Big Data in forecasting corporate bankruptcy and that is through a field analysis in the Saudi business environment, to test that relationship. The study found: that Big Data is a recently used variable in the business context and has multiple accounting effects and benefits. Among the benefits is forecasting and disclosing corporate financial failures and bankruptcies, which is based on three main elements for reporting and disclosing that, these elements are the firms’ internal control system, the external auditing, and financial analysts' forecasts. The study recommends: Since the greatest risk of Big Data is the slow adaptation of accountants and auditors to these technologies, wh
... Show MoreIn the last two decades, arid and semi-arid regions of China suffered rapid changes in the Land Use/Cover Change (LUCC) due to increasing demand on food, resulting from growing population. In the process of this study, we established the land use/cover classification in addition to remote sensing characteristics. This was done by analysis of the dynamics of (LUCC) in Zhengzhou area for the period 1988-2006. Interpretation of a laminar extraction technique was implied in the identification of typical attributes of land use/cover types. A prominent result of the study indicates a gradual development in urbanization giving a gradual reduction in crop field area, due to the progressive economy in Zhengzhou. The results also reflect degradati
... Show MoreThe hydrological process has a dynamic nature characterised by randomness and complex phenomena. The application of machine learning (ML) models in forecasting river flow has grown rapidly. This is owing to their capacity to simulate the complex phenomena associated with hydrological and environmental processes. Four different ML models were developed for river flow forecasting located in semiarid region, Iraq. The effectiveness of data division influence on the ML models process was investigated. Three data division modeling scenarios were inspected including 70%–30%, 80%–20, and 90%–10%. Several statistical indicators are computed to verify the performance of the models. The results revealed the potential of the hybridized s
... Show MoreThe Estimation Of The Reliability Function Depends On The Accuracy Of The Data Used To Estimate The Parameters Of The Probability distribution, and Because Some Data Suffer from a Skew in their Data to Estimate the Parameters and Calculate the Reliability Function in light of the Presence of Some Skew in the Data, there must be a Distribution that has flexibility in dealing with that Data. As in the data of Diyala Company for Electrical Industries, as it was observed that there was a positive twisting in the data collected from the Power and Machinery Department, which required distribution that deals with those data and searches for methods that accommodate this problem and lead to accurate estimates of the reliability function,
... Show MoreThe idea of carrying out research on incomplete data came from the circumstances of our dear country and the horrors of war, which resulted in the missing of many important data and in all aspects of economic, natural, health, scientific life, etc.,. The reasons for the missing are different, including what is outside the will of the concerned or be the will of the concerned, which is planned for that because of the cost or risk or because of the lack of possibilities for inspection. The missing data in this study were processed using Principal Component Analysis and self-organizing map methods using simulation. The variables of child health and variables affecting children's health were taken into account: breastfeed
... Show MoreThis research aims to study the methods of reduction of dimensions that overcome the problem curse of dimensionality when traditional methods fail to provide a good estimation of the parameters So this problem must be dealt with directly . Two methods were used to solve the problem of high dimensional data, The first method is the non-classical method Slice inverse regression ( SIR ) method and the proposed weight standard Sir (WSIR) method and principal components (PCA) which is the general method used in reducing dimensions, (SIR ) and (PCA) is based on the work of linear combinations of a subset of the original explanatory variables, which may suffer from the problem of heterogeneity and the problem of linear
... Show MoreThis research sought to present a concept of cross-sectional data models, A crucial double data to take the impact of the change in time and obtained from the measured phenomenon of repeated observations in different time periods, Where the models of the panel data were defined by different types of fixed , random and mixed, and Comparing them by studying and analyzing the mathematical relationship between the influence of time with a set of basic variables Which are the main axes on which the research is based and is represented by the monthly revenue of the working individual and the profits it generates, which represents the variable response And its relationship to a set of explanatory variables represented by the
... Show MoreThyroid disease is a common disease affecting millions worldwide. Early diagnosis and treatment of thyroid disease can help prevent more serious complications and improve long-term health outcomes. However, thyroid disease diagnosis can be challenging due to its variable symptoms and limited diagnostic tests. By processing enormous amounts of data and seeing trends that may not be immediately evident to human doctors, Machine Learning (ML) algorithms may be capable of increasing the accuracy with which thyroid disease is diagnosed. This study seeks to discover the most recent ML-based and data-driven developments and strategies for diagnosing thyroid disease while considering the challenges associated with imbalanced data in thyroid dise
... Show More