Big data analysis has important applications in many areas such as sensor networks and connected healthcare. High volume and velocity of big data bring many challenges to data analysis. One possible solution is to summarize the data and provides a manageable data structure to hold a scalable summarization of data for efficient and effective analysis. This research extends our previous work on developing an effective technique to create, organize, access, and maintain summarization of big data and develops algorithms for Bayes classification and entropy discretization of large data sets using the multi-resolution data summarization structure. Bayes classification and data discretization play essential roles in many learning algorithms such as decision tree and nearest neighbor search. The proposed method can handle streaming data efficiently and, for entropy discretization, provide su the optimal split value.
Classification of imbalanced data is an important issue. Many algorithms have been developed for classification, such as Back Propagation (BP) neural networks, decision tree, Bayesian networks etc., and have been used repeatedly in many fields. These algorithms speak of the problem of imbalanced data, where there are situations that belong to more classes than others. Imbalanced data result in poor performance and bias to a class without other classes. In this paper, we proposed three techniques based on the Over-Sampling (O.S.) technique for processing imbalanced dataset and redistributing it and converting it into balanced dataset. These techniques are (Improved Synthetic Minority Over-Sampling Technique (Improved SMOTE), Border
... Show MoreComputer-aided diagnosis (CAD) has proved to be an effective and accurate method for diagnostic prediction over the years. This article focuses on the development of an automated CAD system with the intent to perform diagnosis as accurately as possible. Deep learning methods have been able to produce impressive results on medical image datasets. This study employs deep learning methods in conjunction with meta-heuristic algorithms and supervised machine-learning algorithms to perform an accurate diagnosis. Pre-trained convolutional neural networks (CNNs) or auto-encoder are used for feature extraction, whereas feature selection is performed using an ant colony optimization (ACO) algorithm. Ant colony optimization helps to search for the bes
... Show MoreAn oil spill is a leakage of pipelines, vessels, oil rigs, or tankers that leads to the release of petroleum products into the marine environment or on land that happened naturally or due to human action, which resulted in severe damages and financial loss. Satellite imagery is one of the powerful tools currently utilized for capturing and getting vital information from the Earth's surface. But the complexity and the vast amount of data make it challenging and time-consuming for humans to process. However, with the advancement of deep learning techniques, the processes are now computerized for finding vital information using real-time satellite images. This paper applied three deep-learning algorithms for satellite image classification
... Show MorePotential data interpretation is significant for subsurface structure characterization. The current study is an attempt to explore the magnetic low lying between Najaf and Diwaniyah Cities, In central Iraq. It aims to understand the subsurface structures that may result from this anomaly and submit a better subsurface structural image of the region. The study area is situated in the transition zone, known as the Abu Jir Fault Zone. This tectonic boundary is an inherited basement weak zone extending towards the NW-SE direction. Gravity and magnetic data processing and enhancement techniques; Total Horizontal Gradient, Tilt Angle, Fast Sigmoid Edge Detection, Improved Logistic, and Theta Map filters highlight source boundaries and the
... Show MoreThe study aimed to reach the best rating for the views and variables in the totals characterized by qualities and characteristics common within each group and distinguish them from aggregates other for the purpose of distinguishing between Iraqi provinces which suffer from deprivation, for the purpose of identifying the status of those provinces in the early allowing interested parties and regulators to intervene to take appropriate corrective action in a timely manner. Style has been used cluster analysis Cluster analysis to reach the best rating to those totals from the provinces that suffer from problems, where the provinces were classified, based on the variables (Edu
... Show MoreThe non static chain is always the problem of static analysis so that explained some of theoretical work, the properties of statistical regression analysis to lose when using strings in statistic and gives the slope of an imaginary relation under consideration. chain is not static can become static by adding variable time to the multivariate analysis the factors to remove the general trend as well as variable placebo seasons to remove the effect of seasonal .convert the data to form exponential or logarithmic , in addition to using the difference repeated d is said in this case it integrated class d. Where the research contained in the theoretical side in parts in the first part the research methodology ha
... Show MoreBusiness organizations have faced many challenges in recent times, most important of which is information technology, because it is widely spread and easy to use. Its use has led to an increase in the amount of data that business organizations deal with an unprecedented manner. The amount of data available through the internet is a problem that many parties seek to find solutions for. Why is it available there in this huge amount randomly? Many expectations have revealed that in 2017, there will be devices connected to the internet estimated at three times the population of the Earth, and in 2015 more than one and a half billion gigabytes of data was transferred every minute globally. Thus, the so-called data mining emerged as a
... Show MorePermeability data has major importance work that should be handled in all reservoir simulation studies. The importance of permeability data increases in mature oil and gas fields due to its sensitivity for the requirements of some specific improved recoveries. However, the industry has a huge source of data of air permeability measurements against little number of liquid permeability values. This is due to the relatively high cost of special core analysis.
The current study suggests a correlation to convert air permeability data that are conventionally measured during laboratory core analysis into liquid permeability. This correlation introduces a feasible estimation in cases of data loose and poorly consolidated formations, or in cas