Data scarcity is a major challenge when training deep learning (DL) models. DL demands a large amount of data to achieve exceptional performance. Unfortunately, many applications have small or inadequate data to train DL frameworks. Usually, manual labeling is needed to provide labeled data, which typically involves human annotators with a vast background of knowledge. This annotation process is costly, time-consuming, and error-prone. Usually, every DL framework is fed by a significant amount of labeled data to automatically learn representations. Ultimately, a larger amount of data would generate a better DL model and its performance is also application dependent. This issue is the main barrier for many applications dismissing the use of DL. Having sufficient data is the first step toward any successful and trustworthy DL application. This paper presents a holistic survey on state-of-the-art techniques to deal with training DL models to overcome three challenges including small, imbalanced datasets, and lack of generalization. This survey starts by listing the learning techniques. Next, the types of DL architectures are introduced. After that, state-of-the-art solutions to address the issue of lack of training data are listed, such as Transfer Learning (TL), Self-Supervised Learning (SSL), Generative Adversarial Networks (GANs), Model Architecture (MA), Physics-Informed Neural Network (PINN), and Deep Synthetic Minority Oversampling Technique (DeepSMOTE). Then, these solutions were followed by some related tips about data acquisition needed prior to training purposes, as well as recommendations for ensuring the trustworthiness of the training dataset. The survey ends with a list of applications that suffer from data scarcity, several alternatives are proposed in order to generate more data in each application including Electromagnetic Imaging (EMI), Civil Structural Health Monitoring, Medical imaging, Meteorology, Wireless Communications, Fluid Mechanics, Microelectromechanical system, and Cybersecurity. To the best of the authors’ knowledge, this is the first review that offers a comprehensive overview on strategies to tackle data scarcity in DL.
Longitudinal data is becoming increasingly common, especially in the medical and economic fields, and various methods have been analyzed and developed to analyze this type of data.
In this research, the focus was on compiling and analyzing this data, as cluster analysis plays an important role in identifying and grouping co-expressed subfiles over time and employing them on the nonparametric smoothing cubic B-spline model, which is characterized by providing continuous first and second derivatives, resulting in a smoother curve with fewer abrupt changes in slope. It is also more flexible and can pick up on more complex patterns and fluctuations in the data.
The longitudinal balanced data profile was compiled into subgroup
... Show MoreThe issue of penalized regression model has received considerable critical attention to variable selection. It plays an essential role in dealing with high dimensional data. Arctangent denoted by the Atan penalty has been used in both estimation and variable selection as an efficient method recently. However, the Atan penalty is very sensitive to outliers in response to variables or heavy-tailed error distribution. While the least absolute deviation is a good method to get robustness in regression estimation. The specific objective of this research is to propose a robust Atan estimator from combining these two ideas at once. Simulation experiments and real data applications show that the p
... Show MoreCryptography is the process of transforming message to avoid an unauthorized access of data. One of the main problems and an important part in cryptography with secret key algorithms is key. For higher level of secure communication key plays an important role. For increasing the level of security in any communication, both parties must have a copy of the secret key which, unfortunately, is not that easy to achieve. Triple Data Encryption Standard algorithm is weak due to its weak key generation, so that key must be reconfigured to make this algorithm more secure, effective, and strong. Encryption key enhances the Triple Data Encryption Standard algorithm securities. This paper proposed a combination of two efficient encryption algorithms to
... Show MoreAbstract: -
The concept of joint integration of important concepts in macroeconomic application, the idea of cointegration is due to the Granger (1981), and he explained it in detail in Granger and Engle in Econometrica (1987). The introduction of the joint analysis of integration in econometrics in the mid-eighties of the last century, is one of the most important developments in the experimental method for modeling, and the advantage is simply the account and use it only needs to familiarize them selves with ordinary least squares.
Cointegration seen relations equilibrium time series in the long run, even if it contained all the sequences on t
... Show MoreIt has increasingly been recognised that the future developments in geospatial data handling will centre on geospatial data on the web: Volunteered Geographic Information (VGI). The evaluation of VGI data quality, including positional and shape similarity, has become a recurrent subject in the scientific literature in the last ten years. The OpenStreetMap (OSM) project is the most popular one of the leading platforms of VGI datasets. It is an online geospatial database to produce and supply free editable geospatial datasets for a worldwide. The goal of this paper is to present a comprehensive overview of the quality assurance of OSM data. In addition, the credibility of open source geospatial data is discussed, highlighting the diff
... Show MoreThe issue of penalized regression model has received considerable critical attention to variable selection. It plays an essential role in dealing with high dimensional data. Arctangent denoted by the Atan penalty has been used in both estimation and variable selection as an efficient method recently. However, the Atan penalty is very sensitive to outliers in response to variables or heavy-tailed error distribution. While the least absolute deviation is a good method to get robustness in regression estimation. The specific objective of this research is to propose a robust Atan estimator from combining these two ideas at once. Simulation experiments and real data applications show that the proposed LAD-Atan estimator
... Show MoreCloud-based Electronic Health Records (EHRs) have seen a substantial increase in usage in recent years, especially for remote patient monitoring. Researchers are interested in investigating the use of Healthcare 4.0 in smart cities. This involves using Internet of Things (IoT) devices and cloud computing to remotely access medical processes. Healthcare 4.0 focuses on the systematic gathering, merging, transmission, sharing, and retention of medical information at regular intervals. Protecting the confidential and private information of patients presents several challenges in terms of thwarting illegal intrusion by hackers. Therefore, it is essential to prioritize the protection of patient medical data that is stored, accessed, and shared on
... Show MoreThis study was carried out at University of Baghdad - College of Agricultural Engineering Sciences - research station B during the fall season of 2019-2020, in order to evaluate the effect of Ozone enrichment and the foliar application of organic nutrient on nutrient and water use efficiency and fertilizer productivity of broccoli plant using the modified NFT film technology. A factorial experiment (2*5) was carried out within Nested Design with three replicates. The ozone treatment was distributed into the main plots which consisted of oxygen (O2) and ozone (O3). The foliar application of organic nutrients were distributed randomly within each replicate including five treatments, which were the control treatment (T0), Coconut wat
... Show MoreThis study was carried out at University of Baghdad - College of Agricultural Engineering Sciences - Research Station B during the autumn season 2019-2020, in order to evaluate the effect of Ozone and the foliar application of coconut water and moringa extract on the growth of broccoli plant grown in modified NFT film technology. A factorial experiment (2*5) was carried out within Nested Design with three replicates. The ozone treatment was distributed into the main plots which consisted of oxygen (O2) and ozone (O3). The foliar application of organic nutrients were distributed randomly within each replicate including five treatments, which were the control treatment (T0), Coconut water with two concentrations of 50 (T1) and 100 ml.
... Show More