Data scarcity is a major challenge when training deep learning (DL) models. DL demands a large amount of data to achieve exceptional performance, yet many applications have too little data to train DL frameworks. Labeled data usually has to be produced manually by human annotators with extensive background knowledge, a process that is costly, time-consuming, and error-prone. Every DL framework is typically fed a significant amount of labeled data so that it can learn representations automatically. In general, a larger amount of data yields a better DL model, although performance is also application dependent. This issue is the main barrier that leads many applications to dismiss the use of DL. Having sufficient data is the first step toward any successful and trustworthy DL application. This paper presents a holistic survey of state-of-the-art techniques for training DL models that address three challenges: small datasets, imbalanced datasets, and lack of generalization. The survey starts by listing the learning techniques. Next, the types of DL architectures are introduced. After that, state-of-the-art solutions to the lack of training data are listed, such as Transfer Learning (TL), Self-Supervised Learning (SSL), Generative Adversarial Networks (GANs), Model Architecture (MA), Physics-Informed Neural Network (PINN), and Deep Synthetic Minority Oversampling Technique (DeepSMOTE). These solutions are followed by related tips on the data acquisition needed prior to training, as well as recommendations for ensuring the trustworthiness of the training dataset.
The survey ends with a list of applications that suffer from data scarcity, and several alternatives are proposed to generate more data in each application, including Electromagnetic Imaging (EMI), Civil Structural Health Monitoring, Medical Imaging, Meteorology, Wireless Communications, Fluid Mechanics, Microelectromechanical Systems, and Cybersecurity. To the best of the authors’ knowledge, this is the first review that offers a comprehensive overview of strategies to tackle data scarcity in DL.
Abstract
This research aims to analyze the reality of the production process in the car assembly line (RUNNA) of the public company for the automotive industry / Alexandria through the use of some Lean production tools. Data were collected through on-site presence in the company to identify the problems of the line and to find appropriate solutions by adopting some Lean production tools. The results showed the presence of lead time in some stations, which is reflected in the customer's waiting time to get the car, as well as some problems in the produced car, such as high temperature of the car, as the company does not take into account customer preferences,
In this paper, an image compression technique is presented based on the Zonal transform method. The DCT, Walsh, and Hadamard transform techniques are also implemented. These different transforms are applied to SAR images using different block sizes, and the effects of implementing these transforms are investigated. The main shortcoming associated with this radar imagery system is the presence of speckle noise, which affected the compression results.
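To make the block-based zonal idea concrete, here is a minimal sketch of zonal coding with the DCT on a single square block. The abstract does not specify the zone shape, the block sizes, or any quantization step, so the triangular zone `u + v < keep`, the 4 x 4 example block, and the helper names (`dct_matrix`, `zonal_compress_block`) are all assumptions made for illustration. The Walsh and Hadamard variants are not shown.

```python
import math


def dct_matrix(n):
    """Return the n x n orthonormal DCT-II matrix as a list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
        )
    return rows


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def zonal_compress_block(block, keep):
    """Zonal coding of one square block (illustrative, not the paper's exact scheme).

    Keeps only coefficients inside the assumed triangular low-frequency
    zone u + v < keep, zeroes the rest, and reconstructs the block.
    Returns (masked_coefficients, reconstructed_block).
    """
    n = len(block)
    c = dct_matrix(n)
    ct = transpose(c)
    coeffs = matmul(matmul(c, block), ct)  # forward 2-D DCT: C X C^T
    masked = [
        [coeffs[u][v] if u + v < keep else 0.0 for v in range(n)] for u in range(n)
    ]
    recon = matmul(matmul(ct, masked), c)  # inverse 2-D DCT: C^T Y C
    return masked, recon


if __name__ == "__main__":
    # Hypothetical 4 x 4 block of pixel intensities.
    block = [[52.0, 55.0, 61.0, 66.0],
             [63.0, 59.0, 55.0, 90.0],
             [62.0, 59.0, 68.0, 113.0],
             [63.0, 58.0, 71.0, 122.0]]
    masked, recon = zonal_compress_block(block, keep=3)
    kept = sum(1 for row in masked for value in row if value != 0.0)
    print("coefficients kept:", kept, "of", len(block) ** 2)
```

A smaller `keep` retains fewer coefficients per block, which gives higher compression at the cost of reconstruction error. On speckled SAR data, more energy tends to sit in the discarded high-frequency coefficients, which is one plausible reading of why the noise affected the reported results.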
The research problem is the clear deficiency suffered by the internal audit function in all institutions of Iraq, resulting from the lack of organizations sponsoring this profession and the absence of any law or local legislation determining its powers, responsibilities, and scope of work, as well as the lack of interest of senior management in economic units in that function, since its work focuses only on financial and accounting matters. This function therefore needs to be rebuilt in line with current developments, and a framework defining its strategy is also lacking. Hence the idea of this research: to find out how to create a regulatory method for the strategic reconstruction of the internal audit function depen
The most significant function in oil exploration is determining the reservoir facies, which are based mostly on the primary features of rocks. Porosity, water saturation, and shale volume, as well as sonic log and bulk density, are the types of input data utilized in Interactive Petrophysics software to compute rock facies. These data are used to create 15 clusters and four groups of rock facies. Furthermore, the accurate matching between core and well-log data is established by the neural network technique. In the current study, to evaluate the applicability of the cluster analysis approach, the rock facies results from 29 wells derived from cluster analysis were utilized to redistribute the petrophysical properties for six units of Mishri
Many stone tools were found on a hill south of the Hor Al-Dalmaj, which is located in the central part of the alluvial plain of Mesopotamia, between the Tigris and Euphrates Rivers. The types of rocks from which the studied stone tools were made are not found in the alluvial plain, because it consists of friable sand, silt, and clay. All existing sediments were precipitated in riverine environments such as point bar, overbank, and floodplain sediments. The collected stone tools were described with a magnifying glass (10x) and a polarized microscope after they were thin sectioned. Microscopic analysis showed that these stone tools are made of sedimentary, volcanic igneous, and metamorphic rocks, such as: sandstones, limestones, chert, con
In the present survey, 18 species of endo- and ectoparasites were recorded during the examination of 50 Mus musculus (Linnaeus, 1758) from 10 localities in Erbil city, of which 7 species were protozoans, as follows: Chilomastix bettencourti (da Fonseca, 1915) 82%; Giardia muris (Filice, 1952) 68%; Tritrichomonas muris (Grassi, 1879) 36%; Entamoeba histolytica (Schaudinn, 1903) 24%; Entamoeba coli (Grassi, 1879) 32%; Eimeria sp. 28%; and Trypanosoma musculi (Kendall, 1906) 2%; and 8 species were helminths, as follows: 4 Cestodes: Rodentolepis nana (von Siebold, 1852) 8%; Hymenolepis diminuta (Rudolphi, 1819) 2%; larval stage of Echinococcus granulosus (Batsch, 1786) 8%; Cysticercus fasciolaris (Rudolphi, 1808) 6%; 4 Nematodes: Aspiculuris tetrapter
Cassava, a significant crop in Africa, Asia, and South America, is a staple food for millions. However, classifying cassava species using conventional color, texture, and shape features is inefficient, as cassava leaves exhibit similarities across different types, including toxic and non-toxic varieties. This research aims to overcome the limitations of traditional classification methods by employing deep learning techniques with a pre-trained AlexNet as the feature extractor to accurately classify four types of cassava: Gajah, Manggu, Kapok, and Beracun. The dataset was collected from local farms in Lamongan, Indonesia. The images were collected with agricultural research experts; the dataset consists of 1,400 images, and each type of cassava has
In this research, a nonparametric technique is presented to estimate the time-varying coefficient functions for balanced longitudinal data, characterized by observations obtained from (n) independent subjects, each of which is measured repeatedly at a group of specific time points (m). Although the measurements are independent among the different subjects, they are mostly correlated within each subject. The applied technique is the Local Linear kernel (LLPK) technique. To avoid the problems of dimensionality and heavy computation, the two-step method has been used to estimate the coefficient functions by using the aforementioned technique. Since, the two-