Data scarcity is a major challenge when training deep learning (DL) models. DL demands a large amount of data to achieve exceptional performance. Unfortunately, many applications have small or inadequate datasets for training DL frameworks. Manual labeling is usually needed to provide labeled data, and it typically involves human annotators with extensive background knowledge. This annotation process is costly, time-consuming, and error-prone. Every DL framework is usually fed a significant amount of labeled data so that it can learn representations automatically. In general, a larger amount of data yields a better DL model, although performance is also application dependent. This issue is the main barrier that leads many applications to dismiss the use of DL. Having sufficient data is the first step toward any successful and trustworthy DL application. This paper presents a holistic survey of state-of-the-art techniques for training DL models that address three challenges: small datasets, imbalanced datasets, and lack of generalization. The survey starts by listing the learning techniques. Next, the types of DL architectures are introduced. After that, state-of-the-art solutions to the lack of training data are listed, such as Transfer Learning (TL), Self-Supervised Learning (SSL), Generative Adversarial Networks (GANs), Model Architecture (MA), Physics-Informed Neural Network (PINN), and Deep Synthetic Minority Oversampling Technique (DeepSMOTE). These solutions are followed by related tips on data acquisition prior to training, as well as recommendations for ensuring the trustworthiness of the training dataset.
The survey ends with a list of applications that suffer from data scarcity, and several alternatives are proposed for generating more data in each application, including Electromagnetic Imaging (EMI), Civil Structural Health Monitoring, Medical Imaging, Meteorology, Wireless Communications, Fluid Mechanics, Microelectromechanical Systems, and Cybersecurity. To the best of the authors’ knowledge, this is the first review that offers a comprehensive overview of strategies to tackle data scarcity in DL.
Abstract
The research compared two methods for estimating the four parameters of the compound exponential Weibull-Poisson distribution: the maximum likelihood method and the Downhill Simplex algorithm. Two data cases were considered: the first assumed the original (uncontaminated) data, while the second assumed data contamination. Simulation experiments were conducted for different sample sizes, initial parameter values, and levels of contamination. The Downhill Simplex algorithm was found to be the best method for estimating the parameters, the probability function, and the reliability function of the compound distribution for both natural and contaminated data.
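As a rough illustration of the kind of estimation described above, the following Python sketch fits a distribution by minimizing its negative log-likelihood with a simplified, hand-written downhill simplex (Nelder-Mead) search. The abstract does not give the density of the four-parameter compound exponential Weibull-Poisson distribution, so a plain two-parameter Weibull is used here as a stand-in. The function names, the simulated sample, the starting point, and the tolerances are all assumptions made for the example, not the authors' setup.

```python
import math
import random


def weibull_nll(params, data, sum_log):
    """Negative log-likelihood of a two-parameter Weibull (shape k, scale lam).

    Stand-in for the compound distribution, whose density is not given.
    """
    k, lam = params
    if k <= 0 or lam <= 0:
        return math.inf  # reject infeasible points
    n = len(data)
    loglik = (n * math.log(k) - n * k * math.log(lam)
              + (k - 1) * sum_log
              - sum((x / lam) ** k for x in data))
    return -loglik


def nelder_mead(f, x0, step=0.5, tol=1e-8, max_iter=2000):
    """Simplified downhill simplex: reflect, expand, contract inward, shrink."""
    n = len(x0)
    # Initial simplex: x0 plus one point offset along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        p = list(x0)
        p[i] += step
        simplex.append(p)
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, second_worst, worst = simplex[0], simplex[-2], simplex[-1]
        fb, fs, fw = f(best), f(second_worst), f(worst)
        if fw - fb < tol:
            break
        # Centroid of all vertices except the worst one.
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        reflect = [c + (c - w) for c, w in zip(centroid, worst)]
        fr = f(reflect)
        if fr < fb:
            expand = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expand if f(expand) < fr else reflect
        elif fr < fs:
            simplex[-1] = reflect
        else:
            contract = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contract) < fw:
                simplex[-1] = contract
            else:
                # Shrink every vertex halfway toward the best one.
                simplex = [best] + [
                    [b + 0.5 * (q - b) for b, q in zip(best, p)]
                    for p in simplex[1:]
                ]
    simplex.sort(key=f)
    return simplex[0]


# Simulated, uncontaminated sample (assumed values: scale 2.0, shape 1.5).
rng = random.Random(0)
data = [rng.weibullvariate(2.0, 1.5) for _ in range(500)]
sum_log = sum(math.log(x) for x in data)

k_hat, lam_hat = nelder_mead(lambda p: weibull_nll(p, data, sum_log), [1.0, 1.0])
print("shape estimate:", k_hat, "scale estimate:", lam_hat)
```

A contamination study in the spirit of the abstract could replace a fraction of `data` with draws from a different distribution and repeat the fit; that step is left out here because the abstract does not describe the contamination scheme.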
One of the most significant elements influencing weather, climate, and the environment is vegetation cover. The Normalized Difference Vegetation Index (NDVI) and the Normalized Difference Built-up Index (NDBI) for the years 2019–2022 were estimated from four Landsat 8 TIRS images covering Duhok City. Using the radiative transfer model, the city's land surface temperature (LST) over these four years was calculated. The aim of this study is to compute the LST for the years 2019–2022, to understand the link between LST, NDVI, and NDBI, and to assess the mapping capability of Landsat 8 TIRS. The findings revealed that the NDBI and the NDVI had the strongest correlation with the ...
The current study was carried out in the fields of the Horticulture Department, College of Agricultural Engineering Sciences, University of Baghdad, Al-Jadiriyah, during the spring season of 2016-2017 to study the effect of mycorrhizal inoculation, foliar application of biostimulators, and their interaction on the growth characteristics of okra (local okra ptera). A factorial experiment (2 in randomized complete block design (RCBD) was used; the experiment included 12 treatments distributed over three replicates. The three factors used in this experiment included: the inoculation with control (C), Mycorrhizae (M), Biozyme (B) (B1 2 cm3.L-1), (B2 4 cm3.L-1), Phosphalas (P) (P 2 cm3.L-1), (M + B1), (M + B2), (P + ...
Contracting cancer typically induces a state of terror among affected individuals. Exploring how chemotherapy and anxiety together affect the rate at which cancer cells multiply and the immune system’s response is necessary for devising ways to stop the spread of cancer. This paper proposes a mathematical model to investigate the impact of psychological fear and chemotherapy on the interaction between cancer and immunity. The proposed model is precisely described. The dynamic analysis of the model focuses on identifying the potential equilibrium points. According to the analysis, three equilibrium points can be established. The stability analysis reveals that all equilibrium points consi...
International auditor reporting has developed rapidly over the past years, as the profession began to give attention to developing auditor reporting and improving the informational content of the report through the issuance and amendment of relevant international auditing standards. The reality of the situation points to shortcomings in auditor reporting in Iraq in many areas, including: clearly defining management's responsibility for the preparation of the financial statements and the auditor's responsibility to express an opinion on these statements; and amendment of the opinion when the financial statements as a whole are free from material misstatement based on sufficient and appropriate audit evidence, or not to build the auditor's ability to obt...
The main topic of this study is the independence of the Jordanian central bank and the extent of the bank's effectiveness in leading monetary policy without interference or pressure from the government. The degree of independence of the Jordanian central bank was assessed on the basis of the following hypothesis: there is a relationship between the independence of the central bank and the legislative and economic indices. The most important recommendations are that the Jordanian central bank's degree of independence, 43.5%, is good, but that a higher degree could be reached by making some modifications to the Jordanian central bank law, and that the central bank should be more rigid ...
Cervical malignancy is ranked as the fourth most prevalent cancer among women worldwide. HPVs, especially HPV16, are the causative agents of cervical cancer and are responsible for about 5% of all human cancers worldwide. Some researchers found that fibronectin is repressed by the human papillomavirus (HPV) type 16 E7 oncoprotein in both HPV-positive nontumorigenic and tumorigenic cell lines, while others found that the HPV oncoprotein increases the levels of fibronectin. The aim is to study the effect of HPV infection on fibronectin expression and their correlation with the development of cervical carcinoma. The current retrospective study enrolled paraffin-embedded blocks from two groups. The research included 30 cervical carcinomatous tissues as well ...
Coagulation-flocculation is a basic chemical engineering method for treating metal-bearing industrial wastewater because it removes colloidal particles, some soluble compounds, and very fine solid suspensions initially present in the wastewater through destabilization and floc formation. This research studied the feasibility of using natural coagulants such as okra and mallow and a chemical coagulant such as alum to remove Cu, increase the removal efficiency, and reduce the turbidity of the treated water. Fourier transform infrared (FTIR) analysis was carried out on okra and mallow before and after coagulation to determine their functional groups. Carbonyl and hydroxyl functional groups on the surface of ...
The uptake of Cd(II) ions from simulated wastewater onto olive pips was modeled using a three-layer artificial neural network (ANN). Based on 112 batch experiments, the effects of contact time (10-240 min), initial pH (2-6), initial concentration (25-250 mg/l), biosorbent dosage (0.05-2 g/100 ml), agitation speed (0-250 rpm), and temperature (20-60ºC) were studied. The maximum uptake (=92 %) of Cd(II) was achieved at the optimum parameters of 60 min, pH 6, 50 mg/l, 1 g/100 ml, 250 rpm, and 25ºC, respectively.
Tangent sigmoid and linear transfer functions for the hidden and output layers, respectively, with 7 neurons, were sufficient for the ANN to give good predictions of cadmium removal efficiency, with a coefficient of correlatio...
In this paper we prove the boundedness of the solutions, and of their derivatives, of the second-order ordinary differential equation x'' + f(x) x' + g(x) = u(t), under certain conditions on f, g and u. Our results generalize those given in [1].
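A numerical experiment can make the boundedness statement concrete, although it proves nothing. The sketch below rewrites the equation as a first-order system, integrates it with a classical fourth-order Runge-Kutta scheme, and tracks the largest |x| and |x'| seen along the trajectory. The abstract does not state the conditions on f, g and u, so the choices f(x) = 1 + x^2, g(x) = x and u(t) = cos t, as well as the initial data and step size, are hypothetical examples picked only for illustration.

```python
import math


def f(x):
    return 1.0 + x * x   # hypothetical damping coefficient, always positive


def g(x):
    return x             # hypothetical restoring term


def u(t):
    return math.cos(t)   # hypothetical bounded forcing term


def rhs(t, x, v):
    """First-order form of x'' + f(x) x' + g(x) = u(t), with v = x'."""
    return v, u(t) - f(x) * v - g(x)


def integrate(x0, v0, t_end, h=0.01):
    """Classical RK4; returns the largest |x| and |x'| along the trajectory."""
    t, x, v = 0.0, x0, v0
    max_x, max_v = abs(x), abs(v)
    for _ in range(int(round(t_end / h))):
        k1x, k1v = rhs(t, x, v)
        k2x, k2v = rhs(t + h / 2, x + h / 2 * k1x, v + h / 2 * k1v)
        k3x, k3v = rhs(t + h / 2, x + h / 2 * k2x, v + h / 2 * k2v)
        k4x, k4v = rhs(t + h, x + h * k3x, v + h * k3v)
        x += h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += h
        max_x = max(max_x, abs(x))
        max_v = max(max_v, abs(v))
    return max_x, max_v


# Assumed initial data x(0) = 2, x'(0) = 0 on the interval [0, 100].
max_x, max_v = integrate(x0=2.0, v0=0.0, t_end=100.0)
print("max |x| :", max_x)
print("max |x'|:", max_v)
```

Repeating the run with other initial data or a longer interval gives a quick sanity check on whether a given choice of f, g and u behaves as a boundedness theorem would predict.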