Data scarcity is a major challenge when training deep learning (DL) models. DL demands a large amount of data to achieve exceptional performance. Unfortunately, many applications have too little data to train DL frameworks. Labeled data usually has to be produced by manual labeling, which typically involves human annotators with extensive background knowledge. This annotation process is costly, time-consuming, and error-prone. Every DL framework usually needs a significant amount of labeled data to learn representations automatically. In general, more data yields a better DL model, although performance also depends on the application. This issue is the main barrier that leads many applications to dismiss the use of DL. Having sufficient data is the first step toward any successful and trustworthy DL application. This paper presents a holistic survey of state-of-the-art techniques for training DL models that address three challenges: small datasets, imbalanced datasets, and lack of generalization. The survey starts by listing the learning techniques. Next, the types of DL architectures are introduced. After that, state-of-the-art solutions to the lack of training data are listed, such as Transfer Learning (TL), Self-Supervised Learning (SSL), Generative Adversarial Networks (GANs), Model Architecture (MA), Physics-Informed Neural Network (PINN), and Deep Synthetic Minority Oversampling Technique (DeepSMOTE). These solutions are followed by tips on data acquisition prior to training, as well as recommendations for ensuring the trustworthiness of the training dataset.
The survey ends with a list of applications that suffer from data scarcity, and several alternatives are proposed to generate more data in each application, including Electromagnetic Imaging (EMI), Civil Structural Health Monitoring, Medical Imaging, Meteorology, Wireless Communications, Fluid Mechanics, Microelectromechanical Systems, and Cybersecurity. To the best of the authors’ knowledge, this is the first review that offers a comprehensive overview of strategies to tackle data scarcity in DL.
Image captioning is the process of adding an explicit, coherent description of the contents of an image. This is done using the latest deep learning techniques, which combine computer vision and natural language processing, to understand the contents of the image and give it an appropriate caption. Multiple datasets suitable for many applications have been proposed. The biggest challenge for natural language processing researchers is that the datasets do not support all languages. Researchers have translated the most famous English datasets with Google Translate to understand the content of the images in their mother tongue. In this paper, the proposed review aims to enhance the understanding o
Hierarchical temporal memory (HTM) is a biomimetic sequence memory algorithm that holds promise for invariant representations of spatial and spatio-temporal inputs. This article presents a comprehensive neuromemristive crossbar architecture for the spatial pooler (SP) and the sparse distributed representation classifier, which are fundamental to the algorithm. There are several unique features in the proposed architecture that tightly link with the HTM algorithm. A memristor that is suitable for emulating the HTM synapses is identified and a new Z-window function is proposed. The architecture exploits the concept of synthetic synapses to enable potential synapses in the HTM. The crossbar for the SP avoids dark spots caused by unutil
Clinical keratoconus (KCN) detection is a challenging and time-consuming task. In the diagnosis process, ophthalmologists must review demographic and clinical ophthalmic examinations. The latter include slit-lamp examination, corneal topographic maps, and Pentacam indices (PI). We propose an Ensemble of Deep Transfer Learning (EDTL) based on corneal topographic maps. We consider four pretrained networks, SqueezeNet (SqN), AlexNet (AN), ShuffleNet (SfN), and MobileNet-v2 (MN), and fine-tune them on a dataset of KCN and normal cases, each including four topographic maps. We also consider a PI classifier. Then, our EDTL method combines the output probabilities of each of the five classifiers to obtain a decision b
This research aims to identify the effect of a deep learning strategy on mathematics achievement and practical intelligence among secondary school students during the 2022/2023 academic year. An experimental research method with two groups (experimental and control) and a post-test was adopted. The research community is represented by the female students of the fifth scientific grade from the first Karkh Education Directorate. Sixty-one female students were purposively chosen and divided into two groups: an experimental group of 30 students who were taught according to the proposed strategy, and a control group of 31 students who were taught according to the usual method. For the purpose of collecting data for the experimen
This research studies the difficulties of learning the Arabic language that face Arabic language learners in the Kurdistan Region, revealing their types and forms, which can be classified into two categories:
The first type: difficulties related to the educational system, the source of which is the Arabic language itself, the Arabic teacher, the learner studying the Arabic language, the educational curriculum (i.e., the educational materials), or the educational process (i.e., the method used in teaching).
The second type: general difficulties related to the political aspect, the source of which is the policy of the Kurdistan Regional Government in marginalizing the Arabic language and replacing the forefront of th
According to a 2010 report by Transparency Organization, Iraq has 200 newspapers and magazines, 67 radio stations, and 45 satellite TV channels. The increase in these figures is measured in days or weeks, not months or years. This fact confirms the importance of studying content providers, especially youth sports content, for two reasons: the first is that young people constitute the highest percentage of Iraqi society, with all the potential that involves for shaping the future; the second is that for years sport has been an important pillar in people's lives, not only as entertainment, as it was seen in the past; rather, sport has an influential presence in politi
Despite the rapid growth of neural machine translation (NMT), there is still a lack of insight into the performance of these models on semantically and culturally rich texts, especially between linguistically distant languages like Arabic and English. In this paper, we investigate the performance of two state-of-the-art AI translation systems (ChatGPT and DeepSeek) when translating Arabic texts into English in three different genres: journalistic, literary, and technical. The study uses a mixed-method evaluation methodology based on a balanced corpus of 60 Arabic source texts from the three genres. Objective measures, including BLEU and TER, and subjective evaluations from human translators were employed to determine the semantic, contextual an