A survey on deep learning tools dealing with data scarcity: definitions, challenges, solutions, tips, and applications

Ali H. Al-Timemy

doi:10.1186/s40537-023-00727-2

Publication Date

Fri Apr 14 2023

Journal Name

Journal of Big Data

Volume

10

Abstract

Data scarcity is a major challenge when training deep learning (DL) models. DL demands a large amount of data to achieve exceptional performance, yet many applications have only small or inadequate datasets for training DL frameworks. Labeled data usually requires manual annotation by human annotators with extensive domain knowledge, and this annotation process is costly, time-consuming, and error-prone. Because every DL framework learns its representations automatically from a significant amount of labeled data, more data generally yields a better DL model, although performance is also application dependent. This issue is the main barrier that leads many applications to forgo DL. Having sufficient data is the first step toward any successful and trustworthy DL application. This paper presents a holistic survey of state-of-the-art techniques for training DL models that overcome three challenges: small datasets, imbalanced datasets, and lack of generalization. The survey starts by listing the learning techniques. Next, the types of DL architectures are introduced. After that, state-of-the-art solutions to the lack of training data are listed, such as Transfer Learning (TL), Self-Supervised Learning (SSL), Generative Adversarial Networks (GANs), Model Architecture (MA), Physics-Informed Neural Network (PINN), and Deep Synthetic Minority Oversampling Technique (DeepSMOTE). These solutions are followed by tips on data acquisition prior to training, as well as recommendations for ensuring the trustworthiness of the training dataset.
The survey ends with a list of applications that suffer from data scarcity and proposes several alternatives for generating more data in each of them: Electromagnetic Imaging (EMI), Civil Structural Health Monitoring, Medical Imaging, Meteorology, Wireless Communications, Fluid Mechanics, Microelectromechanical Systems, and Cybersecurity. To the best of the authors’ knowledge, this is the first review that offers a comprehensive overview of strategies to tackle data scarcity in DL.
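DeepSMOTE, one of the solutions listed above, builds on SMOTE (Synthetic Minority Oversampling Technique): a synthetic minority-class sample is created by picking a minority sample, picking one of its nearest minority-class neighbours, and interpolating at a random point on the segment between them. DeepSMOTE applies this interpolation in the latent space of a trained encoder and decodes the result. The sketch below shows only the SMOTE interpolation step on plain feature vectors, assuming the encoder/decoder is omitted (an identity encoder); the function `smote_oversample` and its parameters are hypothetical and are not taken from the survey or from any library.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote_oversample(minority: Sequence[Sequence[float]], n_new: int,
                     k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper: each synthetic point lies on the segment between a
    randomly chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance and keep the k nearest.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: euclidean(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    minority_class = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
    for point in smote_oversample(minority_class, n_new=3, k=2):
        print([round(v, 3) for v in point])
```

Because every synthetic point is a convex combination of two real minority samples, the new data stays inside the region the minority class already occupies; in DeepSMOTE the same step would operate on encoder embeddings rather than on raw inputs.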

Publication Date

Fri Mar 01 2019

Journal Name

Neurocomputing

A survey on video compression fast block matching algorithms

Zaynab Anwer

Publication Date

Tue Jul 30 2024

Journal Name

Iraqi Journal of Science

A Survey on Image Caption Generation in Various Languages

Haneen Siraj

Image captioning is the process of adding an explicit, coherent description of an image's contents. It relies on recent deep learning techniques from computer vision and natural language processing to understand the contents of the image and give it an appropriate caption. Multiple datasets suitable for many applications have been proposed. The biggest challenge for researchers in natural language processing is that these datasets are not available in all languages. Researchers have therefore translated the best-known English datasets with Google Translate to understand the content of images in their mother tongue. In this paper, the proposed review aims to enhance the understanding o

Publication Date

Tue May 07 2019

Journal Name

ACM Journal on Emerging Technologies in Computing Systems

Neuromemristive Architecture of HTM with On-Device Learning and Neurogenesis

Abdullah M.

Dhireesha

Hierarchical temporal memory (HTM) is a biomimetic sequence memory algorithm that holds promise for invariant representations of spatial and spatio-temporal inputs. This article presents a comprehensive neuromemristive crossbar architecture for the spatial pooler (SP) and the sparse distributed representation classifier, which are fundamental to the algorithm. The proposed architecture has several unique features that are tightly linked to the HTM algorithm. A memristor suitable for emulating the HTM synapses is identified, and a new Z-window function is proposed. The architecture exploits the concept of synthetic synapses to enable potential synapses in the HTM. The crossbar for the SP avoids dark spots caused by unutil
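In software terms, the spatial pooler that this architecture maps onto a memristive crossbar is a competitive learning step: each column counts how many of its connected synapses (permanence at or above a threshold) see an active input bit, the highest-scoring columns win, and only the winners adapt their permanences. The sketch below illustrates that step in plain Python under strong simplifying assumptions. It is not the authors' hardware design: it uses global inhibition and omits boosting, topology, the Z-window function, and neurogenesis, and the class name `SimpleSpatialPooler` and every default value are hypothetical.

```python
import random
from typing import Dict, List, Set


class SimpleSpatialPooler:
    """Hypothetical, simplified HTM spatial pooler: global inhibition, no boosting."""

    def __init__(self, input_size: int, n_columns: int, n_active: int,
                 potential_pct: float = 0.5, connected_perm: float = 0.5,
                 perm_inc: float = 0.05, perm_dec: float = 0.02,
                 stimulus_threshold: int = 1, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.n_active = n_active
        self.connected_perm = connected_perm
        self.perm_inc = perm_inc
        self.perm_dec = perm_dec
        self.stimulus_threshold = stimulus_threshold
        n_potential = max(1, int(potential_pct * input_size))
        # Each column samples a random subset of input bits as potential synapses;
        # permanences start near the connection threshold.
        self.permanences: List[Dict[int, float]] = []
        for _ in range(n_columns):
            bits = rng.sample(range(input_size), n_potential)
            self.permanences.append(
                {b: connected_perm + rng.uniform(-0.1, 0.1) for b in bits})

    def overlaps(self, active_bits: Set[int]) -> List[int]:
        """Per column: number of connected synapses that land on an active input bit."""
        return [sum(1 for b, p in perms.items()
                    if p >= self.connected_perm and b in active_bits)
                for perms in self.permanences]

    def compute(self, active_bits: Set[int], learn: bool = True) -> List[int]:
        """Return the sorted indices of the active columns (the output SDR)."""
        ov = self.overlaps(active_bits)
        # Global inhibition: the n_active highest-overlap columns win (ties go to
        # the lower index), provided they reach the stimulus threshold.
        ranked = sorted(range(len(ov)), key=lambda c: (-ov[c], c))
        winners = sorted(c for c in ranked[:self.n_active]
                         if ov[c] >= self.stimulus_threshold)
        if learn:
            # Hebbian-style learning on winning columns only: strengthen
            # synapses on active bits and weaken those on inactive bits.
            for c in winners:
                perms = self.permanences[c]
                for b in perms:
                    if b in active_bits:
                        perms[b] = min(1.0, perms[b] + self.perm_inc)
                    else:
                        perms[b] = max(0.0, perms[b] - self.perm_dec)
        return winners


if __name__ == "__main__":
    sp = SimpleSpatialPooler(input_size=64, n_columns=32, n_active=4)
    pattern = set(range(0, 64, 4))  # 16 active input bits
    print(sp.compute(pattern))
```

Learning only raises the overlap of the columns that already won, so presenting the same pattern again reproduces the same sparse output, which is the stable sparse distributed representation the spatial pooler is meant to provide.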

Publication Date

Wed Jun 16 2021

Journal Name

Cognitive Computation

Deep Transfer Learning for Improved Detection of Keratoconus using Corneal Topographic Maps

Ali H.

Nebras H.

Zahraa M.

Javier

Abstract

Clinical keratoconus (KCN) detection is a challenging and time-consuming task. In the diagnosis process, ophthalmologists must review demographic and clinical ophthalmic examinations. The latter include slit-lamp examination, corneal topographic maps, and Pentacam indices (PI). We propose an Ensemble of Deep Transfer Learning (EDTL) based on corneal topographic maps. We consider four pretrained networks, SqueezeNet (SqN), AlexNet (AN), ShuffleNet (SfN), and MobileNet-v2 (MN), and fine-tune them on a dataset of KCN and normal cases, each including four topographic maps. We also consider a PI classifier. Then, our EDTL method combines the output probabilities of each of the five classifiers to obtain a decision b
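The abstract is cut off before it states exactly how EDTL combines the five classifiers' output probabilities. A common rule for this kind of ensemble is soft voting: average the per-class probabilities across classifiers, optionally with weights, and choose the class with the highest mean. The sketch below assumes that rule and a two-class (normal vs. KCN) setting; the function `soft_vote`, its weights, and the example probabilities are hypothetical and are not the paper's method or data.

```python
from typing import Dict, Optional, Sequence


def soft_vote(probabilities: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None,
              classes: Sequence[str] = ("normal", "KCN")) -> Dict[str, object]:
    """Combine per-classifier class probabilities by (weighted) averaging.

    probabilities[m][c] is classifier m's probability for classes[c].
    """
    if not probabilities:
        raise ValueError("need at least one classifier output")
    n_classes = len(classes)
    if any(len(p) != n_classes for p in probabilities):
        raise ValueError("each classifier must output one probability per class")
    if weights is None:
        weights = [1.0] * len(probabilities)
    if len(weights) != len(probabilities):
        raise ValueError("need exactly one weight per classifier")
    total = float(sum(weights))
    # Weighted mean probability of each class across all classifiers.
    mean = [sum(w * p[c] for w, p in zip(weights, probabilities)) / total
            for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: mean[c])
    return {"label": classes[best], "mean_probs": mean}


if __name__ == "__main__":
    # Made-up [P(normal), P(KCN)] outputs for SqN, AN, SfN, MN and the PI classifier.
    outputs = [[0.30, 0.70], [0.45, 0.55], [0.60, 0.40], [0.20, 0.80], [0.35, 0.65]]
    print(soft_vote(outputs))
```

For these made-up outputs the averaged KCN probability is about 0.62, so the ensemble returns "KCN" even though one member (SfN) leans towards "normal"; averaging lets the confident members outvote a single dissenting one.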

Publication Date

Mon Jan 01 2024

Journal Name

Fusion: Practice and Applications

Proposed Framework for Semantic Segmentation of Aerial Hyperspectral Images Using Deep Learning and SVM Approach

Saadya

Nuha

Publication Date

Tue Mar 21 2023

Journal Name

International Journal of Emerging Technologies in Learning (iJET)

Impact of Deep Learning Strategy in Mathematics Achievement and Practical Intelligence among High School Students

Ban Hassan

Areej Khuder

Sabah Saeed

This research aims to identify the effect of a deep learning strategy on mathematics achievement and practical intelligence among secondary school students during the 2022/2023 academic year. The research adopted the experimental method with two groups (experimental and control) and a post-test. The research population consists of female fifth-grade scientific-stream students in the first Karkh Education Directorate. Sixty-one female students were purposively selected and divided into two groups: an experimental group of 30 students taught with the proposed strategy and a control group of 31 students taught with the usual method. For the purpose of collecting data for the experimen

Publication Date

Tue Oct 06 2020

Journal Name

College of Islamic Sciences

Difficulties in learning and teaching Arabic to non-native speakers in the Kurdistan Region: Types and solutions

Difficulties

Learning and teaching Arabic

Non-native speakers

حازم

This research studies the difficulties that Arabic language learners face in the Kurdistan Region, revealing their types and forms, which can be classified into two categories:

The first type: difficulties related to the educational system, whose source is the Arabic language itself, the Arabic teacher, the learner studying Arabic, the curriculum (i.e., the educational materials), or the educational process (i.e., the teaching method).

The second type: general difficulties related to the political aspect, whose source is the Kurdistan Regional Government's policy of marginalizing the Arabic language and replacing the forefront of th

Publication Date

Tue Dec 20 2022

Journal Name

2022 International Conference on Computer and Applications (ICCA)

Improve Data Mining Techniques with a High-Performance Cluster

Fadhil H.M.

Publication Date

Mon Oct 01 2012

Journal Name

Al-Bahith Al-A'alami

How Sport Media Content Providers Deal with Crises: The Media Department of the Ministry of Youth and Sports as a Model

Providers of Sport Media

Sport Media

Ministry of Youth and Sports

Hadi

According to a 2010 report by the Transparency Organization, Iraq had 200 newspapers and magazines, 67 radio stations, and 45 satellite TV channels. These figures grow on a scale of days or weeks, not months or years. This underscores the importance of studying content providers, especially providers of youth sports content, for two reasons: first, young people make up the largest share of Iraqi society, with all their potential to shape the future; second, sport has for years been an important pillar of people's lives, not only as entertainment, as it was seen in the past; rather, sport has an influential presence in politi

Publication Date

Tue Dec 09 2025

Journal Name

Journal of Al-Farahidi’s Arts

Artificial Intelligence Applications in Machine Translation and Their Role in Bridging Semantic Gaps Across Languages: A Comparative Analytical Study of ChatGPT and DeepSeek

Artificial Intelligence

Machine Translation

ChatGPT

DeepSeek

Semantic Fidelity

Arabic-English Translation

BLEU

TER

Neural Language Models

Cross-Linguistic Communication

Asst. Lect. Inam Ghalib

Despite the rapid growth of neural machine translation (NMT), there is still little insight into how these models perform on semantically and culturally rich texts, especially between linguistically distant languages such as Arabic and English. In this paper, we investigate the performance of two state-of-the-art AI translation systems (ChatGPT and DeepSeek) when translating Arabic texts into English in three genres: journalistic, literary, and technical. The study uses a mixed-method evaluation methodology based on a balanced corpus of 60 Arabic source texts from the three genres. Objective measures, including BLEU and TER, and subjective evaluations by human translators were employed to determine the semantic, contextual an
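BLEU, one of the objective measures named above, multiplies a brevity penalty by the geometric mean of clipped (modified) n-gram precisions for n = 1 to 4: each candidate n-gram counts only as many times as it occurs in the reference, and candidates shorter than the reference are penalized by exp(1 - r/c). The sketch below computes an unsmoothed, sentence-level BLEU against a single reference with whitespace tokenization. These are simplifying assumptions and not the study's setup; corpus-level tools such as sacreBLEU aggregate counts over the whole test set and apply their own tokenization, so their scores will differ. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU with one reference and uniform weights."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: any zero precision makes BLEU zero
        log_precision_sum += math.log(clipped / total) / max_n
    c, r = len(cand), len(ref)
    # Brevity penalty: 1 if the candidate is longer than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


if __name__ == "__main__":
    print(sentence_bleu("the cat sat on the mat", "the cat sat on a mat"))
```

For the example pair the clipped precisions are 5/6, 3/5, 2/4, and 1/3 with equal lengths, so the score is (1/12)^(1/4), roughly 0.54. Because the sketch is unsmoothed, any candidate with no matching 4-gram (including one shorter than four tokens) scores 0, which is one reason sentence-level BLEU is usually smoothed or reported at corpus level.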
