Data scarcity is a major challenge when training deep learning (DL) models. DL demands a large amount of data to achieve exceptional performance, yet many applications have too little data to train DL frameworks. Labeled data usually comes from manual labeling by human annotators with extensive background knowledge, a process that is costly, time-consuming, and error-prone. DL frameworks typically need a significant amount of labeled data to learn representations automatically. In general, more data yields a better DL model, although performance is also application dependent. This issue is the main reason many applications dismiss the use of DL. Having sufficient data is the first step toward any successful and trustworthy DL application. This paper presents a holistic survey of state-of-the-art techniques for training DL models that address three challenges: small datasets, imbalanced datasets, and lack of generalization. The survey starts by listing the learning techniques. Next, the types of DL architectures are introduced. After that, state-of-the-art solutions to the lack of training data are listed, such as Transfer Learning (TL), Self-Supervised Learning (SSL), Generative Adversarial Networks (GANs), Model Architecture (MA), Physics-Informed Neural Network (PINN), and Deep Synthetic Minority Oversampling Technique (DeepSMOTE). These solutions are followed by tips on data acquisition prior to training, as well as recommendations for ensuring the trustworthiness of the training dataset.
The survey ends with a list of applications that suffer from data scarcity, and several alternatives are proposed to generate more data in each application, including Electromagnetic Imaging (EMI), Civil Structural Health Monitoring, Medical Imaging, Meteorology, Wireless Communications, Fluid Mechanics, Microelectromechanical Systems, and Cybersecurity. To the best of the authors’ knowledge, this is the first review that offers a comprehensive overview of strategies to tackle data scarcity in DL.
Drug resistance is now a major emerging problem in the healthcare sector. Novel antibiotics are urgently needed because existing treatments have repeatedly failed. Antimicrobial peptides are biologically active secondary metabolites produced by a variety of microorganisms such as bacteria, fungi, and algae. They reduce surface activity and also show antimicrobial, antifungal, antioxidant, and antibiofilm activity. Antimicrobial peptides include a wide variety of bioactive compounds such as bacteriocins, glycolipids, lipopeptides, polysaccharide-protein complexes, phospholipids, fatty acids, and neutral lipids. Bioactive peptides derived from various natural sources like bacte
The Environmental Data Acquisition Telemetry System is a versatile, flexible and economical means to accumulate data from multiple sensors at remote locations over an extended period of time; the data is normally transferred to the final destination and saved for further analysis.
This paper introduces the design and implementation of a simplified, economical and practical telemetry system to collect and transfer environmental parameters (humidity, temperature, pressure, etc.) from a remote location (a rural area) to the processing and display unit.
To get a flexible and practical system, three data transfer methods (three systems) were proposed (including the design and implementation) for rural area services, the fi
The emergence of SARS-CoV-2, the virus responsible for the COVID-19 pandemic, has resulted in a global health crisis leading to widespread illness, death, and disruption of daily life. A vaccine for COVID-19 is crucial to controlling the spread of the virus, which will help to end the pandemic and restore normalcy to society. Messenger RNA (mRNA) vaccines have led the way as the fastest vaccine candidates for COVID-19, but they face key potential limitations, including spontaneous degradation. To address mRNA degradation issues, Stanford University academics and the Eterna community sponsored a Kaggle competition. This study aims to build a deep learning (DL) model which will predict degradation rates at each base of the mRNA
Intelligent buildings offer various incentives for energy saving, yet non-stationary building environments make their energy use highly inefficient. In the presence of such dynamic excitation, with higher levels of nonlinearity and the coupling effect of temperature and humidity, the HVAC system transitions from underdamped to overdamped indoor conditions. This leads to highly inefficient energy use and fluctuating indoor thermal comfort. To address these concerns, this study develops a novel framework based on deep clustering of Lagrangian trajectories for multi-task learning (DCLTML) and adds a pre-cooling coil in the air handling unit (AHU) to alleviate the coupling issue. The proposed DCLTML exhibits great overall control and is
In this paper, we discuss physical layer security techniques in downlink networks that include eavesdroppers. The main objective of physical layer security is to deliver a perfectly secure message from a transmitter to an intended receiver in the presence of passive or active eavesdroppers who are trying to wiretap the information or disturb network stability. In downlink networks, based on the random nature of the channels to terminals, opportunistic user scheduling can be exploited as an additional tool for enhancing physical layer security. We introduce user scheduling strategies and discuss the corresponding performance according to different levels of channel state information (CSI) at the base station (BS). We show that the avai
The study includes the collection of data about cholera from six health centers in nine locations covering 2500 km² and a population of 750,000 individuals. The average infection rate for the six centers during 2000-2003 was recorded. There were 3007 cases of diarrhea diagnosed as cholera caused by Vibrio cholerae. The percentage of infection was 14.7% for males and 13.2% for females. The percentage of infection for children less than one year old was 6.1%, while for ages 1-5 years it was 6.9% and for ages over 5 years it was 14.5%. The total percentage of patients who stayed in hospital was 7.7% (4.2% for males and 3.4% for females). The bacterium was isolated and identified from 7 cases at the Central Laboratory for Health in Baghdad. In
Most medical datasets suffer from missing data, due to the expense of some tests or human error while recording them. This issue affects the performance of machine learning models because the values of some features will be missing. Therefore, there is a need for dedicated methods for imputing these missing data. In this research, the salp swarm algorithm (SSA) is used for generating and imputing the missing values in the Pima Indian diabetes disease (PIDD) dataset; the proposed algorithm is called ISSA. The obtained results showed that the classification performance of three different classifiers which are support vector machine (SVM), K-nearest neighbour (KNN), and Naïve B