Beyond the immediate content of speech, the voice can provide rich information about a speaker's demographics, including age and gender. Estimating a speaker's age and gender offers a wide range of applications, spanning from voice forensic analysis to personalized advertising, healthcare monitoring, and human-computer interaction. However, pinpointing precise age remains intricate due to age ambiguity. Specifically, utterances from individuals at adjacent ages are frequently indistinguishable. Addressing this, we propose a novel, end-to-end approach that deploys Mozilla's Common Voice dataset to transform raw audio into high-quality feature representations using Wav2Vec2.0 embeddings. These are then channeled into our self-attention-based convolutional neural network (CNN) model. To address age ambiguity, we evaluate the effects of different loss functions such as focal loss and Kullback-Leibler (KL) divergence loss. Additionally, we evaluate the accuracy of the estimation at different durations of speech. Experimental results from the Common Voice dataset underscore the efficacy of our approach, showcasing an accuracy of 87% for male speakers, 91% for female speakers and 89% overall accuracy, and an accuracy of 99.1% for gender prediction.
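As a rough illustration of the two loss functions named above, the following minimal sketch computes a focal loss and a KL divergence loss for a single utterance over discrete age bins. The abstract does not give the exact formulations used, so the function names, the focusing parameter `gamma`, the Gaussian soft target, and its width `sigma` are all assumptions made for illustration only.

```python
import math

EPS = 1e-12  # small constant to avoid log(0); an illustrative choice


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)^gamma * log(p_t).

    probs  : predicted class probabilities (should sum to 1)
    target : index of the true class
    gamma  : focusing parameter (assumed value; gamma=0 gives cross-entropy)
    """
    p_t = max(probs[target], EPS)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def soft_age_target(true_bin, n_bins, sigma=1.0):
    """Hypothetical soft label: a normalized Gaussian centered on the true
    age bin, so neighboring ages receive some probability mass."""
    weights = [math.exp(-0.5 * ((i - true_bin) / sigma) ** 2) for i in range(n_bins)]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(target, probs):
    """KL(target || probs) between a soft target and the predicted distribution."""
    return sum(
        t * math.log((t + EPS) / (p + EPS))
        for t, p in zip(target, probs)
        if t > 0
    )


if __name__ == "__main__":
    predicted = [0.05, 0.15, 0.60, 0.15, 0.05]  # made-up probabilities over 5 age bins
    print("focal loss:", focal_loss(predicted, target=2))
    print("KL loss:", kl_divergence(soft_age_target(2, 5), predicted))
```

The focal term down-weights samples the model already classifies confidently, while the soft target used with the KL divergence lets adjacent age bins share probability mass, which is one common way of expressing the age ambiguity described in the abstract.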
Speech is the essential means of interaction between humans and between humans and machines. However, it is always contaminated by different types of environmental noise. Therefore, speech enhancement algorithms (SEAs) have emerged as a significant approach in the speech processing field to suppress background noise and recover the original speech signal. In this paper, a new efficient two-stage SEA with low distortion is proposed based on the minimum mean square error sense. The clean signal is estimated by taking advantage of Laplacian speech and noise modeling based on the distribution of orthogonal transform (Discrete Krawtchouk-Tchebichef transform) coefficients. The Discrete Kra
Political speeches take different forms, such as political forums, events, or inaugural speeches. This research critically analyzes the inaugural speech of President Donald Trump, which was delivered on 20 January 2017 and retrieved from the site <www.cnn.com> on 10 May 2017. The objectives of the study are, first, to classify and discuss the well-known micro structures (linguistic features) of the speech and, second, to classify the macro structures, i.e. the delivered political inaugural speech in which he includes social structures. To reach these objectives, the researcher adopts Norman Fairclough’s three-dimensional Analytical Model (1989). Following the model, the speech was subm
This study is organized along several axes, including: the linguistic and terminological meaning of "deliberative"; the description of language as a social phenomenon; and the definition of speech act theory. It also discusses the conflict between tradition and innovation, with the objective of reviving deliberative thought among Arab scholars and of balancing what has actually been achieved in Arab and Western rhetoric, which meet in intellectual necessity. It calls for a sober reading that preserves the prestige of the Arabic language and its position in light of the growing linguistic sciences, given that we have inherited unique minds and a vast heritage capable of consolidating an Arab linguistic theory.
Deep learning convolutional neural networks have been widely used to recognize or classify voice. Various techniques have been used together with convolutional neural networks to prepare voice data before the training process when developing a classification model. However, not all models produce good classification accuracy, as there are many types of voice or speech. Arabic alphabet pronunciation is one such type, and accurate pronunciation is required when learning to read the Qur’an. Thus, processing the pronunciation and training on the processed data require a specific approach. To overcome this issue, a method based on padding and a deep learning convolutional neural network is proposed to
With the growth of mobile phones, the short message service (SMS) became an essential text communication service. However, the low cost and ease of use of SMS led to an increase in SMS spam. In this paper, the characteristics of SMS spam are studied and a set of features is introduced to filter out SMS spam. In addition, the problem of SMS spam detection is addressed as a clustering analysis that requires a metaheuristic algorithm to find the clustering structures. Three differential evolution variants, viz. DE/rand/1, jDE/rand/1, and jDE/best/1, are adopted for solving the SMS spam problem. Experimental results illustrate that jDE/best/1 produces the best results among the variants in terms of accuracy, false-positive rate and false-negative
Deepfake is a type of artificial intelligence used to create convincing image, audio, and video hoaxes; it concerns celebrities and the general public alike because such hoaxes are easy to manufacture. Deepfakes, especially high-quality ones, are hard to recognize for both people and current approaches. As a defense against Deepfake techniques, various methods to detect Deepfakes in images have been suggested. Most of them have limitations, such as working with only one face per image, or requiring the face to be facing forward with both eyes and the mouth open, depending on which part of the face they analyze. Moreover, few focus on the impact of pre-processing steps on the detection accuracy of the models. This paper introduces a framework design focused on this asp
Respiratory tract infections in sheep are among the important health problems that affect sheep of all ages around the world. Nine bacterial isolates obtained from sheep with respiratory tract infections were selected for use in the current study. The isolates included 3 Staphylococcus aureus, 4 Klebsiella pneumoniae, and 2 Pseudomonas aeruginosa. Following primer design with the Primer3Plus software tool and optimization of the conventional polymerase chain reaction (PCR), the primers were validated for use in the multiplex PCR experiments. The MFEprimer program was used to check the suitability of the primer set combinations for multiplex PCR. The MFEprimer software was successful in designing the multiplex-PCR experiments and de
The recent emergence of sophisticated Large Language Models (LLMs) such as GPT-4, Bard, and Bing has revolutionized the domain of scientific inquiry, particularly in the realm of large pre-trained vision-language models. This pivotal transformation is opening new frontiers in various fields, including image processing and digital media verification. At the heart of this evolution, our research focuses on the rapidly growing area of image authenticity verification, a field gaining immense relevance in the digital era. The study is specifically geared towards addressing the emerging challenge of distinguishing between authentic images and deep fakes, a task that has become critically important in a world increasingly reliant on digital med