Beyond the immediate content of speech, the voice can provide rich information about a speaker's demographics, including age and gender. Estimating a speaker's age and gender has a wide range of applications, from voice forensic analysis to personalized advertising, healthcare monitoring, and human-computer interaction. However, pinpointing a precise age remains difficult because of age ambiguity: utterances from individuals of adjacent ages are frequently indistinguishable. To address this, we propose a novel end-to-end approach that transforms raw audio from Mozilla's Common Voice dataset into high-quality feature representations using Wav2Vec2.0 embeddings. These are then fed into our self-attention-based convolutional neural network (CNN) model. To address age ambiguity, we evaluate the effects of different loss functions, such as focal loss and Kullback-Leibler (KL) divergence loss. Additionally, we evaluate the accuracy of the estimation at different durations of speech. Experimental results on the Common Voice dataset underscore the efficacy of our approach, showing an accuracy of 87% for male speakers, 91% for female speakers, and 89% overall, and an accuracy of 99.1% for gender prediction.
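The two loss functions named in the abstract above can be illustrated with a minimal NumPy sketch. The abstract does not give the authors' formulation, so everything here is an assumption: age classes are treated as ordered integer indices, the focal loss uses the common form -(1 - p_t)^gamma * log(p_t), and the KL divergence loss compares the prediction against a Gaussian-smoothed label distribution (the hypothetical helper `soft_age_labels`), which is one way to give partial credit to adjacent ages.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def focal_loss(logits, target, gamma=2.0):
    # Multi-class focal loss, -(1 - p_t)^gamma * log(p_t), averaged over the batch.
    # gamma = 0 reduces it to ordinary cross-entropy.
    p = softmax(logits)
    p_t = p[np.arange(len(target)), target]
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t + 1e-12)))

def soft_age_labels(target, num_classes, sigma=1.0):
    # Assumption: a Gaussian-shaped label distribution centred on the true class,
    # so that neighbouring age classes receive some probability mass.
    classes = np.arange(num_classes)
    d = np.exp(-0.5 * ((classes[None, :] - target[:, None]) / sigma) ** 2)
    return d / d.sum(axis=-1, keepdims=True)

def kl_divergence_loss(logits, soft_targets):
    # KL(soft_targets || softmax(logits)), averaged over the batch.
    p = softmax(logits)
    eps = 1e-12
    kl = np.sum(soft_targets * (np.log(soft_targets + eps) - np.log(p + eps)), axis=-1)
    return float(np.mean(kl))
```

With soft targets, a prediction that lands on a neighbouring age class is penalized less than one that lands far away, which is the intuition behind using a distribution-based loss for ambiguous labels.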
This study explores the semiotic aspects of American slang, focusing on reduplicative expressions in informal speech. Despite extensive research on American slang, limited attention has been given to the cultural and mythical meanings embedded within reduplicative expressions. To address this gap, the study investigates how these expressions convey denotative, connotative, and mythical meanings within casual American discourse. The study addresses the following questions: 1. To what extent does Barthes’ semiotic model hold potential for application in this study? 2. How are reduplicative slang expressions widely used in everyday American life? 3. To what extent do qualitative and quantitative methods hav
A robust and sensitive analytical method is presented for the extraction and determination of six pharmaceuticals in freshwater sediments.
Background: The posterior slope of the articular eminence of completely edentulous patients shows significant flattening compared with that of patients with maintained occlusion. This study aimed to correlate the flattening of the posterior slope of the articular eminence with dental status, age, and gender on both sides using computed tomography. Materials and Methods: The sample of the present study comprised 117 Iraqi subjects who were admitted to the maxillofacial department at Al-Sadr Teaching Hospital in Al-Najaf city. The examination was performed on a CT scanner; the eminence inclination was measured by two methods using sagittal sections. Results: Clinically, the inclination of articular eminence was higher in edentulous subjects than i
Magnetic resonance imaging (MRI) uses magnetization and radio waves, rather than X-rays, to make very detailed cross-sectional pictures of the brain. This work explains several procedures for contrast and brightness improvement, such as manipulation of the image histogram, which are very important for improving image quality. Histogram shrinking, i.e. reducing the range of gray levels, produces a dim, low-contrast picture, whereas histogram stretching distributes the gray levels over a wide scale without increasing the number of pixels in the bright region. Histogram equalization is also discussed, together with its effects of the improveme
Surveillance cameras are video cameras used for the purpose of observing an area. They are often connected to a recording device or IP network, and may be watched by a security guard or law enforcement officer. When a location has little movement (such as a home courtyard at night), the whole recorded video must be checked to find where and when motion occurred, which wastes time. This paper therefore aims to process the real-time video captured by a webcam to detect motion in the scene using MATLAB 2012a while the camera is still recording, i.e. real-time detection. The results show accuracy and efficiency in detecting motion.
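The abstract above does not state which detection algorithm was used, and the original work was done in MATLAB 2012a. As a hedged illustration only, the sketch below shows simple frame differencing in Python with NumPy, a common baseline for this task. The function names and both thresholds (`pixel_threshold`, `min_changed_fraction`) are hypothetical choices, not values from the paper.

```python
import numpy as np

def detect_motion(prev_frame, frame, pixel_threshold=25, min_changed_fraction=0.01):
    # Frame differencing on two grayscale uint8 frames of equal shape.
    # Cast to a signed type first so the subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    mask = diff > pixel_threshold              # pixels that changed noticeably
    moved = mask.mean() >= min_changed_fraction  # enough of the frame changed
    return bool(moved), mask

def motion_frame_indices(frames, **kwargs):
    # Indices of frames that differ noticeably from the frame before them.
    hits = []
    for i in range(1, len(frames)):
        moved, _ = detect_motion(frames[i - 1], frames[i], **kwargs)
        if moved:
            hits.append(i)
    return hits
```

Recording only the returned indices (or their timestamps) is what removes the need to review the whole video afterwards: a reviewer can jump directly to the frames where the scene changed.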
Investigating gender differences based on emotional changes is essential to understanding various human behaviors in daily life. Ten students from the University of Vienna were recruited, and their electroencephalogram (EEG) signals were recorded while they watched four short emotional video clips (anger, happiness, sadness, and neutral) as audiovisual stimuli. In this study, conventional filtering and wavelet transform (WT) denoising techniques were applied as a preprocessing stage and Hurst exponent
The weak and strong forms are so called because it is not their lexical content that primarily matters, but the role they have in the sentence. The confusion our students encounter in recognizing and producing the correct pronunciation of the weak and strong forms of English function words is the main incentive behind this study. To gather the data, this paper used two types of tests: a recognition test and a production test. The general results reached through the analysis of the students' answers seem to conform to the researcher's assumption: students face a critical problem in recognizing and producing correct pronunciation of the weak and strong forms of the English funct