
Since 2017, I have been a lecturer in the Department of Statistics. I hold a PhD in Statistics and specialize in advanced statistical techniques for genetic and genomic data, with publications in peer-reviewed journals. I am proficient in Python and R programming and have extensive experience in teaching and mentoring students. I aim to apply my expertise in statistical analysis and data science to advance innovative research in the field of Statistics.
PhD in Statistics - University of Kent - UK - 2015.
MSc in Mathematics - College of Science - University of Baghdad - Iraq - 2001.
BSc in Mathematics - College of Education - University of Thiqar - 1998.
Responsible for the Performance Assurance Unit (PAU) at the College of Administration and Economics.
Duties:
Annually, I manage a comprehensive review process where students evaluate the teaching effectiveness of approximately 300 instructors using detailed surveys. The collected feedback is meticulously analyzed to generate a teaching quality score for each instructor, ranging from 0 to 100.
Scholarship from the Ministry of Higher Education and Scientific Research to pursue a PhD in Mathematical Statistics in the UK (2010).
My research interests lie at the intersection of advanced statistical methods and genetic/genomic data analysis. With a PhD in Statistics and extensive experience in both academia and research, I am particularly focused on developing and applying sophisticated statistical techniques to unravel complex biological data. My work has consistently been published in reputable journals, demonstrating a proven track record in the field. I am highly proficient in programming languages such as Python and R, which I utilize to create robust statistical models and perform comprehensive data analyses. Additionally, I have a strong background in teaching and supervising students, which has further honed my ability to communicate complex statistical concepts effectively. I am passionate about leveraging my expertise in statistical analysis and data science to contribute to groundbreaking research and innovation in Statistics, particularly in areas that require the integration of large-scale genetic and genomic datasets. Through my work, I aim to advance our understanding of biological processes and contribute to the development of new methodologies that can drive future discoveries in the field.
Mathematical Statistics, Statistical Genetics, Bioinformatics, Artificial Intelligence
Taught the following modules:
o Matrices for Second-year students (1st semester)
o Linear Algebra for Second-year students (2nd semester)
o Numerical Analysis 1 for Third-year students (1st semester)
o Numerical Analysis 2 for Third-year students (2nd semester)
Supervision of Master’s Students:
o Yusera Suhieel (2023-2024).
Project title: On inference of non-parametric regression for interval data.
o Huda Sami (2021-2023).
Project title: Some of inferential methods of constrained regression model by cone project with application.
o Noor Mohammed (2020-2022).
Project title: Estimation of parameters of mixture of Rayleigh distribution with application.
o Urdak Ibrahim (2019-2021).
Project title: Comparison of Some Methods for Estimating Mixture of Linear Regression Models with Application.
o Sara Ayyed (2019-2020).
Project title: The Use of Logistic Regression Model in Estimating the Probability of Being Affected by Breast Cancer Based on the Levels of Interleukins and Cancer Marker CA15-3.
Region-based association analysis has been proposed to capture the collective behavior of sets of variants by testing the association of each set, instead of individual variants, with the disease. Such an analysis typically involves a list of unphased multiple-locus genotypes with potentially sparse frequencies in cases and controls. To tackle the problem of the sparse distribution, a two-stage approach was proposed in the literature: in the first stage, haplotypes are computationally inferred from genotypes, followed by a haplotype co-classification. In the second stage, the association analysis is performed on the inferred haplotype groups. If a haplotype is unevenly distributed between the case and control samples, this haplotype is labeled...
Haplotype association analysis has been proposed to capture the collective behavior of sets of variants by testing the association of each set, instead of individual variants, with the disease. Such an analysis typically involves a list of unphased multiple-locus genotypes with potentially sparse frequencies in cases and controls. It starts with inferring haplotypes from genotypes, followed by a haplotype co-classification and marginal screening for disease-associated haplotypes. Unfortunately, phasing uncertainty may have a strong effect on the haplotype co-classification and therefore on the accuracy of predicting risk haplotypes. Here, to address the issue, we propose an alternative approach: In Stage 1, we select potential risk genotypes inste...
Breast cancer has received much attention in recent years, as it is one of the complex diseases that can threaten people's lives. It can be determined from the levels of secreted proteins in the blood. In this project, we developed a method of finding a threshold to classify the probability of being affected by it in a population based on the levels of the related proteins in relatively small case-control samples. We applied our method to simulated and real data. The results showed that the method we used was accurate in estimating the probability of being diseased in both simulated and real data. Moreover, we were able to calculate the sensitivity and specificity under the null hypothesis of our research question of being diseased o...
In the lifetime processes of some systems, most data cannot belong to one single population. In fact, they can represent several subpopulations. In such a case, a single known distribution cannot be used to model the data. Instead, a mixture of distributions is used to model the data and classify them into several subgroups. The mixture of Rayleigh distributions is best suited to lifetime processes. This paper aims to infer model parameters by the expectation-maximization (EM) algorithm through the maximum likelihood function. The technique is applied to simulated data by following several scenarios. The accuracy of estimation has been examined by the average mean square error (AMSE) and the average classification success rate (ACSR). T...
Methods of estimating statistical distributions have attracted many researchers when it comes to fitting a specific distribution to data. However, when the data belong to more than one component, a popular distribution cannot be fitted to such data. To tackle this issue, mixture models are fitted by choosing the correct number of components that represent the data. This can be obvious in lifetime processes that are involved in a wide range of engineering applications as well as biological systems. In this paper, we introduce an application of estimating a finite mixture of Inverse Rayleigh distribution by the use of the Bayesian framework when considering the model as Markov chain Monte Carlo (MCMC). We employed the Gibbs sampler and...
Multilocus haplotype analysis of candidate variants with genome-wide association studies (GWAS) data may provide evidence of association with disease, even when the individual loci themselves do not. Unfortunately, when a large number of candidate variants are investigated, identifying risk haplotypes can be very difficult. To meet the challenge, a number of approaches have been put forward in recent years. However, most of them are not directly linked to the disease penetrances of haplotypes and thus may not be efficient. To fill this gap, we propose a mixture model-based approach for detecting risk haplotypes. Under the mixture model, haplotypes are clustered directly according to their estimated d...
A mixture model is used to model data that come from more than one component. In recent years, it has become an effective tool for drawing inferences about the complex data that we might come across in real life. Moreover, it can serve as a powerful confirmatory tool for classifying observations based on similarities amongst them. In this paper, several mixture regression-based methods were applied under the assumption that the data come from a finite number of components. These methods were compared according to their results in estimating component parameters. Also, observation membership has been inferred and assessed for these methods. The results showed that the flexible mixture model outperformed the others...
The parameters of a linear regression model are usually estimated by the least squares method, which rests on several basic assumptions. The accuracy of the parameter estimates therefore depends on the validity of these assumptions. The most successful technique was the robust estimation method (MM-estimator), which proved its efficiency for this purpose. However, the model becomes unrealistic when these assumptions, among them homogeneity of variance and normally distributed errors, do not hold. These assumptions are not achievable when studying a specific problem that may include complex data from more than one model. To...
Inferential methods for statistical distributions have attracted a high level of interest in recent years. However, in real life, data can follow more than one distribution, and then mixture models must be fitted to such data. One such model is a finite mixture of Rayleigh distributions, which is widely used in modelling lifetime data in many fields, such as medicine, agriculture and engineering. In this paper, we proposed new Bayesian frameworks by assuming conjugate priors for the square of the component parameters. We used this prior distribution in the classical Bayesian, Metropolis-Hastings (MH) and Gibbs sampler methods. The performance of these techniques was assessed using data generated from two and three-component mixt...
The Dagum Regression Model, introduced to address limitations in traditional econometric models, provides enhanced flexibility for analyzing data characterized by heavy tails and asymmetry, which is common in income and wealth distributions. This paper develops and applies the Dagum model, demonstrating its advantages over other distributions such as the Log-Normal and Gamma distributions. The model's parameters are estimated using Maximum Likelihood Estimation (MLE) and the Method of Moments (MoM). A simulation study evaluates both methods' performance across various sample sizes, showing that MoM tends to offer more robust and precise estimates, particularly in small samples. These findings provide valuable insights into the ana...