Spelling correction is considered a challenging task for resource-scarce languages. Arabic is one such language: it lacks a large spelling correction dataset, so datasets injected with artificial errors are used to overcome this problem. In this paper, we train the Text-to-Text Transfer Transformer (T5) model on artificial errors to correct Arabic soft spelling mistakes. Our T5 model corrects 97.8% of the artificial errors that were injected into the test set. Additionally, it achieves a character error rate (CER) of 0.77% on a set that contains real soft spelling mistakes. We achieved these results using a 4-layer T5 model trained with a 90% error injection rate and a maximum sequence length of 300 characters.
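To make the reported metric concrete, the following is a minimal sketch of how a character error rate is commonly computed: the Levenshtein edit distance between the corrected output and the reference, divided by the number of reference characters. The abstract does not spell out its exact CER definition, so this formula, the function names `edit_distance` and `cer`, and the sample strings are assumptions for illustration only.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning the first i reference characters into "" takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters (assumed definition)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical pair: the reference has a hamza under the alef, the hypothesis does not.
    print(cer(["إلى"], ["الى"]))
```

Under this definition, a CER of 0.77% would correspond to fewer than one character edit per hundred reference characters.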
Translating culture-specific proverbs (CSPs) is a challenging task since they often occur in a peculiar context. Further, CSPs are intended to imply meanings that extend far beyond their literal meaning. As far as English and Arabic are concerned, translators often encounter problems in translating CSPs due to cultural differences between the source language (SL) and the target language (TL), as well as the apparent lack of equivalents for some CSPs.
In view of this, the present study aims at investigating the translation of CSPs in three English-Arabic dictionaries of proverbs, namely Dictionary of Common English Proverbs Translated and Explained (2004), One thousand and One English Pr
Loanwords are words transferred from one language to another that become an essential part of the borrowing language. Loanwords pass from the source language to the recipient language for many reasons. Detecting them is a complicated task because there are no standard specifications for transferring words between languages, and detection accuracy is therefore low. This work tries to improve the accuracy of detecting loanwords, taking Turkish and Arabic as a case study. The proposed system finds all possible loanwords using any set of characters, whether alphabetically or randomly arranged. Then, it processes the distortion in the pronunciation, and solves the problem of the missing lette
This research is intended to highlight the uses of political content on foreign Arabic-speaking websites, such as “CNN” and “Euro News”. The research problem stems from the main question: What is the nature of the use of the websites in the political content provided through them? A set of sub-questions covers the remaining aspects of the research, which aims to achieve a set of objectives, including identifying the topics covered by the political content that the sample sites provided during the analysis period. The study uses descriptive research, in which the researcher identifies the phenomenon, describes it accurately, and defines the relations between its components.
The research conducted the des
Sentiment analysis refers to the task of identifying the positive or negative polarity of a particular text that expresses an opinion. The use of the Arabic language has expanded dramatically in the last decade, especially with the emergence of social websites (e.g. Twitter, Facebook, etc.). Several studies have addressed sentiment analysis for the Arabic language using various techniques. According to the literature, the most efficient techniques were machine learning techniques, owing to their ability to build a trained model. Yet, there are still issues facing Arabic sentiment analysis using machine learning techniques. Such issues are related to employing robust features that have the ability to discrimina
Text categorization refers to the process of grouping text or documents into classes or categories according to their content. The text categorization process consists of three phases: preprocessing, feature extraction and classification. In comparison to the English language, only a few studies have categorized and classified Arabic text. For a variety of applications, such as text classification and clustering, Arabic text representation is a difficult task because the Arabic language is noted for its richness, diversity, and complicated morphology. This paper presents a comprehensive analysis and a comparison for researchers in the last five years based on the dataset, year, algorithms and the accuracy th
The purpose of this paper is to discriminate between the poems of each poet based on the characteristics and attributes of the Arabic letters. Four categories were used for the Arabic letters. Letter frequencies were entered into a multidimensional contingency table in which each dimension has two or more levels, and the contingency coefficient was then calculated.
The paper sample consists of six poets from different historical eras, and each poet has five poems. The method was programmed in MATLAB. The efficiency of the proposed method is 53% for the whole sample, and between 90% and 95% for each poet's poems.
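The contingency-coefficient step can be sketched as follows. The abstract does not say which coefficient was used or how the multidimensional table was laid out, so this sketch assumes Pearson's contingency coefficient, C = sqrt(chi2 / (chi2 + n)), on a simple two-way table (poems as rows, letter categories as columns). It is written in Python rather than the MATLAB used in the paper, and the function names and counts are hypothetical.

```python
import math


def chi_square(table: list[list[float]]) -> tuple[float, float]:
    """Return (chi-square statistic, grand total n) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0:
        raise ValueError("table contains no observations")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # cells in an all-zero row or column contribute nothing
                chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def contingency_coefficient(table: list[list[float]]) -> float:
    """Pearson's contingency coefficient (assumed; the source does not name the variant)."""
    chi2, n = chi_square(table)
    return math.sqrt(chi2 / (chi2 + n))


if __name__ == "__main__":
    # Hypothetical letter-category counts for two poems (rows) over four categories (columns).
    counts = [
        [120, 80, 60, 40],
        [90, 110, 50, 50],
    ]
    print(contingency_coefficient(counts))
```

A value near 0 indicates that the letter-category distribution is nearly the same across the rows, while larger values indicate a stronger association between the poem and its letter usage.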
Objective(s): The aim of the study was to identify the prevalence of overweight and obesity in adolescence and to estimate the effect of the socio-demographic characteristics and health behaviors that predict obesity in adolescents.
Methodology: A cross-sectional descriptive study was carried out at three public Arabic secondary schools in Erbil city from October 1st 2010 to January 30th 2011. A systematic random sample of 461 students was selected.
Results: In this study, 46.2% (122) of male students were aged 17-18.9 years, whereas 74.1% (146) of female students were aged 15-16.9 years. About 3.4% (9) of male adolescents were overweight, while all female ado