Feature selection (FS) is the process of deciding which relevant features/attributes to include and which irrelevant features to exclude for predictive modeling. It is a crucial task that helps machine learning classifiers reduce error rates, computation time, and overfitting while improving classification accuracy. It has demonstrated its efficacy in many domains, including text classification (TC), text mining, and image recognition. While there are many traditional FS methods, recent research efforts have been devoted to applying metaheuristic algorithms as FS techniques for the TC task. However, there are few literature reviews concerning TC. Therefore, a systematic, comprehensive review was conducted of the available studies on metaheuristic algorithms used for FS to improve TC. This paper contributes to the existing body of knowledge by answering four research questions (RQs): 1) What are the different FS approaches that apply metaheuristic algorithms to improve TC? 2) Does applying metaheuristic algorithms for TC lead to better accuracy than typical FS methods? 3) How effective are modified and hybridized metaheuristic algorithms for text FS problems? 4) What are the gaps in the current studies and their future directions? These RQs led to a study of recent works on metaheuristic-based FS methods, their contributions, and their limitations. A final list of thirty-seven (37) related articles was extracted and investigated in line with our RQs to generate new knowledge in the domain of study. Most of the reviewed papers addressed TC using metaheuristic algorithms based on wrapper and hybrid FS approaches. Future research should focus on hybrid FS approaches, as they handle complex optimization problems well and can open new research opportunities in this rapidly developing field.
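The wrapper FS idea the review centers on can be illustrated with a toy sketch: a binary feature mask is searched by a simple hill-climbing metaheuristic (a deliberately minimal stand-in for the genetic, swarm, and other algorithms surveyed), and each candidate subset is scored by the accuracy of a classifier trained on it. The nearest-centroid classifier and the synthetic data below are illustrative assumptions, not taken from any reviewed paper.

```python
import random

def accuracy(mask, X, y):
    """Wrapper evaluation: nearest-centroid accuracy on the selected features."""
    feats = [i for i, m in enumerate(mask) if m]
    if not feats:
        return 0.0
    # per-class centroids over the selected features only
    cents = {}
    for cls in set(y):
        rows = [X[j] for j in range(len(X)) if y[j] == cls]
        cents[cls] = [sum(r[i] for r in rows) / len(rows) for i in feats]
    correct = 0
    for xj, yj in zip(X, y):
        v = [xj[i] for i in feats]
        pred = min(cents,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(v, cents[c])))
        correct += pred == yj
    return correct / len(y)

def hill_climb_fs(X, y, iters=200, seed=0):
    """Flip one feature in/out per step; keep the move if accuracy does not drop."""
    rng = random.Random(seed)
    n = len(X[0])
    mask = [rng.random() < 0.5 for _ in range(n)]
    best = accuracy(mask, X, y)
    for _ in range(iters):
        i = rng.randrange(n)
        mask[i] = not mask[i]
        score = accuracy(mask, X, y)
        if score >= best:
            best = score          # accept the flip
        else:
            mask[i] = not mask[i]  # revert the flip
    return mask, best

# synthetic data: two informative features, two noise features
X = [[1, 0, 0.3, 0.9], [0.9, 0.1, 0.7, 0.2], [0.1, 1, 0.5, 0.4], [0, 0.9, 0.2, 0.8]]
y = [0, 0, 1, 1]
mask, score = hill_climb_fs(X, y)
print(mask, score)
```

Real metaheuristic FS replaces the single-flip move with population-based operators (crossover, velocity updates, etc.), but the wrapper loop, namely evaluating each candidate subset by training accuracy, is the same.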
The purpose of this study is to measure the level of quality control of some crude oil products in Iraqi refineries, and how close it is to international standards, through the application of statistical methods to the quality control of oil products in Iraqi refineries. The responses of the study sample, a group of employees of Iraqi refineries (the Al-Dora, Al-Nasiriyah, and Al-Basra refineries), were analyzed with respect to the principles of quality management control and according to different personal characteristics (gender, age, academic qualification, number of years of experience, job level). To achieve the objectives of the study, a questionnaire of twelve (12) items was used to collect preliminary information …
As a result of global development and openness, companies can now provide their services beyond the spatial boundaries they once set for themselves, and improved means of communication have turned the world into a large global market that accommodates all products, from different regions, of the same type and production field. This has produced competition between companies and a race to obtain the largest market share, which secures the largest profits. It is therefore natural for companies' advertising to shift from promoting a single product to competitive advertising that calls on the recipient to abandon the competing product and switch to the advertised one.
In this paper, the methods of weighted residuals, namely the Collocation Method (CM), the Least Squares Method (LSM), and the Galerkin Method (GM), are used to solve the thin film flow (TFF) equation. The weighted residual methods were implemented to obtain an approximate solution to the TFF equation. The accuracy of the obtained results is checked by calculating the maximum error remainder functions (MER). Moreover, the outcomes were compared with the 4th-order Runge-Kutta method (RK4), and good agreement was achieved. All the evaluations were carried out using the computer algebra system Mathematica® 10.
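The collocation variant of the weighted-residual idea can be sketched on a simpler test problem than the TFF equation itself. The ODE below, the quadratic trial function, and the two collocation points are illustrative assumptions: we take y' + y = 0 with y(0) = 1 (exact solution e^(-x)), substitute the trial y ≈ 1 + a1·x + a2·x², and force the residual R(x) = y' + y to vanish at two points, which yields a 2×2 linear system for the coefficients.

```python
import math

def solve_2x2(A, b):
    """Cramer's rule for a 2x2 linear system."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return ((b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det)

def residual_coeffs(x):
    # Residual of y' + y for y = 1 + a1*x + a2*x^2:
    #   R(x) = 1 + a1*(1 + x) + a2*(2x + x^2)
    # Setting R(x) = 0 gives the row (1+x, 2x+x^2) and right-hand side -1.
    return (1 + x, 2 * x + x * x), -1.0

pts = (1 / 3, 2 / 3)                       # collocation points (assumed choice)
rows, rhs = zip(*(residual_coeffs(x) for x in pts))
a1, a2 = solve_2x2(list(rows), list(rhs))

approx = lambda x: 1 + a1 * x + a2 * x * x
# maximum error over a grid, analogous to the MER check in the abstract
max_err = max(abs(approx(x) - math.exp(-x)) for x in [i / 10 for i in range(11)])
print(a1, a2, max_err)
```

LSM and GM differ only in the weighting: instead of point evaluation, the residual is weighted by its own coefficient derivatives (LSM) or by the trial basis functions (GM) and integrated over the domain.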
Classification of imbalanced data is an important issue. Many algorithms have been developed for classification, such as Back Propagation (BP) neural networks, decision trees, and Bayesian networks, and have been used repeatedly in many fields. These algorithms suffer from the problem of imbalanced data, where some classes have far more instances than others. Imbalanced data result in poor performance and a bias toward the majority class at the expense of the other classes. In this paper, we propose three techniques based on Over-Sampling (O.S.) for processing an imbalanced dataset, redistributing it, and converting it into a balanced dataset. These techniques are the Improved Synthetic Minority Over-Sampling Technique (Improved SMOTE), Border…
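The baseline SMOTE mechanism that the proposed techniques build on can be sketched as follows. This is a minimal illustration of the standard algorithm, not the paper's improved variants: each synthetic sample is interpolated between a random minority point and one of its k nearest minority neighbours; the toy points and k = 2 are assumptions.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by neighbour interpolation."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        p = rng.choice(minority)
        # k nearest minority neighbours of p (excluding p itself)
        nbrs = sorted((q for q in minority if q is not p),
                      key=lambda q: sum((a - b) ** 2 for a, b in zip(p, q)))[:k]
        q = rng.choice(nbrs)
        t = rng.random()                 # interpolation factor in [0, 1)
        out.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return out

minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
synthetic = smote(minority, n_new=5)
print(synthetic)
```

Because every synthetic point lies on a segment between two minority points, the new samples stay inside the minority region rather than duplicating existing instances, which is what distinguishes SMOTE from naive random over-sampling.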
Mixture experiments suffer from high correlation and linear multicollinearity between the explanatory variables, owing to the unit-sum constraint and the interactions between the components of the model, which strengthens the links between the explanatory variables; this is indicated by the variance inflation factor (VIF). The L-pseudo-component transformation is used to reduce the correlation between the components of the mixture.
To estimate the parameters of the mixture model, we used methods that introduce bias in order to reduce variance, such as the Ridge Regression method and the Least Absolute Shrinkage and Selection Operator (LASSO) method …
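The bias-variance trade-off behind ridge regression can be shown numerically with its closed form, beta = (X'X + lam·I)^(-1) X'y, on a nearly collinear two-predictor design; the data below are an invented illustration, not the mixture data of the study, and LASSO is omitted because it has no closed form (it is usually fit by coordinate descent).

```python
def ridge_2d(X, y, lam):
    """Closed-form ridge estimate for two predictors (lam=0 gives OLS)."""
    # build X'X + lam*I and X'y
    s11 = sum(r[0] * r[0] for r in X) + lam
    s12 = sum(r[0] * r[1] for r in X)
    s22 = sum(r[1] * r[1] for r in X) + lam
    t1 = sum(r[0] * yi for r, yi in zip(X, y))
    t2 = sum(r[1] * yi for r, yi in zip(X, y))
    det = s11 * s22 - s12 * s12
    return ((t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det)

# nearly collinear predictors: the second column almost equals the first
X = [(1.0, 1.01), (2.0, 1.98), (3.0, 3.02), (4.0, 3.97)]
y = [2.0, 4.0, 6.0, 8.0]
b_ols = ridge_2d(X, y, lam=0.0)    # unpenalized least squares
b_ridge = ridge_2d(X, y, lam=1.0)  # penalized: coefficients shrink toward zero
print(b_ols, b_ridge)
```

The ridge estimate has a smaller coefficient norm than OLS, which is exactly the bias that buys reduced variance under multicollinearity.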
Cefixime is an antibiotic useful for treating a variety of microorganism infections. In the present work, two rapid, specific, inexpensive, and nontoxic methods were proposed for cefixime determination: an area-under-curve spectrophotometric method and an HPLC method, for the micro-quantification of cefixime in highly pure form and in a local market formulation. The area under the curve (first technique) was used to quantify the cefixime peak with a UV-visible spectrophotometer. The HPLC (second technique) depended on the separation of cefixime by a C18 column, 250 mm (length) × 4.6 mm (diameter), using 50% methanol (organic modifier) and 50% deionized water as the mobile phase. Isocratic flow at a rate of 1 mL/min was applied; the temperature …
Fractal geometry is receiving increasing attention as a quantitative and qualitative model for describing natural phenomena, and it can form the basis of an effective classification technique when applied to satellite images. In this paper, a satellite image taken by QuickBird that contains different visible classes is used. After pre-processing, this image passes through two stages: segmentation and classification. The segmentation hybridizes two methods to produce effective results: the Quadtree method operating inside the Horizontal-Vertical method. The hybrid method segments the image into two rectangular blocks, either horizontally or vertically, depending on a spectral uniformity criterion …
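The quadtree component of such a scheme can be sketched in a few lines. This is a deliberately simplified stand-in for the paper's hybrid method: a square block is kept as a leaf when its pixel values are uniform enough (here, max minus min within an assumed threshold) and otherwise split into four quadrants recursively; the 4×4 test image is invented.

```python
def quadtree(img, x, y, size, thresh, out):
    """Recursively split a square block until each piece is spectrally uniform."""
    vals = [img[y + j][x + i] for j in range(size) for i in range(size)]
    if size == 1 or max(vals) - min(vals) <= thresh:
        out.append((x, y, size))   # uniform leaf block
        return
    h = size // 2                  # split into four quadrants
    for dx, dy in ((0, 0), (h, 0), (0, h), (h, h)):
        quadtree(img, x + dx, y + dy, h, thresh, out)

# 4x4 test image: uniform left half, noisy right half
img = [
    [10, 10, 200, 40],
    [10, 10, 90, 130],
    [10, 10, 5, 250],
    [10, 10, 60, 170],
]
blocks = []
quadtree(img, 0, 0, 4, 20, blocks)
print(blocks)
```

The uniform left half survives as two 2×2 leaves while the noisy right half is split down to single pixels, so the block list adapts its resolution to local spectral uniformity.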
Medical images have recently played a significant role in the diagnosis and detection of various diseases. Medical imaging provides a means of direct visualization through the human body, making it possible to observe small anatomical changes and biological processes associated with different biological and physical parameters. To achieve a more accurate and reliable diagnosis, a variety of computer-aided detection (CAD) and computer-aided diagnosis (CADx) approaches have been established to help with the interpretation of medical images. CAD has become one of the major research subjects in diagnostic radiology and medical imaging. In this work we study the improvement in the detection accuracy of a CAD system when comb…
This research studies the linear regression model when the random errors are autocorrelated and normally distributed. Linear regression analysis models the relationship between variables, and through this relationship the value of one variable can be predicted from the values of the others. Four methods (the least squares method, the unweighted average method, Theil's method, and the Laplace method) were compared using the mean square error (MSE) and simulation, and the study included four sample sizes (15, 30, 60, 100). The results showed that the least squares method is best. The four methods were then applied to data on buckwheat production and cultivated area in the provinces of Iraq for the years 2010, 2011, 2012, …
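A simulation comparison of this kind can be sketched for two of the four estimators: the OLS slope and Theil's estimator (the median of pairwise slopes). The true model y = 2 + 3x + e, the error variance, and the number of replications below are illustrative assumptions, not the study's design.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Ordinary least squares slope estimate."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def theil_slope(xs, ys):
    """Theil's estimator: median of all pairwise slopes."""
    slopes = [(ys[j] - ys[i]) / (xs[j] - xs[i])
              for i in range(len(xs)) for j in range(i + 1, len(xs))]
    return statistics.median(slopes)

rng = random.Random(1)
n, reps, true_b = 15, 300, 3.0      # smallest sample size from the study
mse = {"OLS": 0.0, "Theil": 0.0}
for _ in range(reps):
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 + true_b * x + rng.gauss(0, 1) for x in xs]
    mse["OLS"] += (ols_slope(xs, ys) - true_b) ** 2 / reps
    mse["Theil"] += (theil_slope(xs, ys) - true_b) ** 2 / reps
print(mse)
```

Under normal errors OLS is the efficient estimator, which is consistent with the study's finding that least squares performed best; robust alternatives like Theil's method mainly pay off when the errors are heavy-tailed or contaminated by outliers.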