Improving Pre-trained CNN-LSTM Models for Image Captioning with Hyper-Parameter Optimization

Nuha M. Khassaf; Nada Hussein M. Ali

doi:10.48084/etasr.8455

Details

Publication Date

Wed Oct 09 2024

Journal Name

Engineering, Technology & Applied Science Research

Volume

14

Issue Number

5

DOI

10.48084/etasr.8455

CNN pre-trained models

LSTM

activation function

hyper-parameters

overfitting

The task of image captioning, which comprises automatic text generation to describe an image's visual information, has become feasible with developments in object recognition and image classification. Deep learning has received much interest from the scientific community and can be very useful in real-world applications. The proposed image captioning approach combines pre-trained Convolutional Neural Network (CNN) models with Long Short-Term Memory (LSTM) to generate image captions. The process includes two stages. The first stage trains the CNN-LSTM models using baseline hyper-parameters, and the second stage trains the CNN-LSTM models after optimizing and adjusting the hyper-parameters of the first stage. Improvements include the use of a new activation function, regular parameter tuning, and an improved learning rate in the later stages of training. The experimental results on the Flickr8k dataset showed a noticeable and satisfactory improvement in the second stage, where a clear increase was achieved in the evaluation metrics BLEU-1 to BLEU-4, METEOR, and ROUGE-L. This increase confirmed the effectiveness of the alterations and highlighted the importance of hyper-parameter tuning in improving the performance of CNN-LSTM models in image captioning tasks.
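
As a rough illustration of the BLEU scores named in the abstract, the sketch below computes a sentence-level BLEU-n from clipped n-gram precision and a brevity penalty. It is not the authors' evaluation code: the function names and the example captions are hypothetical, no smoothing is applied, and published results are usually corpus-level scores from a standard toolkit.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it appears in any single reference."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(candidate, references):
    """Penalise candidates shorter than the closest reference length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Closest reference length; ties go to the shorter reference.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    return 1.0 if c >= r else math.exp(1 - r / c)


def bleu(candidate, references, max_n):
    """Sentence-level BLEU-max_n with uniform weights and no smoothing."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)


# Hypothetical generated caption and two hypothetical reference captions.
candidate = "a dog runs on the grass".split()
references = ["a dog is running on the grass".split(),
              "the dog runs across the grass".split()]
for n in range(1, 5):
    print(f"BLEU-{n}: {bleu(candidate, references, n):.4f}")
```

Without smoothing, a caption with no matching 4-gram scores zero on BLEU-4, which is one reason corpus-level aggregation is normally preferred when comparing training stages.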

Publication Date

Tue Dec 05 2023

Journal Name

Baghdad Science Journal

Indoor/Outdoor Deep Learning Based Image Classification for Object Recognition Applications

Deep learning

GoogleNet

Image classification

Indoor/outdoor

Transfer learning.

Omar Abdullatif

Mohammed Jawad

Zenah Hadi

With the rapid development of smart devices, people's lives have become easier, especially for visually impaired people and people with special needs. New achievements in machine learning and deep learning let people identify and recognise their surrounding environment. In this study, the efficiency and high performance of deep learning architectures are used to build an image classification system for both indoor and outdoor environments. The proposed methodology starts with collecting two datasets (indoor and outdoor) from several separate datasets. In the second step, the collected dataset is split into training, validation, and test sets. The pre-trained GoogleNet and MobileNet-V2 models are trained using the indoor and outdoor se

Publication Date

Sat Oct 30 2021

Journal Name

Iraqi Journal Of Science

Small Binary Codebook Design for Image Compression Depending on Rotating Blocks

Rafah Rasheed

Saif B.

The searching process using a binary codebook of the combined Block Truncation Coding (BTC) and Vector Quantization (VQ) method, i.e. a full codebook search for each input image vector to find the best-matched code word in the codebook, requires a long time. Therefore, in this paper, after designing a small binary codebook, we adopted a new method that rotates each binary code word in this codebook from 90° to 270° in steps of 90°. Then, we systematized each code word depending on its angle to involve four types of binary code books (i.e. Pour when , Flat when , Vertical when, or Zigzag). The proposed scheme was used for decreasing the time of the coding procedure, with very small distortion per block, by designing s

Publication Date

Wed Jul 17 2019

Journal Name

AIP Conference Proceedings

The interpolation effect on the spare sinogram for 3D image reconstruction

3D Reconstruction

Fourier Slice Theorem

Interpolation

Radon Transform

Tomography

Hawraa H. Al-Waelly

Hameed M

The effect of three interpolation methods (nearest neighbour, linear, and non-linear) on a 3D sinogram is presented. The methods are used to restore the data that is missing when the angular difference is greater than 1° (the 1° sinogram being considered the optimum 3D sinogram). Two reconstruction methods are adopted in this study: the back-projection method and the Fourier slice theorem method. The results show that the second method, combined with linear interpolation, is a promising reconstruction approach when the angular difference is less than 20°.
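
To make the interpolation step concrete, the following is a minimal sketch that fills in the missing projection angles of a sparse sinogram by nearest-neighbour or linear interpolation along the angular axis. It is an assumption-laden illustration rather than the paper's method: the data layout (a dictionary from integer angle in degrees to a list of detector readings), the function name, and the sample values are all hypothetical, it handles a single 2D slice, and the non-linear variant is omitted.

```python
def interpolate_sinogram(sparse, step=1, method="linear"):
    """Fill missing projection angles between the measured ones.

    sparse: dict mapping a measured angle (integer degrees) to a list of
            detector readings; all lists must have the same length.
    step:   angular spacing of the dense sinogram, in degrees.
    method: "linear" or "nearest".
    """
    angles = sorted(sparse)
    dense = {}
    for lo, hi in zip(angles, angles[1:]):
        angle = lo
        while angle < hi:
            t = (angle - lo) / (hi - lo)  # fractional position between lo and hi
            if method == "nearest":
                source = sparse[lo] if t < 0.5 else sparse[hi]
                dense[angle] = list(source)
            else:
                dense[angle] = [(1 - t) * p + t * q
                                for p, q in zip(sparse[lo], sparse[hi])]
            angle += step
    # The last measured projection is copied unchanged.
    dense[angles[-1]] = list(sparse[angles[-1]])
    return dense


# Hypothetical sparse sinogram: two detector bins, measured every 4 degrees.
sparse = {0: [0.0, 10.0], 4: [4.0, 2.0]}
linear = interpolate_sinogram(sparse, step=1, method="linear")
nearest = interpolate_sinogram(sparse, step=1, method="nearest")
print(linear[1], nearest[1])
```

The densified sinogram can then be passed to whichever reconstruction method is being compared, so the interpolation choice is isolated from the reconstruction choice.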

Publication Date

Wed Dec 13 2023

Journal Name

2023 3rd International Conference on Intelligent Cybernetics Technology & Applications (ICICyTA)

GPT-4 versus Bard and Bing: LLMs for Fake Image Detection

Deepfakes

Visualization

Image processing

Media

Security

Reliability

Task analysis

Omar Mustafa

Osamah Mohammed

Elaf Ayyed

The recent emergence of sophisticated Large Language Models (LLMs) such as GPT-4, Bard, and Bing has revolutionized the domain of scientific inquiry, particularly in the realm of large pre-trained vision-language models. This pivotal transformation is driving new frontiers in various fields, including image processing and digital media verification. At the heart of this evolution, our research focuses on the rapidly growing area of image authenticity verification, a field gaining immense relevance in the digital era. The study is specifically geared towards addressing the emerging challenge of distinguishing between authentic images and deep fakes – a task that has become critically important in a world increasingly reliant on digital med

Publication Date

Thu Feb 15 2024

Journal Name

Journal Of Theoretical And Applied Information Technology

CHOOSING THE RIGHT CHAOTIC MAP FOR IMAGE ENCRYPTION: A DETAILED EXAMINATION

Chaos-based

Chaotic Map

Cipher

Image Encryption

Image Security

Saad

This article investigates how an appropriate chaotic map (Logistic, Tent, Henon, Sine...) should be selected for image encryption, taking into consideration its advantages and disadvantages. Does the selection of an appropriate map depend on the image properties? The proposed system shows that relevant properties of the image influence the evaluation of the selected chaotic map. The first chapter discusses the main principles of chaos theory and its applicability to image encryption, including various sorts of chaotic maps and their mathematics. This research also explores the factors that determine the security and efficiency of such a map. Hence the approach presents a practical standpoint to the extent that certain chaos maps will bec

Publication Date

Mon Dec 05 2022

Journal Name

Baghdad Science Journal

MSRD-Unet: Multiscale Residual Dilated U-Net for Medical Image Segmentation

Attention

Deep Learning

Dilated Convolution

Medical Image Segmentation

U-Net

Muna

Ban N.

Semantic segmentation is an exciting research topic in medical image analysis because it aims to detect objects in medical images. In recent years, approaches based on deep learning have shown more reliable performance than traditional approaches in medical image segmentation. The U-Net network is one of the most successful end-to-end convolutional neural networks (CNNs) presented for medical image segmentation. This paper proposes a multiscale residual dilated convolutional neural network (MSRD-UNet) based on U-Net. MSRD-UNet replaces the traditional convolution block with a novel deeper block that fuses multi-layer features using dilated and residual convolution. In addition, the squeeze-and-excitation attention mechanism (SE) and the s

Publication Date

Sun Dec 01 2024

Journal Name

Journal Of Ecological Engineering

Enhancing the Removal of Methyl Orange Dye by Electrocoagulation System with Nickel Foam Electrode – Optimization with Surface Response Methodology

electrocoagulation

Ni foam

Fe foam

aluminum

surface response.

Amor T.

Rasha H.

Azo dyes like methyl orange (MO) are very toxic compounds, and their recalcitrant properties make their removal from textile industry wastewater a significant issue. The present study aimed to study their removal by utilizing aluminum and Ni foam (NiF) as anodes, with Fe foam electrodes as cathodes, in an electrocoagulation (EC) system. Primary experiments were conducted using two Al anodes, two NiF anodes, or Al-NiF anodes to predict their advantages and drawbacks. It was concluded that the Al-NiF anodes were very effective in removing MO dye without the long treatment time or the Ni leaching observed when adopting the Al-Al or NiF-NiF anodes, respectively. The structure and surface morphology of the NiF electrode were inves

Publication Date

Sun Dec 01 2024

Journal Name

Journal Of The Iraqi Translators Association Translation And Linguistics

COMPARISON OF THE IDEOLOGICAL DISCOURSE OF THE RUSSIAN-UKRAINIAN WAR IN REPORTS FROM NEWS CHANNELS CNN AND RT

critical discourse analysis

Russia-Ukraine crisis

discourse strategy

framing

ideology

Ali

This study conducts a systematic comparative critical discourse analysis of news reports from prominent American (CNN) and Russian (RT) media sources covering the Russia-Ukraine conflict. Utilizing the theoretical frameworks of Norman Fairclough's multidimensional model and Teun van Dijk's socio-cognitive approach, the research examines the underlying ideological assumptions and discursive strategies employed by the two contrasting news channels. Quantitative analysis of discursive techniques and linguistic features provides insights into how each channel selectively utilizes language to convey distinct ideological positions. The findings demonstrate how media discourse constructs and normalizes particular ideological representations of pol

Publication Date

Sun Dec 30 2018

Journal Name

Journal Of Pure And Applied Microbiology

Mixture Design of Experiments for the Optimization of Carbon Source for Promoting Undecylprodigiosin and Actinorhodin Production

Luti K.

Publication Date

Fri Jan 01 2016

Journal Name

Modern Applied Science

Hybrid Methodology for Image Segmentation Based on Active Contour Module and Alpha-Shape Theory

active contours models

Alpha Shape

automatic initialization

image segmentation

snake

Mohammed

The concept of the active contour model has been extensively utilized in the segmentation and analysis of images. This technique has been effectively employed in identifying contours in object recognition, computer graphics and vision, and biomedical image processing, for both ordinary images and medical images such as Magnetic Resonance Images (MRI), X-rays, and ultrasound images. Kass, Witkin and Terzopoulos developed these energy-minimizing “Active Contour Models” (also known as snakes) back in 1987. Snakes are curves defined within an image field that can be set in motion by external forces arising from the image data and internal forces arising from the curve itself. The present s
