Online Sequential Extreme Learning Machine (OSELM) based Q-learning (OSELM-QL)
The use of reinforcement learning (RL) across many types of applications is increasing. The rapid development of machine learning models in recent years has motivated researchers to integrate Q-learning with deep learning, which has opened the door to many vision-based applications of RL. However, combining RL with shallow neural networks has not been adequately addressed in the literature, despite its importance for real-time applications such as control systems and time-constrained decision systems. In this article, we propose a novel online sequential extreme learning machine (OSELM) based RL method using Q-learning, named OSELM-QL. In OSELM-QL, the role of the neural network (NN) is to learn the best criterion for selecting an action: based on its Q-value (exploitation) or randomly (exploration). For validation, we examine the Q-values predicted by the NN to show its feasibility as an assistant sub-block to classical Q-learning for balancing exploration and exploitation. Convergence is demonstrated in the simulation model, which underlines the potential of this approach as an assistant block in a standard Q-learning system.
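For readers unfamiliar with the building blocks named above, the following is a minimal sketch of an OSELM regressor (random fixed hidden layer, recursive least-squares update of the output weights) together with a hypothetical action selector that exploits the predicted Q-values or explores at random. All class and function names, hyperparameters, and the epsilon-greedy stand-in rule are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


class OSELM:
    """Sketch of an online sequential ELM regressor (illustrative, not the paper's code)."""

    def __init__(self, n_inputs, n_hidden, n_outputs, seed=0):
        rng = np.random.default_rng(seed)
        # In an ELM the hidden-layer weights are random and stay fixed.
        self.W = rng.standard_normal((n_inputs, n_hidden))
        self.b = rng.standard_normal(n_hidden)
        self.beta = np.zeros((n_hidden, n_outputs))  # output weights (learned)
        self.P = None  # inverse covariance matrix for recursive least squares

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def init_fit(self, X0, T0):
        # Initial batch: beta = (H0^T H0 + reg*I)^-1 H0^T T0 (regularized for stability).
        H = self._hidden(X0)
        self.P = np.linalg.inv(H.T @ H + 1e-3 * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ T0

    def partial_fit(self, X, T):
        # Sequential chunk: recursive least-squares update of P and beta.
        H = self._hidden(X)
        K = np.linalg.inv(np.eye(H.shape[0]) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ K @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self._hidden(X) @ self.beta


def select_action(model, state, n_actions, epsilon, rng):
    """Hypothetical epsilon-greedy stand-in for the learned explore/exploit criterion."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))  # explore: random action
    q = model.predict(state.reshape(1, -1))[0]
    return int(np.argmax(q))  # exploit: action with highest predicted Q-value
```

As a quick sanity check, the regressor can be initialized on a small batch and then refined chunk by chunk with `partial_fit`, which is the "online sequential" property that makes OSELM attractive for real-time RL loops.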
Reinforcement learning; Q-learning; online sequential extreme learning machine; neural network.