Online Sequential Extreme Learning Machine (OSELM) based Q-learning (OSELM-QL)
The use of reinforcement learning (RL) across many types of applications is increasing. The rapid development of machine learning models in recent years has motivated researchers to integrate Q-learning with deep learning, which has opened the door to many vision-based applications of RL. However, combining RL with shallow neural networks has not been adequately addressed in the literature, despite its importance for real-time applications such as control systems and time-constrained decision systems. In this article, we propose a novel online sequential extreme learning machine (OSELM) based RL method using Q-learning, named OSELM-QL. In OSELM-QL, the role of the neural network (NN) is to learn the best criterion for selecting an action: based on its Q-value (exploitation) or randomly (exploration). For validation, we examine the Q-values predicted by the NN to show its feasibility as an assistant sub-block to classical Q-learning for balancing exploration and exploitation. Convergence is demonstrated in the simulation model, which underlines the potential of this approach as an assistant block in a standard Q-learning system.
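For readers unfamiliar with the building blocks named above, the following is a minimal sketch of an OSELM regressor (random fixed hidden layer, recursive least-squares update of the output weights) together with a hypothetical action selector that exploits the predicted Q-values or explores at random. All class and function names, hyperparameters, and the epsilon-greedy stand-in rule are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


class OSELM:
    """Sketch of an online sequential ELM regressor (illustrative, not the paper's code)."""

    def __init__(self, n_inputs, n_hidden, n_outputs, seed=0):
        rng = np.random.default_rng(seed)
        # In an ELM the hidden-layer weights are random and stay fixed.
        self.W = rng.standard_normal((n_inputs, n_hidden))
        self.b = rng.standard_normal(n_hidden)
        self.beta = np.zeros((n_hidden, n_outputs))  # output weights (learned)
        self.P = None  # inverse covariance matrix for recursive least squares

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def init_fit(self, X0, T0):
        # Initial batch: beta = (H0^T H0 + reg*I)^-1 H0^T T0 (regularized for stability).
        H = self._hidden(X0)
        self.P = np.linalg.inv(H.T @ H + 1e-3 * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ T0

    def partial_fit(self, X, T):
        # Sequential chunk: recursive least-squares update of P and beta.
        H = self._hidden(X)
        K = np.linalg.inv(np.eye(H.shape[0]) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ K @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self._hidden(X) @ self.beta


def select_action(model, state, n_actions, epsilon, rng):
    """Hypothetical epsilon-greedy stand-in for the learned explore/exploit criterion."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))  # explore: random action
    q = model.predict(state.reshape(1, -1))[0]
    return int(np.argmax(q))  # exploit: action with highest predicted Q-value
```

As a quick sanity check, the regressor can be initialized on a small batch and then refined chunk by chunk with `partial_fit`, which is the "online sequential" property that makes OSELM attractive for real-time RL loops.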
Reinforcement learning; Q-learning; online sequential extreme learning machine; neural network.