LEARNING SPATIO-TEMPORAL FEATURE EXTRACTION USING RESIDUAL FRAMES WITH NEURAL NETWORKS FOR HUMAN ACTION RECOGNITION
In recent years, the growth of machine learning and artificial intelligence algorithms has expanded the use of image and video processing. These algorithms are applied in fields such as content-based video recognition, video surveillance, assisted living, autism care, and gaming. Human Action Recognition (HAR) in particular demands efficient computation, because videos contain a large amount of redundant information. This research proposes a method that selects residual frames and keyframes to eliminate redundant information from videos. The method combines the extraction of spatial and temporal features: the features are extracted with the VGG16 (Visual Geometry Group) network and classified with a multi-class SVM (Support Vector Machine) classifier. The proposed method was tested on the HMDB51 and UCF101 datasets, where it achieved accuracies of 85.6% and 98.71%, respectively.
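The abstract does not state how residual frames or keyframes are computed, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a residual frame is the absolute pixel difference between consecutive frames (static background cancels, leaving mostly motion), and it uses a hypothetical keyframe rule: keep a frame when the mean residual leading into it exceeds a threshold. The function names `residual_frames` and `select_keyframes` and the `threshold` parameter are illustrative, not from the paper.

```python
import numpy as np


def residual_frames(frames: np.ndarray) -> np.ndarray:
    """Assumed definition: absolute difference between consecutive frames.

    frames: array of shape (T, H, W) or (T, H, W, C) with T >= 2.
    Returns an array of shape (T - 1, ...). Casting to float avoids
    uint8 wrap-around when subtracting.
    """
    f = frames.astype(np.float32)
    return np.abs(f[1:] - f[:-1])


def select_keyframes(frames: np.ndarray, threshold: float) -> list:
    """Hypothetical keyframe rule (not taken from the paper).

    Frame 0 is always kept; frame t (t >= 1) is kept when the mean
    residual between frame t-1 and frame t exceeds `threshold`.
    """
    res = residual_frames(frames)
    # One motion score per transition: mean over all pixel (and channel) axes.
    scores = res.reshape(res.shape[0], -1).mean(axis=1)
    return [0] + [t + 1 for t, s in enumerate(scores) if s > threshold]


# Usage on a synthetic 5-frame grayscale clip with an abrupt change at frame 3.
clip = np.zeros((5, 4, 4), dtype=np.uint8)
clip[3:] = 200
print(residual_frames(clip).shape)             # (4, 4, 4)
print(select_keyframes(clip, threshold=10.0))  # [0, 3]
```

In a full pipeline of the kind the abstract describes, the selected keyframes would supply spatial input and the residual frames temporal input to a feature extractor such as VGG16, with a multi-class classifier on top; those stages are omitted here because the abstract gives no further detail.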
Spatial features, Temporal features, Keyframes, Residual frames, VGG16