A Multimodal Data Processing System for LiDAR-Based Human Activity Recognition

Abstract

Increasingly, the task of detecting and recognizing human actions has been delegated to neural networks that process camera or wearable-sensor data. Because cameras are sensitive to lighting conditions and wearable sensors provide only sparse data, neither modality alone can capture the data needed to perform the task reliably. Range sensors such as light detection and ranging (LiDAR) can therefore complement them, allowing the environment to be perceived more robustly. Most recently, researchers have been exploring ways to apply convolutional neural networks to 3-D data. These methods typically rely on a single modality and cannot draw on information from complementary sensor streams to improve accuracy. This article proposes a framework that tackles human activity recognition by leveraging sensor fusion and multimodal machine learning. Given both RGB and point cloud data, our method describes the activities being performed by subjects using regions with convolutional neural network features (R-CNN) and a 3-D modified Fisher vector network. Evaluation on a custom-captured multimodal dataset shows that the model achieves highly accurate human activity classification (90%). Furthermore, this framework can be used for sports analytics, understanding social behavior, surveillance, and, perhaps most notably, by autonomous vehicles (AVs) to inform data-driven decision-making policies in urban areas and indoor environments.
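The abstract does not say how the RGB branch (R-CNN) and the point cloud branch (3-D modified Fisher vector network) are combined. The sketch below assumes simple late fusion: each branch emits raw class scores, these are turned into probabilities, and the two distributions are averaged with a tunable weight. The activity labels, the `rgb_weight` parameter, and the function names are hypothetical and only illustrate the idea; they are not the authors' method.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical activity labels; the real label set of the custom dataset is not given here.
ACTIVITIES = ("walking", "running", "sitting", "waving")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(rgb_logits: Sequence[float],
                pc_logits: Sequence[float],
                rgb_weight: float = 0.5) -> Dict[str, float]:
    """Weighted average of per-modality class probabilities (assumed late fusion)."""
    if len(rgb_logits) != len(pc_logits) or len(rgb_logits) != len(ACTIVITIES):
        raise ValueError("both branches must score every activity class")
    if not 0.0 <= rgb_weight <= 1.0:
        raise ValueError("rgb_weight must lie in [0, 1]")
    p_rgb = softmax(rgb_logits)  # camera branch (e.g. an R-CNN head)
    p_pc = softmax(pc_logits)    # LiDAR branch (e.g. a point cloud classifier)
    fused = [rgb_weight * a + (1.0 - rgb_weight) * b for a, b in zip(p_rgb, p_pc)]
    return dict(zip(ACTIVITIES, fused))


def predict(rgb_logits: Sequence[float],
            pc_logits: Sequence[float],
            rgb_weight: float = 0.5) -> str:
    """Return the activity label with the highest fused probability."""
    fused = late_fusion(rgb_logits, pc_logits, rgb_weight)
    return max(fused, key=fused.get)
```

For example, in a dark scene the camera branch may be nearly uniform (`[0.2, 0.1, 0.3, 0.25]`) while the LiDAR branch is confident (`[2.5, 0.1, 0.0, 0.2]`); `predict` then follows the LiDAR evidence and returns `"walking"`. Late fusion is only one option; feature-level fusion, which concatenates intermediate embeddings before a shared classifier, is a common alternative.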

Existing System

• Existing works are mainly devoted to smart homes for elderly care or assisted living.
• However, few research works address AI-enabled safety detection for workers on smart-factory floors.
• Vision-based recognition of human actions is an important research field within the integrative computer vision and multimedia analytics ecosystem.
• This review thoroughly compared and summarized the landscape of vision-based RGB-D sensors.
• We provided an outline of commonly used existing datasets and highlighted key research that has focused mainly on RGB-D datasets.

Disadvantages

• The step cycles recorded in the 4D Studio were simply repeated continuously, disregarding step frequency and phase information, which produces a distracting visual effect.
• Another problem is that the intensity channel of the sensor under consideration is not calibrated; that is, the measured intensity values are not necessarily characteristic of a given clothing material, and they may depend on the sensor's distance and viewing angle.
• Several methods tackle the detection problem using videos from monocular optical cameras.
• This re-identification issue has been partially addressed in earlier work using LiDAR-based weak biometric identifiers: the measured height and the intensity histogram of each person's point cloud segment.
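The weak biometric identifiers mentioned above (height and intensity histogram of a person's point cloud segment) can be sketched as follows. This assumes each point is an `(x, y, z, intensity)` tuple with `z` pointing up and raw intensities in `[0, 255]`. The bin count, the `height_scale` normalization, and the way the two cues are summed into one distance are hypothetical choices for illustration; the cited work may combine them differently. Because the intensity channel is uncalibrated, as noted above, the histogram is only a weak cue.

```python
import math
from typing import List, Sequence, Tuple

# (x, y, z, intensity); z is assumed to point up, intensity assumed in [0, 255]
Point = Tuple[float, float, float, float]


def person_height(points: Sequence[Point]) -> float:
    """Vertical extent of the segment (top of head minus lowest point)."""
    if not points:
        raise ValueError("empty point cloud segment")
    zs = [p[2] for p in points]
    return max(zs) - min(zs)


def intensity_histogram(points: Sequence[Point], bins: int = 8,
                        max_intensity: float = 255.0) -> List[float]:
    """Normalized histogram of raw (uncalibrated) intensity values."""
    if not points:
        raise ValueError("empty point cloud segment")
    counts = [0] * bins
    for *_, intensity in points:
        clamped = min(max(intensity, 0.0), max_intensity)
        idx = min(int(clamped / max_intensity * bins), bins - 1)
        counts[idx] += 1
    return [c / len(points) for c in counts]


def signature_distance(a: Sequence[Point], b: Sequence[Point],
                       height_scale: float = 0.1) -> float:
    """Hypothetical combined distance: scaled height gap + (1 - Bhattacharyya coefficient)."""
    dh = abs(person_height(a) - person_height(b)) / height_scale
    ha, hb = intensity_histogram(a), intensity_histogram(b)
    bc = sum(math.sqrt(x * y) for x, y in zip(ha, hb))  # 1.0 for identical histograms
    return dh + (1.0 - bc)
```

A smaller `signature_distance` suggests that two segments may belong to the same person. In practice such a score would only re-rank candidates that a tracker has already proposed, since height and intensity alone are not unique.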

Proposed System

• We proposed a human-centered test-bed for an emergency detection system.
• However, this approach differs from the idea proposed in this work, where we attempt to classify and detect non-image data.
• In this work, we proposed a CNN model for emergency detection on the smart-factory shop floor and developed a simple test-bed that provides a dataset for future research in this area.
• To the best of our knowledge, this is a major attempt to develop a test-bed that captures data for detecting and classifying movement, breathing, and vibration on a smart-factory shop floor for human safety.

Advantages

• Because the sequences were recorded in different seasons, we can also investigate how different clothing styles (such as winter coats or T-shirts) influence the discriminative performance of the observed gait features.
• The presented results confirm that both efficient gait-based identification and activity recognition are achievable in the sparse point clouds of a single RMB LiDAR sensor.
• A complex visual scene interpretation system implements several steps: people detection, followed by localization and tracking, then higher-level activity recognition or abnormal event detection, and finally efficient visualization.
• A new image-to-class distance metric has been proposed to enable efficient comparison of different gait patterns.
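The text does not define the image-to-class distance metric. The sketch below instead shows the well-known naive-Bayes nearest-neighbor (NBNN) form of an image-to-class distance, as an assumed stand-in. Each gait sample is described by a set of local descriptors. The distance to a class is the sum, over the query's descriptors, of the squared distance to the nearest descriptor pooled from all of that class's training samples. The 2-D descriptors and subject labels in the usage example are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Descriptor = Tuple[float, ...]  # one local feature vector (hypothetical layout)


def sq_dist(a: Descriptor, b: Descriptor) -> float:
    """Squared Euclidean distance between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def image_to_class_distance(query: Sequence[Descriptor],
                            class_pool: Sequence[Descriptor]) -> float:
    """NBNN-style distance: for each query descriptor, find its nearest
    neighbor among all descriptors pooled from one class, then sum."""
    if not class_pool:
        raise ValueError("class has no training descriptors")
    return sum(min(sq_dist(q, d) for d in class_pool) for q in query)


def classify(query: Sequence[Descriptor],
             pools: Dict[str, Sequence[Descriptor]]) -> str:
    """Assign the query to the class with the smallest image-to-class distance."""
    return min(pools, key=lambda label: image_to_class_distance(query, pools[label]))
```

For example, with hypothetical pools `{"subject_a": [(0, 0), (1, 1)], "subject_b": [(5, 5), (6, 6)]}`, the query `[(0.1, 0.0), (0.9, 1.1)]` is assigned to `"subject_a"`. Comparing against a pooled class, rather than against individual training samples, lets a query match descriptors drawn from different recordings of the same person.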
