3D Multi-Object Tracking in Point Clouds Based on Prediction Confidence-Guided Data Association

Abstract

This paper proposes a new 3D multi-object tracker that more robustly tracks objects temporarily missed by detectors and better leverages object features for 3D Multi-Object Tracking (MOT) in point clouds. The proposed tracker is based on a novel data association scheme guided by prediction confidence, and it consists of two key parts. First, we design a new predictor that employs a constant acceleration (CA) motion model to estimate future positions and outputs a prediction confidence that guides data association through increased awareness of detection quality. Second, we introduce a new aggregated pairwise cost that exploits features of objects in point clouds for faster and more accurate data association. The proposed cost consists of geometry, appearance and motion components. Specifically, we formulate the geometry cost using the dimensions (lengths, widths and heights), centroids, and orientations of 3D bounding boxes (BBs), the appearance cost using appearance features from the deep learning-based detector backbone network, and the motion cost by associating different motion vectors. Extensive multi-object tracking experiments on the KITTI tracking benchmark demonstrate that our method outperforms state-of-the-art methods by a large margin in both tracking accuracy and speed.
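The two parts described above can be illustrated with a minimal sketch. It assumes a per-axis constant acceleration update and a weighted sum for the aggregated cost. The names `TrackState`, `predict_ca`, `cosine_distance` and `aggregated_cost`, the use of cosine distance for appearance features, and the weights are all hypothetical. The abstract does not give the paper's exact formulas, and the prediction confidence is not sketched here because its definition is not stated.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class TrackState:
    position: Vec3      # box centroid (x, y, z), metres
    velocity: Vec3      # metres per second
    acceleration: Vec3  # metres per second squared


def predict_ca(state: TrackState, dt: float) -> TrackState:
    """Propagate a track by dt seconds under constant acceleration."""
    position = tuple(
        p + v * dt + 0.5 * a * dt * dt
        for p, v, a in zip(state.position, state.velocity, state.acceleration)
    )
    velocity = tuple(
        v + a * dt for v, a in zip(state.velocity, state.acceleration)
    )
    # Acceleration is held fixed by the CA assumption.
    return TrackState(position, velocity, state.acceleration)


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    dot = sum(x * y for x, y in zip(u, v))
    return 1.0 - dot / (norm_u * norm_v)


def aggregated_cost(
    geometry: float,
    appearance: float,
    motion: float,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted sum of the three component costs (weights are assumed)."""
    w_g, w_a, w_m = weights
    return w_g * geometry + w_a * appearance + w_m * motion
```

In use, each existing track would be advanced with `predict_ca` to the timestamp of the new frame, and `aggregated_cost` would be evaluated for every track-detection pair to fill the cost matrix handed to the association step.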

Existing System

• One method is to use existing state-of-the-art 3D detectors (e.g., PointRCNN and PointPillars) to predict 3D bounding boxes, which are then projected onto the corresponding images to obtain 2D bounding boxes.
• However, many approaches detect objects in 2D RGB sequences for tracking, which is unreliable when localizing objects in 3D space.
• Multi-object tracking in a 3D environment (3D MOT) plays a crucial role in the environmental perception of autonomous systems.
• However, camera sensors do not directly provide depth information unless relatively computationally expensive methods such as stereo vision are used.
• One efficient method to solve this problem is to represent the objects in adjacent frames as a directed graph.
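The projection step mentioned in the first point can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any particular detector: the box is assumed to be given in the camera frame (x right, y down, z forward, yaw about the y axis), the camera is described by a 3x4 projection matrix, and the function names `box_corners` and `project_to_2d_box` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def box_corners(center: Point3, size: Point3, yaw: float) -> List[Point3]:
    """Eight corners of a 3D box in the (assumed) camera frame.

    size is (length along x, height along y, width along z) before
    rotation; yaw rotates the box about the vertical y axis.
    """
    cx, cy, cz = center
    length, height, width = size
    cos_yaw, sin_yaw = math.cos(yaw), math.sin(yaw)
    corners = []
    for sx in (-0.5, 0.5):
        for sy in (-0.5, 0.5):
            for sz in (-0.5, 0.5):
                x, y, z = sx * length, sy * height, sz * width
                corners.append((
                    cx + x * cos_yaw + z * sin_yaw,
                    cy + y,
                    cz - x * sin_yaw + z * cos_yaw,
                ))
    return corners


def project_to_2d_box(
    corners: Sequence[Point3],
    projection: Sequence[Sequence[float]],
) -> Tuple[float, float, float, float]:
    """Project corners with a 3x4 matrix; return (u_min, v_min, u_max, v_max)."""
    us, vs = [], []
    for x, y, z in corners:
        u, v, depth = (
            row[0] * x + row[1] * y + row[2] * z + row[3] for row in projection
        )
        if depth <= 0.0:
            # Kept simple: boxes crossing the image plane are rejected.
            raise ValueError("corner lies behind the camera")
        us.append(u / depth)
        vs.append(v / depth)
    return min(us), min(vs), max(us), max(vs)
```

The resulting 2D box is the axis-aligned rectangle enclosing the eight projected corners, so it is usually looser than a box drawn tightly around the object in the image.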

Disadvantages

• To solve this problem, many researchers have adopted projection methods that project 3D objects into multiple views and fuse the features of each view for detection and recognition.
• However, point clouds acquired by LiDAR are largely unaffected by lighting and carry distance and volume information, which can overcome the problems above and compensate for the shortcomings of image processing.
• In contrast, the aerial view of a point cloud has a large field of view and no object occlusion, which is conducive to re-identification and solves the problems that exist in images.
• Our method is mainly used at the roadside, where occlusion and reappearance are frequent; such cases rarely occur in the KITTI dataset.

Proposed System

• We conduct extensive experiments to evaluate the effectiveness of the proposed framework on the challenging KITTI benchmark and report state-of-the-art performance.
• To evaluate the proposed approach and demonstrate the effectiveness of its key components, we conduct an ablation study on the KITTI benchmark under the online setting with the state-of-the-art detector PointPillars.
• Using modality fusion, the proposed method produces 184 fewer ID switches than the previous best method, MOTBeyondPixels, with the same detection method.
• The proposed attention-guided fusion mechanism further improves accuracy.

Advantages

• To improve the performance of multi-object tracking in point clouds and recover the ID information of occluded objects, we combine a pedestrian re-identification algorithm with a 3D Kalman filter and apply them to point clouds.
• We run an advanced 3D detector on the KITTI dataset and use its detection results directly to test tracking performance.
• To reduce the amount of computation and improve performance, we ignore the height H and the Z coordinate.
• Multi-object tracking is widely used in autonomous driving systems because it can associate object detection results over time without switching the identities of multiple targets.
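The simplification of ignoring the height H and the Z coordinate can be sketched as below. The sketch assumes Z is the vertical axis, so that dropping it leaves a top-down footprint. The types `Box3D` and `BoxBEV` and the greedy nearest-centre matching with a distance gate are hypothetical stand-ins for illustration; the text above does not specify the association rule.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class Box3D(NamedTuple):
    x: float
    y: float
    z: float       # vertical position (assumed up axis)
    length: float
    width: float
    height: float
    yaw: float


class BoxBEV(NamedTuple):
    x: float
    y: float
    length: float
    width: float
    yaw: float


def to_bev(box: Box3D) -> BoxBEV:
    """Drop z and height, keeping only the top-down footprint."""
    return BoxBEV(box.x, box.y, box.length, box.width, box.yaw)


def bev_center_distance(a: BoxBEV, b: BoxBEV) -> float:
    """Euclidean distance between footprint centres."""
    return math.hypot(a.x - b.x, a.y - b.y)


def greedy_associate(
    tracks: Sequence[BoxBEV],
    detections: Sequence[BoxBEV],
    max_distance: float,
) -> List[Tuple[int, int]]:
    """Match (track index, detection index) pairs, closest pairs first."""
    candidates = sorted(
        (bev_center_distance(t, d), ti, di)
        for ti, t in enumerate(tracks)
        for di, d in enumerate(detections)
    )
    used_tracks, used_detections, matches = set(), set(), []
    for distance, ti, di in candidates:
        if distance > max_distance:
            break  # remaining candidates are even farther apart
        if ti in used_tracks or di in used_detections:
            continue
        used_tracks.add(ti)
        used_detections.add(di)
        matches.append((ti, di))
    return matches
```

Working in two dimensions reduces each distance computation and each filter state by one axis, at the cost of being unable to separate objects that overlap in the top-down view but differ in elevation.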
