Three-dimensional (3D) object detection is crucial for accurate environment perception in autonomous driving. However, the distribution of point clouds in a 3D scene becomes sparse with increasing distance, which seriously degrades the sensor's perception precision. To address this problem, we propose a two-stage 3D object detection network based on point and voxel feature fusion. In the first stage, a spatial-semantic feature fusion module is designed to effectively fuse low-level spatial features with high-level semantic features and generate high-quality proposals. An attention-based residual module is then constructed to expand the receptive field and adaptively aggregate voxel features in the 3D scene. At the same time, the sampled key points are fused with the voxel features to extract the key information in the scene. In the second stage, a graph network pooling module is introduced that builds local graphs on the 3D proposals, using key-point features as nodes, to estimate object confidence and location more accurately. Experimental results on the KITTI dataset show that detection precision improves significantly on the easy, moderate, and hard difficulty levels.
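To illustrate the first-stage fusion idea, the following is a minimal sketch, assuming PyTorch and bird's-eye-view-style 2D feature maps; the module, parameter, and shape choices are hypothetical and not taken from the paper. A coarse high-level semantic map is upsampled to the resolution of the low-level spatial map, and the two streams are combined under a learned channel-attention weighting.

```python
# Minimal sketch of spatial-semantic feature fusion (assumed PyTorch;
# all names and shapes here are illustrative, not the paper's).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialSemanticFusion(nn.Module):
    def __init__(self, spatial_ch: int, semantic_ch: int, out_ch: int):
        super().__init__()
        # Project both streams to a common channel width.
        self.spatial_proj = nn.Conv2d(spatial_ch, out_ch, kernel_size=1)
        self.semantic_proj = nn.Conv2d(semantic_ch, out_ch, kernel_size=1)
        # Channel attention over the concatenated streams decides how much
        # each branch contributes per channel.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * out_ch, out_ch, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, 2 * out_ch, kernel_size=1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(2 * out_ch, out_ch, kernel_size=1)

    def forward(self, spatial_feat, semantic_feat):
        # Upsample the coarse semantic map to the spatial map's resolution.
        semantic_feat = F.interpolate(
            semantic_feat, size=spatial_feat.shape[-2:],
            mode="bilinear", align_corners=False)
        s = self.spatial_proj(spatial_feat)
        m = self.semantic_proj(semantic_feat)
        x = torch.cat([s, m], dim=1)
        # Attention-reweighted fusion of the two streams.
        return self.fuse(x * self.attn(x))


if __name__ == "__main__":
    # Example shapes only: batch 2; a 200x176 low-level map with 64 channels
    # and a 100x88 high-level map with 128 channels.
    spatial = torch.randn(2, 64, 200, 176)
    semantic = torch.randn(2, 128, 100, 88)
    fused = SpatialSemanticFusion(64, 128, 128)(spatial, semantic)
    print(fused.shape)  # torch.Size([2, 128, 200, 176])
```

The design choice sketched here (1x1 projections plus squeeze-and-excitation-style channel attention) is one common way to realize such a fusion; the paper's actual module may differ in structure and detail.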
Keywords: object detection, voxels, point clouds, semantics, feature fusion, convolution, feature extraction