Fusion of Convolutional Self-Attention and Cross-Dimensional Feature Transformation for Human Posture Estimation
Graphical Abstract
[Graphical abstract figure omitted]
Abstract
Human posture estimation is a prominent research topic in human-computer interaction, motion recognition, and other intelligent applications. However, the high key-point localization accuracy that such applications require conflicts with the low detection accuracy that human posture estimation models achieve in practical scenarios. To address this issue, we propose AT-HRNet, a human pose estimation network that combines convolutional self-attention with cross-dimensional feature transformation. AT-HRNet adaptively captures salient feature information from different regions and aggregates it through convolutional operations within the local receptive field. The residual structures TripNeck and TripBlock of the high-resolution network are designed to further refine key point locations, with attention weights adjusted through cross-dimensional interaction to obtain richer features. To validate the effectiveness of the network, AT-HRNet was evaluated on the COCO2017 dataset. The results show that AT-HRNet outperforms HRNet, improving mAP by 3.2%, AP75 by 4.0%, and APM by 3.9%. This suggests that AT-HRNet offers a more effective solution for human posture estimation.