Vision-Guided Action: Enhancing 3D Human Motion Prediction with Gaze-informed Affordance in 3D Scenes

CVPR2025, Poster
Ting Yu1, Yi Lin1, Jun Yu2, Zhenyu Lou3, Qiongjie Cui4
1Hangzhou Normal University   2Harbin Institute of Technology (Shenzhen)  
3Zhejiang University    4Singapore University of Technology and Design

🥰 Abstract

Recent advances in human motion prediction (HMP) have shifted focus from isolated motion data to integrating human-scene correlations. In particular, the latest methods leverage human gaze points, using their spatial coordinates to indicate intent: where a person might move within a 3D environment. Despite promising trajectory results, these methods often produce inaccurate poses by overlooking the semantic implications of gaze, specifically the affordances of observed objects, which indicate possible interactions. To address this, we propose GAP3DS, an affordance-aware HMP model that utilizes gaze-informed object affordances to improve HMP in complex 3D environments. GAP3DS incorporates a gaze-guided affordance learner to identify relevant objects in the scene and infer their affordances based on human gaze, thus contextualizing future human-object interactions. This affordance information, enriched with visual features and gaze data, conditions the generation of multiple human-object interaction poses, which are subsequently decoded into final motion predictions. Extensive experiments on two real-world datasets demonstrate that GAP3DS outperforms state-of-the-art methods in both trajectory and pose accuracy, producing more physically consistent and contextually grounded predictions.
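To make the pipeline in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of the three stages it describes: gaze-guided affordance learning, affordance-conditioned generation of multiple candidate interaction poses, and decoding into future motion. All module names, feature dimensions, and the GRU encoder/decoder choice are illustrative assumptions, not the authors' implementation; refer to the paper for the actual architecture.

```python
import torch
import torch.nn as nn


class GazeAffordanceLearner(nn.Module):
    """Scores scene objects by gaze relevance and pools an affordance embedding.
    Names and dimensions are illustrative, not taken from the GAP3DS codebase."""

    def __init__(self, obj_dim=256, gaze_dim=3, aff_dim=128):
        super().__init__()
        self.gaze_proj = nn.Linear(gaze_dim, obj_dim)  # lift the 3D gaze point to feature space
        self.aff_head = nn.Sequential(
            nn.Linear(obj_dim, aff_dim), nn.ReLU(), nn.Linear(aff_dim, aff_dim)
        )

    def forward(self, obj_feats, gaze):
        # obj_feats: (B, N, obj_dim) per-object visual features; gaze: (B, 3)
        query = self.gaze_proj(gaze).unsqueeze(1)                       # (B, 1, obj_dim)
        relevance = torch.softmax((obj_feats * query).sum(-1), dim=-1)  # gaze-object weights (B, N)
        pooled = (relevance.unsqueeze(-1) * obj_feats).sum(1)           # gaze-weighted object feature
        return self.aff_head(pooled), relevance                         # affordance embedding + weights


class GAP3DSSketch(nn.Module):
    """Toy pipeline: encode past motion, condition on the gaze-informed affordance,
    generate k candidate interaction poses, decode each into a future sequence."""

    def __init__(self, pose_dim=63, fut_len=60, obj_dim=256, aff_dim=128, k=5):
        super().__init__()
        self.fut_len = fut_len
        self.affordance = GazeAffordanceLearner(obj_dim, 3, aff_dim)
        self.motion_enc = nn.GRU(pose_dim, 256, batch_first=True)
        self.cond = nn.Linear(256 + aff_dim, 256)
        self.pose_heads = nn.ModuleList(nn.Linear(256, pose_dim) for _ in range(k))
        self.decoder = nn.GRU(pose_dim, 256, batch_first=True)
        self.out = nn.Linear(256, pose_dim)

    def forward(self, past_poses, obj_feats, gaze):
        # past_poses: (B, T, pose_dim) observed motion history
        aff, _ = self.affordance(obj_feats, gaze)
        _, h = self.motion_enc(past_poses)                    # h: (1, B, 256) motion context
        ctx = torch.relu(self.cond(torch.cat([h[-1], aff], dim=-1)))
        futures = []
        for head in self.pose_heads:
            anchor = head(ctx)                                # one candidate interaction pose
            seq = anchor.unsqueeze(1).repeat(1, self.fut_len, 1)
            dec, _ = self.decoder(seq, h)                     # decode pose + history into motion
            futures.append(self.out(dec))
        return torch.stack(futures, dim=1)                    # (B, k, fut_len, pose_dim)


if __name__ == "__main__":
    B, T, N = 2, 30, 8  # batch, history length, number of scene objects
    model = GAP3DSSketch()
    preds = model(torch.randn(B, T, 63), torch.randn(B, N, 256), torch.randn(B, 3))
    print(preds.shape)  # torch.Size([2, 5, 60, 63])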

😘 Video

🏃 We propose GAP3DS, an affordance-aware human motion prediction (HMP) model that enhances the realism and accuracy of motion prediction in real-world 3D environments.

🥳 GAP3DS's Results

BibTeX

@inproceedings{yu2025visionguided,
    author = {Ting Yu and Yi Lin and Jun Yu and Zhenyu Lou and Qiongjie Cui},
    title = {Vision-Guided Action: Enhancing 3D Human Motion Prediction with Gaze-informed Affordance in 3D Scenes},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = {2025},
    note = {CCF A}
}