vegetebird / human-pose-estimation

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

Run time and cost

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 53 seconds, but prediction time varies significantly with the input.

Readme

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation [CVPR 2022]

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation,
Wenhao Li, Hong Liu, Hao Tang, Pichao Wang, Luc Van Gool,
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022

Usage

To use this model, input a short video clip of a person to track their pose in both 2D and 3D.

You may also download the extracted keypoints in compressed .npz format. They can be loaded using np.load('keypoints.npz')['reconstruction'], which yields an array of shape (1, <num_frames>, 17, 2), representing the x, y coordinates of all 17 joints for each frame.
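
The loading step can be sketched as follows. This is a minimal example, assuming the archive was saved locally as keypoints.npz and has the layout described above; the helper name load_keypoints is hypothetical and is not part of the MHFormer code.

```python
import numpy as np


def load_keypoints(path="keypoints.npz"):
    """Return per-frame joint coordinates with shape (num_frames, 17, 2).

    Hypothetical helper: assumes the archive stores its array under the
    'reconstruction' key with shape (1, num_frames, 17, 2), as described
    in this README.
    """
    with np.load(path) as archive:
        keypoints = archive["reconstruction"]

    # Fail early if the file does not match the documented layout.
    if keypoints.ndim != 4 or keypoints.shape[2:] != (17, 2):
        raise ValueError(f"unexpected keypoint shape: {keypoints.shape}")

    # Drop the leading singleton axis so frames can be indexed directly.
    return keypoints[0]
```

With this helper, load_keypoints("keypoints.npz")[t] would give the 17 (x, y) pairs for frame t.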

Currently, the model tracks only one person at a time; multi-person tracking may be supported in the future.

Citation

If you find our work useful in your research, please consider citing:

@inproceedings{li2022mhformer,
  title={MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation},
  author={Li, Wenhao and Liu, Hong and Tang, Hao and Wang, Pichao and Van Gool, Luc},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={13147-13156},
  year={2022}
}

Acknowledgement

Our code is extended from the following repositories. We thank the authors for releasing their code.

License

This project is licensed under the terms of the MIT license.