PoseNet

pose detection

image via: https://pdm.com.co/tag/posenet/

Description

PoseNet is a machine learning model that allows for real-time human pose estimation.

PoseNet can be used to estimate either a single pose or multiple poses: there is one version of the algorithm that detects only one person in an image/video, and another that detects multiple people.

The original PoseNet model was ported to TensorFlow.js by Dan Oved. For background, read Real-time Human Pose Estimation in the Browser with TensorFlow.js.

Quickstart

```js
let poses = [];
const video = document.getElementById('video');

// Create a new poseNet method
const poseNet = ml5.poseNet(video, modelLoaded);

// When the model is loaded
function modelLoaded() {
  console.log('Model Loaded!');
}

// Listen to new 'pose' events
poseNet.on('pose', (results) => {
  poses = results;
});
```
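If you are working in p5.js, a minimal sketch might look like the following. This is only a sketch: it assumes the p5.js and ml5.js libraries are loaded on the page, and it draws every keypoint above a small confidence threshold.

```js
let video;
let poses = [];
let poseNet;

function setup() {
  createCanvas(640, 480);
  // Capture the webcam and hide the raw element; the canvas shows the feed
  video = createCapture(VIDEO);
  video.size(width, height);
  video.hide();

  poseNet = ml5.poseNet(video, () => console.log('Model Loaded!'));
  // Keep the latest results around for draw()
  poseNet.on('pose', (results) => {
    poses = results;
  });
}

function draw() {
  image(video, 0, 0, width, height);
  // Draw a dot on every sufficiently confident keypoint
  for (const { pose } of poses) {
    for (const keypoint of pose.keypoints) {
      if (keypoint.score > 0.2) {
        fill(255, 0, 0);
        noStroke();
        ellipse(keypoint.position.x, keypoint.position.y, 10, 10);
      }
    }
  }
}
```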

Usage

Initialize

There are a couple ways to initialize ml5.poseNet.

```js
// Initialize with video, type, and callback
const poseNet = ml5.poseNet(?video, ?type, ?callback);

// OR initialize with video, options, and callback
const poseNet = ml5.poseNet(?video, ?options, ?callback);

// OR initialize WITHOUT video: just a callback and options
const poseNet = ml5.poseNet(?callback, ?options);
```
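For instance (a sketch, assuming `video` and `modelLoaded` are defined as in the Quickstart):

```js
// Single-pose detection on a video element
const poseNetSingle = ml5.poseNet(video, 'single', modelLoaded);

// Multi-pose detection, overriding a couple of options
const poseNetMulti = ml5.poseNet(video, { maxPoseDetections: 3, flipHorizontal: true }, modelLoaded);

// No video: call .singlePose() or .multiPose() with images later
const poseNetImages = ml5.poseNet(modelLoaded, { detectionType: 'single' });
```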

Parameters

  • video: OPTIONAL. An HTMLVideoElement to run pose estimation on.

  • type: OPTIONAL. A String specifying whether to run single or multiple pose estimation. Sets the detectionType property of the options. Default is 'multiple'.

  • callback: OPTIONAL. A function that is called when the model is loaded.

  • options: OPTIONAL. An object containing properties that affect the PoseNet model's accuracy, results, etc. The defaults are shown below, followed by a sketch of how you might override them.

```js
{
  architecture: 'MobileNetV1',
  imageScaleFactor: 0.3,
  outputStride: 16,
  flipHorizontal: false,
  minConfidence: 0.5,
  maxPoseDetections: 5,
  scoreThreshold: 0.5,
  nmsRadius: 20,
  detectionType: 'multiple',
  inputResolution: 513,
  multiplier: 0.75,
  quantBytes: 2,
}
```
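You only need to specify the options you want to change; in this sketch (assuming the remaining options fall back to the defaults above), the model is tuned for speed over accuracy:

```js
// Favor speed: smaller input, narrower network, single-person detection
const fastOptions = {
  inputResolution: 257,
  multiplier: 0.5,
  detectionType: 'single',
};
const fastPoseNet = ml5.poseNet(video, fastOptions, modelLoaded);
```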

Properties


.net

The poseNet model



.video

The optional video element passed to the model.



.architecture

The model architecture



.detectionType

The detection type



.imageScaleFactor

The image scale factor



.outputStride

Can be one of 8, 16, 32 (Stride 16, 32 are supported for the ResNet architecture and stride 8, 16, 32 are supported for the MobileNetV1 architecture). It specifies the output stride of the PoseNet model. The smaller the value, the larger the output resolution, and more accurate the model at the cost of speed. Set this to a larger value to increase speed at the cost of accuracy.



.flipHorizontal

Boolean. Whether to flip the image horizontally.



.scoreThreshold

The threshold for returned values. Between 0 and 1. Only return instance detections that have a root part score greater than or equal to this value. Defaults to 0.5.



.maxPoseDetections

The maximum number of poses to detect. Defaults to 5.



.multiplier

Can be one of 1.01, 1.0, 0.75, or 0.50 (The value is used only by the MobileNetV1 architecture and not by the ResNet architecture). It is the float multiplier for the depth (number of channels) for all convolution ops. The larger the value, the larger the size of the layers, and more accurate the model at the cost of speed. Set this to a smaller value to increase speed at the cost of accuracy.



.inputResolution

Can be one of 161, 193, 257, 289, 321, 353, 385, 417, 449, 481, 513, and 801. Defaults to 257. It specifies the size the image is resized to before it is fed into the PoseNet model. The larger the value, the more accurate the model at the cost of speed. Set this to a smaller value to increase speed at the cost of accuracy.



.quantBytes

This argument controls the bytes used for weight quantization:

  • 4: 4 bytes per float (no quantization). Highest accuracy and original model size (~90MB).
  • 2: 2 bytes per float. Slightly lower accuracy and a 2x model size reduction (~45MB).
  • 1: 1 byte per float. Lower accuracy and a 4x model size reduction (~22MB).
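As a sketch of the opposite trade-off from the speed-tuned example earlier, a higher-accuracy configuration could pair the ResNet backbone with unquantized weights ('ResNet50' is the architecture name used by the underlying TensorFlow.js model; check that your ml5 version supports it):

```js
const accurateOptions = {
  architecture: 'ResNet50', // assumption: ResNet backbone available in your ml5 build
  outputStride: 32,         // ResNet supports output strides 16 and 32
  inputResolution: 257,
  quantBytes: 4,            // no quantization: best accuracy, largest download (~90MB)
};
```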



.nmsRadius

Non-maximum suppression part distance. It needs to be strictly positive. Two parts suppress each other if they are less than nmsRadius pixels away. Defaults to 20.


Methods


.on('pose', ...)

An event listener that returns the results when a pose is detected. You can use this with .singlePose() or .multiPose(), or just listen for poses if you pass a video into the constructor.

```js
poseNet.on('pose', callback);
```

📥 Inputs

  • callback: REQUIRED. A callback function to handle the results when a pose is detected. For example:

```js
poseNet.on('pose', (results) => {
  // do something with the results
  console.log(results);
});
```

📤 Outputs

  • Array: Returns an array of objects. See the documentation for .singlePose() and .multiPose() below.


.singlePose()

Runs pose estimation for a single person on the given input. Results are delivered through the 'pose' event listener.

```js
poseNet.singlePose(?input);
```

📥 Inputs

  • input: Optional. An HTML video or image element, or a p5 image or video element. If no input is provided, the default is to use the video given in the constructor.

📤 Outputs

  • Array: Returns an array of objects. A sample is included below.

```js
[
  {
    pose: {
      keypoints: [{ position: { x, y }, score, part }, ...],
      leftAnkle: { x, y, confidence },
      leftEar: { x, y, confidence },
      leftElbow: { x, y, confidence },
      ...
    },
  },
];
```
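To read a specific keypoint out of this structure, something like the following works (a sketch; img is a hypothetical HTMLImageElement you provide):

```js
poseNet.on('pose', (results) => {
  if (results.length > 0) {
    const { pose } = results[0];
    // Named keypoints expose x/y coordinates plus a confidence score
    console.log(pose.leftAnkle.x, pose.leftAnkle.y, pose.leftAnkle.confidence);
  }
});
poseNet.singlePose(img); // img: an image element already on the page
```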


.multiPose()

Runs pose estimation for multiple people on the given input. Results are delivered through the 'pose' event listener.

```js
poseNet.multiPose(?input);
```

📥 Inputs

  • input: Optional. An HTML video or image element, or a p5 image or video element. If no input is provided, the default is to use the video given in the constructor.

📤 Outputs

  • Array: Returns an array of objects. A sample is included below.

```js
[
  {
    pose: {
      keypoints: [{ position: { x, y }, score, part }, ...],
      leftAnkle: { x, y, confidence },
      leftEar: { x, y, confidence },
      leftElbow: { x, y, confidence },
      ...
    },
  },
  {
    pose: {
      keypoints: [{ position: { x, y }, score, part }, ...],
      leftAnkle: { x, y, confidence },
      leftEar: { x, y, confidence },
      leftElbow: { x, y, confidence },
      ...
    },
  },
];
```
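A sketch of iterating over every detected person (again, img is a hypothetical image element containing several people):

```js
poseNet.on('pose', (results) => {
  results.forEach(({ pose }, i) => {
    console.log(`person ${i}:`);
    // Walk the raw keypoint list rather than the named shortcuts
    pose.keypoints.forEach(({ part, position, score }) => {
      console.log(`  ${part}: (${position.x}, ${position.y}), score ${score}`);
    });
  });
});
poseNet.multiPose(img);
```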

Examples

  • p5.js
  • p5 web editor
  • plain javascript

Demo

No demos yet - contribute one today!

Tutorials

PoseNet on The Coding Train

Acknowledgements

Contributors:

  • Cristobal Valenzuela, Maya Man, Dan Oved


Source Code