Project import generated by Copybara.
GitOrigin-RevId: 5b4c149782c086ebf9ef390195fb260ad0103217
Parent: 350fbb2100
Commit: a92cff7a60
docs/solutions/pose.md:

@@ -2,6 +2,8 @@
 layout: default
 title: Pose
 parent: Solutions
+has_children: true
+has_toc: false
 nav_order: 5
 ---
 
@@ -21,10 +23,9 @@ nav_order: 5
 ## Overview
 
 Human pose estimation from video plays a critical role in various applications
-such as
-[quantifying physical exercises](#pose-classification-and-repetition-counting),
-sign language recognition, and full-body gesture control. For example, it can
-form the basis for yoga, dance, and fitness applications. It can also enable the
+such as [quantifying physical exercises](./pose_classification.md), sign
+language recognition, and full-body gesture control. For example, it can form
+the basis for yoga, dance, and fitness applications. It can also enable the
 overlay of digital content and information on top of the physical world in
 augmented reality.
 
@@ -387,121 +388,6 @@ on how to build MediaPipe examples.
 * Target:
   [`mediapipe/examples/desktop/upper_body_pose_tracking:upper_body_pose_tracking_gpu`](https://github.com/google/mediapipe/tree/master/mediapipe/examples/desktop/upper_body_pose_tracking/BUILD)
 
-## Pose Classification and Repetition Counting
-[... section removed; its content moves, near verbatim, to the new file
-docs/solutions/pose_classification.md added below ...]
 
 ## Resources
 
 * Google AI Blog:

@@ -512,5 +398,3 @@ the number of camera angles required for accurate k-NN classification.
 * [Models and model cards](./models.md#pose)
 * [Web demo](https://code.mediapipe.dev/codepen/pose)
 * [Python Colab](https://mediapipe.page.link/pose_py_colab)
-* [Pose Classification Colab (Basic)](https://mediapipe.page.link/pose_classification_basic)
-* [Pose Classification Colab (Extended)](https://mediapipe.page.link/pose_classification_extended)
docs/solutions/pose_classification.md (new file, 142 lines):

@@ -0,0 +1,142 @@
---
layout: default
title: Pose Classification
parent: Pose
grand_parent: Solutions
nav_order: 1
---

# Pose Classification
{: .no_toc }

<details close markdown="block">
  <summary>
    Table of contents
  </summary>
  {: .text-delta }
1. TOC
{:toc}
</details>
---

## Overview

One of the applications
[BlazePose](https://ai.googleblog.com/2020/08/on-device-real-time-body-pose-tracking.html)
can enable is fitness, more specifically pose classification and repetition
counting. In this section we'll provide basic guidance on building a custom
pose classifier with the help of [Colabs](#colabs) and wrapping it in a simple
[fitness app](https://mediapipe.page.link/mlkit-pose-classification-demo-app)
powered by [ML Kit](https://developers.google.com/ml-kit). Push-ups and squats
are used for demonstration purposes as the most common exercises.

![pose_classification_pushups_and_squats.gif](../images/mobile/pose_classification_pushups_and_squats.gif) |
:--------------------------------------------------------------------------------------------------------: |
*Fig 1. Pose classification and repetition counting with MediaPipe Pose.* |

We picked the
[k-nearest neighbors algorithm](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm)
(k-NN) as the classifier. It's simple and easy to start with. The algorithm
determines the object's class based on the closest samples in the training set.

**To build it, one needs to:**

1. Collect image samples of the target exercises and run pose prediction on
   them,
2. Convert the obtained pose landmarks to a representation suitable for the
   k-NN classifier and form a training set using these [Colabs](#colabs),
3. Perform the classification itself followed by repetition counting (e.g., in
   the
   [ML Kit demo app](https://mediapipe.page.link/mlkit-pose-classification-demo-app)).

## Training Set

To build a good classifier, appropriate samples should be collected for the
training set: a few hundred samples for each terminal state of each exercise
(e.g., "up" and "down" positions for push-ups). It's important that the
collected samples cover different camera angles, environment conditions, body
shapes, and exercise variations.

![pose_classification_pushups_un_and_down_samples.jpg](../images/mobile/pose_classification_pushups_un_and_down_samples.jpg) |
:--------------------------------------------------------------------------------------------------------------------------: |
*Fig 2. Two terminal states of push-ups.* |

To transform the samples into a k-NN training set, either the
[`Pose Classification Colab (Basic)`] or the
[`Pose Classification Colab (Extended)`] can be used. Both use the
[Python Solution API](./pose.md#python-solution-api) to run the BlazePose models
on the given images and dump the predicted pose landmarks to a CSV file.
Additionally, the [`Pose Classification Colab (Extended)`] provides useful tools
to find outliers (e.g., wrongly predicted poses) and underrepresented classes
(e.g., not covering all camera angles) by classifying each sample against the
entire training set. After that, you'll be able to test the classifier on an
arbitrary video right in the Colab.

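The Colabs take care of this step, but as a rough sketch of what "run pose
prediction on sample images and dump the landmarks to a CSV" can look like with
the Python Solution API (the folder layout, CSV columns, and confidence value
below are illustrative assumptions, not the Colabs' exact code):

```python
import csv
import os

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

# Illustrative layout: sample images grouped per class, e.g.
#   fitness_poses_images/pushups_up/*.jpg, fitness_poses_images/pushups_down/*.jpg
IMAGES_ROOT = 'fitness_poses_images'
OUT_CSV = 'fitness_poses_landmarks.csv'

with mp_pose.Pose(static_image_mode=True, min_detection_confidence=0.5) as pose, \
     open(OUT_CSV, 'w', newline='') as csv_file:
  writer = csv.writer(csv_file)
  for class_name in sorted(os.listdir(IMAGES_ROOT)):
    class_dir = os.path.join(IMAGES_ROOT, class_name)
    if not os.path.isdir(class_dir):
      continue
    for image_name in sorted(os.listdir(class_dir)):
      image = cv2.imread(os.path.join(class_dir, image_name))
      results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
      if not results.pose_landmarks:
        continue  # Skip samples where no pose was detected.
      # One CSV row per sample: image name, class label, then x, y, z per landmark.
      row = [image_name, class_name]
      for lmk in results.pose_landmarks.landmark:
        row += [lmk.x, lmk.y, lmk.z]
      writer.writerow(row)
```

Each row then holds the image name, its class label (the terminal state), and
the 33 predicted landmarks, which is enough to build the feature vectors
described in the next section.
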
## Classification

The code of the classifier is available both in the
[`Pose Classification Colab (Extended)`] and in the
[ML Kit demo app](https://mediapipe.page.link/mlkit-pose-classification-demo-app).
Please refer to them for details of the approach described below.

The k-NN algorithm used for pose classification requires a feature vector
representation of each sample and a metric to compute the distance between two
such vectors, in order to find the pose samples nearest to a target one.

To convert pose landmarks to a feature vector, we use pairwise distances between
predefined lists of pose joints, such as the distances between wrist and
shoulder, ankle and hip, and the two wrists. Since the algorithm relies on
distances, all poses are normalized to have the same torso size and vertical
torso orientation before the conversion.

![pose_classification_pairwise_distances.png](../images/mobile/pose_classification_pairwise_distances.png) |
:--------------------------------------------------------------------------------------------------------: |
*Fig 3. Main pairwise distances used for the pose feature vector.* |

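A minimal sketch of such a conversion is shown below. The joint pairs are an
illustrative subset taken from the figure, not the exact list used in the
Colabs, the `pose_to_embedding` name is ours, and rotation normalization is
omitted; landmark indices follow the 33-landmark BlazePose topology.

```python
import numpy as np

# BlazePose landmark indices used below (subset of the 33 landmarks).
LEFT_SHOULDER, RIGHT_SHOULDER = 11, 12
LEFT_WRIST, RIGHT_WRIST = 15, 16
LEFT_HIP, RIGHT_HIP = 23, 24
LEFT_ANKLE, RIGHT_ANKLE = 27, 28

# Example joint pairs whose distances form the feature vector.
PAIRS = [
    (LEFT_WRIST, LEFT_SHOULDER), (RIGHT_WRIST, RIGHT_SHOULDER),
    (LEFT_ANKLE, LEFT_HIP), (RIGHT_ANKLE, RIGHT_HIP),
    (LEFT_WRIST, RIGHT_WRIST), (LEFT_ANKLE, RIGHT_ANKLE),
    (LEFT_WRIST, LEFT_HIP), (RIGHT_WRIST, RIGHT_HIP),
]


def pose_to_embedding(landmarks):
  """Converts a (33, 3) array of pose landmarks into a distance feature vector."""
  landmarks = np.array(landmarks, dtype=np.float32)  # Copy before normalizing.
  # Normalize translation: move the hip center to the origin.
  hip_center = (landmarks[LEFT_HIP] + landmarks[RIGHT_HIP]) / 2
  landmarks -= hip_center
  # Normalize scale: divide by the torso size (hip center to shoulder center).
  shoulder_center = (landmarks[LEFT_SHOULDER] + landmarks[RIGHT_SHOULDER]) / 2
  torso_size = np.linalg.norm(shoulder_center)  # Hip center is now the origin.
  landmarks /= max(torso_size, 1e-6)
  # (Rotation to a vertical torso orientation is omitted in this sketch.)
  return np.array([np.linalg.norm(landmarks[a] - landmarks[b]) for a, b in PAIRS])
```
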
To get a better classification result, k-NN search is invoked twice with
different distance metrics:

* First, to filter out samples that are almost the same as the target one but
  have a few very different values in the feature vector (which means
  differently bent joints and thus another pose class), the maximum
  per-coordinate distance is used as the distance metric, so a sample that
  differs strongly in even a single joint distance is pushed away.
* Then, the average per-coordinate distance is used to find the nearest pose
  cluster among those kept by the first search (see the sketch below).

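A simplified sketch of this two-pass search over embeddings produced as above;
the top-N sizes and the vote-share output are illustrative choices, and the
`classify_pose` name is ours, not the Colab's API.

```python
import numpy as np


def classify_pose(target_embedding, samples,
                  top_n_by_max_distance=30, top_n_by_mean_distance=10):
  """Two-pass k-NN over (embedding, class_name) training samples.

  Returns a dict mapping class name to its share of votes among the nearest
  samples, which can be read as a class probability.
  """
  # Pass 1: rank by the worst-case (maximum) per-coordinate difference, so a
  # sample deviating a lot in even one joint distance is filtered out.
  by_max = sorted(
      samples, key=lambda s: np.max(np.abs(s[0] - target_embedding)))
  by_max = by_max[:top_n_by_max_distance]

  # Pass 2: among the survivors, rank by the average per-coordinate difference.
  by_mean = sorted(
      by_max, key=lambda s: np.mean(np.abs(s[0] - target_embedding)))
  nearest = by_mean[:top_n_by_mean_distance]

  votes = [class_name for _, class_name in nearest]
  return {cls: votes.count(cls) / len(votes) for cls in set(votes)}
```
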
Finally, we apply
[exponential moving average](https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average)
(EMA) smoothing to level out any noise from pose prediction or classification.
To do that, we not only search for the nearest pose cluster, but also calculate
a probability for each cluster and use it for smoothing over time.

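One simple way to apply such smoothing to the per-class probabilities returned
by the classifier sketch above; the smoothing factor is an illustrative value
and the Colab and demo app have their own smoothing implementation.

```python
class EMASmoothing:
  """Exponential moving average over a dict of per-class probabilities."""

  def __init__(self, alpha=0.2):
    self._alpha = alpha  # Higher alpha reacts faster but smooths less.
    self._smoothed = {}

  def __call__(self, class_probs):
    # Blend the new probabilities into the running average for every class
    # seen so far, treating missing classes as probability 0.
    for cls in set(self._smoothed) | set(class_probs):
      prev = self._smoothed.get(cls, 0.0)
      new = class_probs.get(cls, 0.0)
      self._smoothed[cls] = self._alpha * new + (1.0 - self._alpha) * prev
    return dict(self._smoothed)
```
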
## Repetition Counting

To count the repetitions, the algorithm monitors the probability of a target
pose class. Let's take push-ups, with their "up" and "down" terminal states:

* When the probability of the "down" pose class passes a certain threshold for
  the first time, the algorithm marks that the "down" pose class has been
  entered.
* Once the probability drops below the threshold, the algorithm marks that the
  "down" pose class has been exited and increases the counter.

To avoid cases where the probability fluctuates around the threshold (e.g.,
when the user pauses between the "up" and "down" states) and causes phantom
counts, the threshold used to detect when the state is exited is actually
slightly lower than the one used to detect when the state is entered. This
creates an interval in which the pose class and the counter can't be changed.

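Below is a minimal sketch of such a counter with this hysteresis. The class
name and threshold values are illustrative rather than the demo app's exact
settings, and the probabilities are assumed to be the smoothed 0-1 values from
the sketches above.

```python
class RepetitionCounter:
  """Counts one repetition per enter/exit cycle of the target pose class."""

  def __init__(self, class_name, enter_threshold=0.6, exit_threshold=0.4):
    # Hysteresis: the exit threshold is lower than the enter threshold, so
    # probabilities fluctuating around a single value can't toggle the state.
    self._class_name = class_name
    self._enter_threshold = enter_threshold
    self._exit_threshold = exit_threshold
    self._entered = False
    self.count = 0

  def __call__(self, smoothed_probs):
    prob = smoothed_probs.get(self._class_name, 0.0)
    if not self._entered:
      self._entered = prob > self._enter_threshold
    elif prob < self._exit_threshold:
      self._entered = False
      self.count += 1  # A full enter/exit cycle equals one repetition.
    return self.count
```

Per frame, the pieces from the sketches above would then chain as: landmarks,
`pose_to_embedding`, `classify_pose`, `EMASmoothing`, `RepetitionCounter`.
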
## Future Work

We are actively working on improving BlazePose GHUM 3D's Z prediction. This
will allow us to use joint angles in the feature vectors, which are more
natural and easier to configure (although distances can still be useful to
detect touches between body parts), and to perform rotation normalization of
poses, reducing the number of camera angles required for accurate k-NN
classification.

## Colabs

* [`Pose Classification Colab (Basic)`]
* [`Pose Classification Colab (Extended)`]

[`Pose Classification Colab (Basic)`]: https://mediapipe.page.link/pose_classification_basic
[`Pose Classification Colab (Extended)`]: https://mediapipe.page.link/pose_classification_extended