Project import generated by Copybara.
GitOrigin-RevId: 5b4c149782c086ebf9ef390195fb260ad0103217
This commit is contained in:

parent 350fbb2100
commit a92cff7a60

docs/solutions/pose.md
@@ -2,6 +2,8 @@
 layout: default
 title: Pose
 parent: Solutions
+has_children: true
+has_toc: false
 nav_order: 5
 ---

@@ -21,10 +23,9 @@ nav_order: 5
 ## Overview

 Human pose estimation from video plays a critical role in various applications
-such as
-[quantifying physical exercises](#pose-classification-and-repetition-counting),
-sign language recognition, and full-body gesture control. For example, it can
-form the basis for yoga, dance, and fitness applications. It can also enable the
+such as [quantifying physical exercises](./pose_classification.md), sign
+language recognition, and full-body gesture control. For example, it can form
+the basis for yoga, dance, and fitness applications. It can also enable the
 overlay of digital content and information on top of the physical world in
 augmented reality.

@@ -387,121 +388,6 @@ on how to build MediaPipe examples.
     *   Target:
         [`mediapipe/examples/desktop/upper_body_pose_tracking:upper_body_pose_tracking_gpu`](https://github.com/google/mediapipe/tree/master/mediapipe/examples/desktop/upper_body_pose_tracking/BUILD)

-## Pose Classification and Repetition Counting
-
-One of the applications
-[BlazePose](https://ai.googleblog.com/2020/08/on-device-real-time-body-pose-tracking.html)
-can enable is fitness. More specifically - pose classification and repetition
-counting. In this section we'll provide basic guidance on building a custom pose
-classifier with the help of a
-[Colab](https://drive.google.com/file/d/19txHpN8exWhstO6WVkfmYYVC6uug_oVR/view?usp=sharing)
-and wrap it in a simple
-[fitness app](https://mediapipe.page.link/mlkit-pose-classification-demo-app)
-powered by [ML Kit](https://developers.google.com/ml-kit). Push-ups and squats
-are used for demonstration purposes as the most common exercises.
-
- |
-:--------------------------------------------------------------------------------------------------------: |
-*Fig 4. Pose classification and repetition counting with MediaPipe Pose.*                                  |
-
-We picked the
-[k-nearest neighbors algorithm](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm)
-(k-NN) as the classifier. It's simple and easy to start with. The algorithm
-determines the object's class based on the closest samples in the training set.
-To build it, one needs to:
-
-*   Collect image samples of the target exercises and run pose prediction on
-    them,
-*   Convert obtained pose landmarks to a representation suitable for the k-NN
-    classifier and form a training set,
-*   Perform the classification itself followed by repetition counting.
-
-### Training Set
-
-To build a good classifier appropriate samples should be collected for the
-training set: about a few hundred samples for each terminal state of each
-exercise (e.g., "up" and "down" positions for push-ups). It's important that
-collected samples cover different camera angles, environment conditions, body
-shapes, and exercise variations.
-
- |
-:--------------------------------------------------------------------------------------------------------------------------: |
-*Fig 5. Two terminal states of push-ups.*                                                                                    |
-
-To transform samples into a k-NN classifier training set, either
-[basic](https://drive.google.com/file/d/1z4IM8kG6ipHN6keadjD-F6vMiIIgViKK/view?usp=sharing)
-or
-[extended](https://drive.google.com/file/d/19txHpN8exWhstO6WVkfmYYVC6uug_oVR/view?usp=sharing)
-Colab could be used. They both use the
-[Python Solution API](#python-solution-api) to run the BlazePose models on given
-images and dump predicted pose landmarks to a CSV file. Additionally, the
-extended Colab provides useful tools to find outliers (e.g., wrongly predicted
-poses) and underrepresented classes (e.g., not covering all camera angles) by
-classifying each sample against the entire training set. After that, you'll be
-able to test the classifier on an arbitrary video right in the Colab.
-
-### Classification
-
-Code of the classifier is available both in the
-[extended](https://drive.google.com/file/d/19txHpN8exWhstO6WVkfmYYVC6uug_oVR/view?usp=sharing)
-Colab and in the
-[ML Kit demo app](https://mediapipe.page.link/mlkit-pose-classification-demo-app).
-Please refer to them for details of the approach described below.
-
-The k-NN algorithm used for pose classification requires a feature vector
-representation of each sample and a metric to compute the distance between two
-such vectors to find the nearest pose samples to a target one.
-
-To convert pose landmarks to a feature vector, we use pairwise distances between
-predefined lists of pose joints, such as distances between wrist and shoulder,
-ankle and hip, and two wrists. Since the algorithm relies on distances, all
-poses are normalized to have the same torso size and vertical torso orientation
-before the conversion.
-
- |
-:--------------------------------------------------------------------------------------------------------: |
-*Fig 6. Main pairwise distances used for the pose feature vector.*                                         |
-
-To get a better classification result, k-NN search is invoked twice with
-different distance metrics:
-
-*   First, to filter out samples that are almost the same as the target one but
-    have only a few different values in the feature vector (which means
-    differently bent joints and thus other pose class), minimum per-coordinate
-    distance is used as distance metric,
-*   Then average per-coordinate distance is used to find the nearest pose
-    cluster among those from the first search.
-
-Finally, we apply
-[exponential moving average](https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average)
-(EMA) smoothing to level any noise from pose prediction or classification. To do
-that, we search not only for the nearest pose cluster, but we calculate a
-probability for each of them and use it for smoothing over time.
-
-### Repetition Counter
-
-To count the repetitions, the algorithm monitors the probability of a target
-pose class. Let's take push-ups with its "up" and "down" terminal states:
-
-*   When the probability of the "down" pose class passes a certain threshold for
-    the first time, the algorithm marks that the "down" pose class is entered.
-*   Once the probability drops below the threshold, the algorithm marks that the
-    "down" pose class has been exited and increases the counter.
-
-To avoid cases when the probability fluctuates around the threshold (e.g., when
-the user pauses between "up" and "down" states) causing phantom counts, the
-threshold used to detect when the state is exited is actually slightly lower
-than the one used to detect when the state is entered. It creates an interval
-where the pose class and the counter can't be changed.
-
-### Future Work
-
-We are actively working on improving BlazePose GHUM 3D's Z prediction. It will
-allow us to use joint angles in the feature vectors, which are more natural and
-easier to configure (although distances can still be useful to detect touches
-between body parts) and to perform rotation normalization of poses and reduce
-the number of camera angles required for accurate k-NN classification.
-
 ## Resources

 *   Google AI Blog:
@@ -512,5 +398,3 @@ the number of camera angles required for accurate k-NN classification.
 *   [Models and model cards](./models.md#pose)
 *   [Web demo](https://code.mediapipe.dev/codepen/pose)
 *   [Python Colab](https://mediapipe.page.link/pose_py_colab)
-*   [Pose Classification Colab (Basic)](https://mediapipe.page.link/pose_classification_basic)
-*   [Pose Classification Colab (Extended)](https://mediapipe.page.link/pose_classification_extended)

docs/solutions/pose_classification.md (new file, 142 lines)
@@ -0,0 +1,142 @@
---
layout: default
title: Pose Classification
parent: Pose
grand_parent: Solutions
nav_order: 1
---

# Pose Classification
{: .no_toc }

<details close markdown="block">
  <summary>
    Table of contents
  </summary>
  {: .text-delta }
1. TOC
{:toc}
</details>
---

## Overview

One of the applications
[BlazePose](https://ai.googleblog.com/2020/08/on-device-real-time-body-pose-tracking.html)
can enable is fitness, more specifically pose classification and repetition
counting. On this page we'll provide basic guidance on building a custom pose
classifier with the help of [Colabs](#colabs) and wrap it in a simple
[fitness app](https://mediapipe.page.link/mlkit-pose-classification-demo-app)
powered by [ML Kit](https://developers.google.com/ml-kit). Push-ups and squats
are used for demonstration purposes as the most common exercises.

 |
:--------------------------------------------------------------------------------------------------------: |
*Fig 1. Pose classification and repetition counting with MediaPipe Pose.*                                  |

We picked the
[k-nearest neighbors algorithm](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm)
(k-NN) as the classifier. It's simple and easy to start with. The algorithm
determines the object's class based on the closest samples in the training set
(a minimal sketch follows the list below).

**To build it, one needs to:**

1.  Collect image samples of the target exercises and run pose prediction on
    them,
2.  Convert the obtained pose landmarks to a representation suitable for the
    k-NN classifier and form a training set using these [Colabs](#colabs),
3.  Perform the classification itself, followed by repetition counting (e.g., in
    the
    [ML Kit demo app](https://mediapipe.page.link/mlkit-pose-classification-demo-app)).

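For intuition, here is a minimal sketch of a k-NN pose classifier. The names
(`knn_classify`, `embeddings`, `labels`, `k`) are illustrative rather than the
Colabs' actual API, and the real implementation uses the two-pass search
described under [Classification](#classification) below:

```python
import numpy as np

def knn_classify(target, embeddings, labels, k=10):
    """Minimal k-NN sketch: majority vote among the k training samples
    closest to the target pose embedding. `embeddings` is an (N, D) array
    of feature vectors; `labels` holds N class names like 'pushups_up'."""
    # Distance from the target to every training sample.
    dists = np.linalg.norm(embeddings - target, axis=1)
    # Labels of the k nearest samples decide the class by majority vote.
    nearest = [labels[i] for i in np.argsort(dists)[:k]]
    return max(set(nearest), key=nearest.count)
```
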
## Training Set

To build a good classifier, appropriate samples should be collected for the
training set: a few hundred samples for each terminal state of each exercise
(e.g., "up" and "down" positions for push-ups). It's important that the
collected samples cover different camera angles, environment conditions, body
shapes, and exercise variations.

 |
:--------------------------------------------------------------------------------------------------------------------------: |
*Fig 2. Two terminal states of push-ups.*                                                                                    |

To transform samples into a k-NN classifier training set, either
[`Pose Classification Colab (Basic)`] or
[`Pose Classification Colab (Extended)`] can be used. Both use the
[Python Solution API](./pose.md#python-solution-api) to run the BlazePose models
on given images and dump predicted pose landmarks to a CSV file. Additionally,
the [`Pose Classification Colab (Extended)`] provides useful tools to find
outliers (e.g., wrongly predicted poses) and underrepresented classes (e.g., not
covering all camera angles) by classifying each sample against the entire
training set. After that, you'll be able to test the classifier on an arbitrary
video right in the Colab.

					## Classification
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Code of the classifier is available both in the
 | 
				
			||||||
 | 
					[`Pose Classification Colab (Extended)`] and in the
 | 
				
			||||||
 | 
					[ML Kit demo app](https://mediapipe.page.link/mlkit-pose-classification-demo-app).
 | 
				
			||||||
 | 
					Please refer to them for details of the approach described below.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The k-NN algorithm used for pose classification requires a feature vector
 | 
				
			||||||
 | 
					representation of each sample and a metric to compute the distance between two
 | 
				
			||||||
 | 
					such vectors to find the nearest pose samples to a target one.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					To convert pose landmarks to a feature vector, we use pairwise distances between
 | 
				
			||||||
 | 
					predefined lists of pose joints, such as distances between wrist and shoulder,
 | 
				
			||||||
 | 
					ankle and hip, and two wrists. Since the algorithm relies on distances, all
 | 
				
			||||||
 | 
					poses are normalized to have the same torso size and vertical torso orientation
 | 
				
			||||||
 | 
					before the conversion.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					 |
 | 
				
			||||||
 | 
					:--------------------------------------------------------------------------------------------------------: |
 | 
				
			||||||
 | 
					*Fig 3. Main pairwise distances used for the pose feature vector.*                                         |
 | 
				
			||||||
 | 
					
 | 
				
			||||||
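To make the conversion concrete, here is a rough sketch of such an embedding.
The landmark indices follow the 33-landmark BlazePose topology, but the
particular joint pairs are illustrative, and vertical-orientation normalization
is omitted for brevity:

```python
import numpy as np

# BlazePose landmark indices (subset of the 33-landmark topology).
LEFT_SHOULDER, RIGHT_SHOULDER = 11, 12
LEFT_WRIST, RIGHT_WRIST = 15, 16
LEFT_HIP, RIGHT_HIP = 23, 24
LEFT_ANKLE, RIGHT_ANKLE = 27, 28

def pose_embedding(landmarks):
    """Converts a (33, 3) landmark array into a distance-based feature vector.

    Sketch only: normalizes for torso size (rotation alignment omitted),
    then takes distances between a predefined list of joint pairs.
    """
    landmarks = np.asarray(landmarks, dtype=np.float32)
    hips = (landmarks[LEFT_HIP] + landmarks[RIGHT_HIP]) / 2
    shoulders = (landmarks[LEFT_SHOULDER] + landmarks[RIGHT_SHOULDER]) / 2
    # Translate to the hips center and scale by torso size.
    landmarks = (landmarks - hips) / np.linalg.norm(shoulders - hips)
    # Pairwise distances: wrist-shoulder, ankle-hip, wrist-wrist, ankle-ankle.
    pairs = [(LEFT_WRIST, LEFT_SHOULDER), (RIGHT_WRIST, RIGHT_SHOULDER),
             (LEFT_ANKLE, LEFT_HIP), (RIGHT_ANKLE, RIGHT_HIP),
             (LEFT_WRIST, RIGHT_WRIST), (LEFT_ANKLE, RIGHT_ANKLE)]
    return np.array([np.linalg.norm(landmarks[a] - landmarks[b])
                     for a, b in pairs])
```
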
To get a better classification result, k-NN search is invoked twice with
different distance metrics (a sketch follows the list):

*   First, to filter out samples that are almost the same as the target one but
    have a few sharply different values in the feature vector (which means
    differently bent joints and thus another pose class), maximum per-coordinate
    distance is used as the distance metric,
*   Then average per-coordinate distance is used to find the nearest pose
    cluster among those from the first search.

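A sketch of that two-pass search, reusing the hypothetical `pose_embedding`
vectors from above; the parameter names and values are made up:

```python
import numpy as np

def classify_pose(target, embeddings, labels,
                  max_dist_top_n=30, mean_dist_top_n=10):
    """Two-pass k-NN sketch over (N, D) pose embeddings.

    Pass 1 ranks samples by worst-case (max) per-coordinate distance,
    discarding poses that differ sharply in even a few joints. Pass 2
    re-ranks the survivors by average per-coordinate distance and returns
    per-class vote shares, usable as probabilities for smoothing.
    """
    diffs = np.abs(embeddings - target)
    # Pass 1: keep samples whose largest coordinate difference is smallest.
    keep = np.argsort(diffs.max(axis=1))[:max_dist_top_n]
    # Pass 2: among those, take the nearest by mean per-coordinate distance.
    nearest = keep[np.argsort(diffs[keep].mean(axis=1))[:mean_dist_top_n]]
    votes = {}
    for i in nearest:
        votes[labels[i]] = votes.get(labels[i], 0) + 1
    return {label: n / len(nearest) for label, n in votes.items()}
```
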
Finally, we apply
[exponential moving average](https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average)
(EMA) smoothing to level any noise from pose prediction or classification. To do
that, we not only search for the nearest pose cluster, but also calculate a
probability for each of them and use it for smoothing over time.

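A minimal sketch of that smoothing step; the smoothing factor `alpha` is an
assumed value, not a documented constant:

```python
class EMASmoother:
    """Exponential moving average over per-class probabilities."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha      # weight of the newest observation
        self.smoothed = {}      # class name -> smoothed probability

    def update(self, probs):
        # Blend the new per-class probabilities into the running average;
        # classes absent from this frame decay toward zero.
        for label in set(self.smoothed) | set(probs):
            prev = self.smoothed.get(label, 0.0)
            new = probs.get(label, 0.0)
            self.smoothed[label] = self.alpha * new + (1 - self.alpha) * prev
        return dict(self.smoothed)
```
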
## Repetition Counting

To count the repetitions, the algorithm monitors the probability of a target
pose class. Let's take push-ups, with their "up" and "down" terminal states:

*   When the probability of the "down" pose class passes a certain threshold for
    the first time, the algorithm marks that the "down" pose class is entered.
*   Once the probability drops below the threshold, the algorithm marks that the
    "down" pose class has been exited and increases the counter.

To avoid cases when the probability fluctuates around the threshold (e.g., when
the user pauses between "up" and "down" states), causing phantom counts, the
threshold used to detect when the state is exited is actually slightly lower
than the one used to detect when the state is entered. This creates an interval
where the pose class and the counter can't be changed.

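A sketch of that counter; the 0.6/0.4 thresholds are illustrative, and
`smoothed_probs` is the per-class probability dict produced by the smoothing
step above:

```python
class RepetitionCounter:
    """Counts repetitions of one pose class via hysteresis thresholding."""

    def __init__(self, class_name='pushups_down',
                 enter_threshold=0.6, exit_threshold=0.4):
        self.class_name = class_name
        self.enter_threshold = enter_threshold  # probability to enter the state
        self.exit_threshold = exit_threshold    # lower bound to exit the state
        self.in_pose = False
        self.count = 0

    def update(self, smoothed_probs):
        prob = smoothed_probs.get(self.class_name, 0.0)
        if not self.in_pose and prob > self.enter_threshold:
            self.in_pose = True      # "down" state entered
        elif self.in_pose and prob < self.exit_threshold:
            self.in_pose = False     # state exited: count one full repetition
            self.count += 1
        return self.count
```

Calling `update()` with the smoothed probabilities on every frame yields the
running repetition count; the gap between the two thresholds is exactly the
"can't change" interval described above.
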
## Future Work

We are actively working on improving BlazePose GHUM 3D's Z prediction. It will
allow us to use joint angles in the feature vectors, which are more natural and
easier to configure (although distances can still be useful to detect touches
between body parts), and to perform rotation normalization of poses, reducing
the number of camera angles required for accurate k-NN classification.

## Colabs

*   [`Pose Classification Colab (Basic)`]
*   [`Pose Classification Colab (Extended)`]

[`Pose Classification Colab (Basic)`]: https://mediapipe.page.link/pose_classification_basic
[`Pose Classification Colab (Extended)`]: https://mediapipe.page.link/pose_classification_extended